Comments on relativistic observer: Keep Adding Cores?

Pragmatically, concurrency allows you to overlap d...

2012-10-24T17:17:33.714-07:00

Pragmatically, concurrency allows you to overlap diverse processes. This is useful in I/O for instance, or in a truly coroutined process typical of a pipeline. Parallelism is the static duplication of code and simultaneous execution on multiple logical compute units. In modern GPUs, this is very clearly parallelism since each shader in the workgroup is running the same code in lockstep.

Sadly, data can vary, so GPU shader kernels that contain branches perform less than optimally because of the lockstep requirement. On a multicore CPU this is not the case, and the best average running times can be realized: a branch taken on one core does not mean the same branch will be taken on another.

SIMD vector units share this specific downside caused by lockstep parallelism.

So in some ways, adding more cores is a better form of parallelism than the GPU and SIMD (vector) model.

The "recompiling" of a standard scalar algorithm for parallelism is the main problem we are looking to solve today in order to extend the Moore's law-like performance of our existing software. Essentially, we're looking for a "magic bullet".

This magic bullet is a hard problem, made harder by the lack of referential transparency, as you say. This has been known since the earliest days of optimization: if you can't tell which variables are const within a given scope, then it's nearly impossible to optimize.
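A minimal C sketch of the point about knowing what is constant within a scope. The function names are hypothetical, and whether a given compiler actually hoists the load or vectorizes the second loop is an assumption, not something measured here. In the first version the compiler must assume that `out` may overlap `scale`, so `*scale` has to be re-read on every iteration. In the second, `restrict` and a local `const` copy state that the value is fixed for the whole loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out may alias scale, so *scale must be re-read
   on every iteration in case a store to out[i] changed it. */
void scale_may_alias(float *out, const float *in, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * *scale;
}

/* Same loop, but the caller promises no overlap (restrict), and the
   scale is copied into a local that nothing else can modify. */
void scale_no_alias(float *restrict out, const float *restrict in,
                    const float *restrict scale, size_t n)
{
    assert(out != NULL && in != NULL && scale != NULL);
    const float s = *scale;   /* constant for the whole scope */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * s;
}
```

Calling `scale_may_alias(buf, buf, &buf[0], 3)` on `{2, 3, 4}` shows why the compiler has to be careful: the first store changes the scale, so the later lanes are multiplied by 4 rather than 2.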

So the inevitable procession of programming languages to higher and higher levels is fraught with problems for optimization and parallelism determination.

It has led me to use a leveled-language approach to development. The outer sections that aren't time-critical are written in a higher-level language, like Objective-C++. The lower-level, time-critical sections of the code are written in C or OpenCL.

In other words, I pay attention to what needs to be fast. To have a program recognize what needs to be fast is currently beyond our understanding. At a minimum, there should be pragmas for that!

Parallelism is the isolation of independent calcul...

2012-10-07T01:11:59.648-07:00

Parallelism is the isolation of independent calculations from dependent ones.

Thus referential transparency is critical.
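To make the independent/dependent distinction concrete, here is a small C sketch with hypothetical function names. In the first two loops each output depends only on its own input, so the iteration order does not matter and the work could be split across cores. In the third, each output needs the one before it, so the loop as written has to run in order.

```c
#include <assert.h>
#include <stddef.h>

/* Independent: out[i] depends only on in[i]. */
void square_all(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * in[i];
}

/* Same calculation walked backwards: the result is identical,
   which is what makes it safe to run the iterations in parallel. */
void square_all_reversed(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = n; i-- > 0; )
        out[i] = in[i] * in[i];
}

/* Dependent: out[i] needs the running total from out[i-1],
   so this loop cannot simply be reordered. */
void running_sum(const int *in, int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}
```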

Concurrency is used to run independent code in parallel, but it is also used for multiple threads that run round-robin on one processor via interrupts.

Correction: the last link I provided in my prior post was not about coaxing parallelism, but rather about accomplishing the analogous reuse of orthogonal functions in an eager (strict) programming language, as can be accomplished in a lazy (total) language.

More on that:

http://copute.com/index.html.orig
Skeptical -> Purity -> Eager vs. Lazy -> Tradeoffs -> Performance

Your comment about parallelism vs. concurrence is ...

2012-10-04T23:02:07.146-07:00

Your comment about parallelism vs. concurrence is exactly my point, really, although it does at first seem to bolster the objective of adding more cores. This is because multiple cores actually do properly support concurrency of diverse operations. The real point is that each individual operation needs to be handled properly, and using a general-purpose core for that operation will not really make sense in terms of performance or power footprint. Even a specialized massively parallel processor like a GPU can't really handle operations like sensor processing, RAW demosaicing, and the entire pipeline required for imaging. It's just not fast enough for the demands of video and for the increasing size of the sensors.

This is the era of specialized processors: scalers, baseband processors, imaging and signal processors, encryption hardware, etc. These processors can be practically optimal, can be designed for a specific useful throughput, can be specifically tailored to a power consumption footprint, and can also take up less die space.

The world of an SoC often contains several such processors embedded on its die.

The GPU has proven itself to be totally up to the task of a smoothly animating user interface, though. It is also up to the tasks it was designed for: OpenGL-based gaming, rendering textured polygons to a z-buffer, three-dimensional transforms, and implementing shaders, geometry shaders included.

The multiple cores and the GPU become the tableau for the application programmer, so an application can make use of the device and do wonderful things.

Back to parallelism and concurrency: the concurrency happens when a diverse set of operations, such as video processing and dynamic control of sensors and MEMS, needs to be done. Parallelism happens when a specialized computation needs to be performed that is homogeneous.

Parallelism is really the domain of Grand Central Dispatch, the multithreading helper. A block of code may be distributed onto multiple micro threads, for instance, on the Intel platform.

But I distinguish between this kind of threaded parallelism and lockstep parallelism, like that which occurs in a SIMD instruction. Perhaps 32 identical operations happen at once, in a vector instruction set, such as those on Core i7 processors.

And on a GPU, the parallelism is increasingly best done on a scalar shader. It is spread about in workgroups onto multiple shaders to operate in parallel on the GPU. Some GPUs have 500+ shaders operating in parallel. When the operation fits this model, then it can definitely be sped up. Indeed, I have experience in this, as well as in the vector instruction model, which is quite different to program, I assure you.

You can coax typical code into parallelism using something like a select instruction: you do a vector comparison and use the resulting comparison mask to select the proper results of a vector. While this is eminently programmable and similar to microcode in some ways, it also carries some inefficiency, since you end up having to compute all the answers and choose between them at the end.
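A rough sketch of the compare-and-select idea, written in plain C so it does not depend on any particular vector instruction set. Each array element stands in for one vector lane, and the function name and the choice of operations are assumptions made for illustration. Note that both the "then" and the "else" answers are computed for every lane before the mask picks one, which is exactly the inefficiency described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per lane, computes: out = (v > limit) ? v - limit : v * 2
   without a branch, the way a vector compare plus select would. */
void select_lanes(const uint32_t *v, uint32_t limit, uint32_t *out, size_t n)
{
    assert(n == 0 || (v != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        /* Both candidate answers are always computed. */
        uint32_t then_val = v[i] - limit;   /* may wrap; discarded if unused */
        uint32_t else_val = v[i] * 2u;

        /* Compare: all ones where v > limit, all zeros elsewhere. */
        uint32_t mask = -(uint32_t)(v[i] > limit);

        /* Select: take then_val bits where the mask is set, else_val bits elsewhere. */
        out[i] = (then_val & mask) | (else_val & ~mask);
    }
}
```

With `v = {1, 10, 5, 7}` and `limit = 5`, the lanes above the limit get `v - 5` and the rest get `v * 2`. On real SIMD hardware the loop body would be a handful of vector instructions operating on all lanes at once.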

--Mark

Specialized processors are more efficient in terms...

2012-10-04T22:33:42.823-07:00

Specialized processors are more efficient in terms of silicon expended for the task they do. So the equation is that they are desirable when the task they do is frequent enough that the silicon efficiency gained is positive overall, i.e. the duty cycle of the specialized task must be factored in (as well as other considerations such as I/O load on the general CPU, etc.).

That silicon (and energy) efficiency equation can be applied vice versa, in that making the general CPU simpler (e.g. RISC) can lead to greater silicon efficiency, because the complex instruction sets have a lower duty cycle.

I expect that for some years or decades the number of cores will continue to increase in line with Moore's law. Some ideas I have seen are using a material other than silicon, 3D circuits (Intel's tri-gate technology), and making cores simpler.

Ultimately the amount of processing we can fit in a very small gadget will reach a limit:

http://www.dailygalaxy.com/my_weblog/2012/05/is-the-age-of-silicon-coming-to-an-end-physicist-michio-kaku-says-yes.html
http://www.lifeslittlemysteries.com/2878-future-computers.html

However, there is a solution to this. With near-field radio (e.g. Bluetooth), we can put more cores somewhere on our body or clothing and offload processing from the gadget we hold in our hand. Hopefully we can charge it without wires too, so we don't have to think about it. Hopefully these spare processors become so cheap that they come standard in clothing, shoes, etc.

Identifying parallelism in software is not the same as concurrency:

http://existentialtype.wordpress.com/2011/03/17/parallelism-is-not-concurrency/
http://existentialtype.wordpress.com/2012/08/26/yet-another-reason-not-to-be-lazy-or-imperative/

And we can coax parallelism into a series of conditional operations on sets, by using a State monad and Traversable:

http://augustss.blogspot.com/2011/05/more-points-for-lazy-evaluation-in.html#c2904150906369733736

Painter does serve me well also. And I wonder if C...

2012-09-06T20:41:12.297-07:00

Painter does serve me well also. And I wonder if Corel is moving Painter to the tablet environment.

I also like drawing on my iPad, but I do miss the pressure sensitive aspects of the Wacom. The optimal thing would be a huge iPad-like tablet with a pressure sensitive pen. There are several technologies to bring pressure sensitivity to the iPad, even without Wacom's magical antenna-array tech. But let's face it, the Wacom is tried and true. It also supports tilt and bearing, which both open up even more creative possibilities.

The big Cintiq is the cleanest way I have seen to paint directly on any screen surface. But the clarity, contrast, and resolution of screens have gone way up lately. Those retina displays are gorgeous!

You are right: what we clearly need is a larger tablet with pressure sensitivity for artists. A friend has also mentioned this to me.

One of the main issues with tablets (capacitive tablet touch screens) is that your hand causes spurious contact points, disturbing the drawing process. I have taken to putting down a handkerchief (to lay my hand on) while sketching. There is also a glove for such a purpose that is being sold.

--Mark

I think a lot of ARTISTS and some others would go ...

2012-09-06T17:08:35.670-07:00

I think a lot of ARTISTS and some others would go for a larger (17" diagonal screen) tablet with high resolution and a pressure-sensitive pen. People seem to get emotional about not having this, so I am thinking about tearing my Wacom Cintiq apart and adding laptop parts so at least I can have one. I will take it to the coffee shop so I can be like all the other people with their 17" laptops, only I will be able to draw on mine. Hopefully, I will have Painter running on it, as that is still the best drawing program in my opinion. See you there Ron