Technology Review Agrees

Multicore Processors create software headaches.

I guess my recent post was timely :)

The writer notes

A promising potential solution … the messy details could be left to compilers

When scrambling around in the dark, not stepping on broken glass might be a promising potential. But it’s not really a solution.

And it’s going to take some really major work, such as the Bloom Language, to change things up. Languages like Fortran, C/C++, Java, etc. are going to have a very, very hard time truly reaping the benefits of increasing parallelization just through increased compiler optimization.

The problem is, these languages are already full of implementation-specific tricks for gleaning extra bits of performance under different environments.

I can speak best to the case of C/C++. First the compilers will have to change, possibly bringing some modifications to the language on board. The C++0x standard was started in the early 2000s. The 0x refers to the assumption that it would be finished some time before 2010. At present they are expecting to vote on the final draft in 2011 … and possibly a first release by the beginning of 2012… Fat chance of any parallelization changes making it into C++ before 2020 :(

Various C/C++ compiler vendors have already recognized this as a lost cause and gone ahead with their own parallelization features. OpenMP is sorta-kindof a parallel extension to the language, but it appears that the promised OpenMP 3.0 support didn’t make it into the Visual Studio 2010 release.

Anyway, once the compilers have their parallelization features, people need to learn how to take advantage of them.

There were a lot of words between the original quote and that statement, so let’s take a moment to recap.


A promising potential solution … the messy details could be left to compilers


Anyway, once the compilers have their parallelization features, people need to learn how to take advantage of them.

That’s right. Compiler optimizations are almost never free. It’s usually not so much having to learn how to use them as learning how to not use them, which amounts to the same thing in the long term.

For example, with an earlier version of the Intel compiler, the following two pieces of code, which are theoretically identical, behaved radically differently:

for ( int i = 0 ; i < array.size() ; ++i )

/* vs */

const int size = (int) array.size() ;
for ( int i = 0 ; i < size ; ++i )

Why? Because of a conflict between magical, behind-the-scenes optimizations. The compiler couldn’t be absolutely sure that, if it made the loop parallel, the value returned by “size()” would stay constant (the array could shrink while the parallel iterations were still executing).

So you have to tell the compiler that size isn’t going to change via the second variation.

And this is inherently going to hinder language-based changes.

Most importantly, it doesn’t change the fundamental fact that the compiler is going to have to inject CPU instructions into the end code to do this work, which means that as new CPUs come along, the compiler is going to wind up doing the wrong thing.

What we need, the only real, practical solution, is for hardware support for software parallelization. We need CPU instructions and CPU core(s) to do the work.

Task scheduling, for instance: every so often, the operating system has to intervene in the running of a thread/process. Assuming two threads are going to run on the same CPU/core, that means the CPU that was executing your MP3 player has to stop executing your MP3-playing code for a moment. The operating system then has to save all the state information about the thread, swap in the state of the next thread, schedule an interrupt for it, and set it running.

It may be a very minor overhead, but it’s still fewer CPU cycles spent by the processor on the actual workload. On a multi-core CPU, chances are that under lightly loaded conditions, part of that work will actually be done on another CPU core. But wait a second. Why not devote a trivial “core” to scheduling? That way the scheduler could actually be guaranteed to be able to, first, determine which thread needs to be executed next and have everything ready to perform the swap before taking away precious CPU cycles from the currently running thread. And in the case of a singularly tasked workload (i.e. one process/thread running), it might not even have to stop it in the first place.

Given that most modern operating systems have lots of tasks/threads running and in waiting states, there’s usually quite a lot of overhead for the scheduler to keep track of, and it can’t, easily, tell that there isn’t going to be demand for a task swap without periodically interrupting the active thread. BOOM: Performance decrease.

With a scheduler core to assist the OS, single-task-per-core performance would finally match that of a dedicated single-tasking OS…

And if the CPU can provide instructions for parallelization management, boosh, the code doesn’t have to worry so much about the strategies it uses. Perhaps the instructions could behave like a sort of Parallelization-BIOS that deals with many of the tasks but also provides the application with tuned data on runtime optimization. Boosh: New CPU? Use the same instructions and let the hardware worry about the minutiae.

In fact, if you truly understand parallelism, having the software – even the machine code produced by the compiler – try to manage the parallelism is mind-bogglingly stupid. It borders on being the computing equivalent of asking yourself the question “am I real?”
