Computers are based on sequences of 1s and 0s: bits. By chaining these together, you can form a vocabulary of instructions from a sort of tree. E.g. the first bit is either ‘close’ (0) or ‘open’ (1), and the second bit is either ‘gate’ (0) or ‘door’ (1). So, 00 is ‘close gate’, 10 is ‘open gate’ and 11 is ‘open door’.
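To make that toy vocabulary concrete, here is a tiny sketch in C that decodes all four of those two-bit ‘instructions’. The gate/door encoding is just the example above; nothing about it is real machine code:

```c
#include <stdio.h>

/* Toy decoder for the two-bit vocabulary described above:
   the first bit selects the verb (0 = close, 1 = open),
   the second bit selects the noun (0 = gate, 1 = door). */
int main(void) {
    const char *verbs[] = { "close", "open" };
    const char *nouns[] = { "gate",  "door" };

    for (unsigned code = 0; code < 4; code++) {
        unsigned verb = (code >> 1) & 1;  /* first bit  */
        unsigned noun = code & 1;         /* second bit */
        printf("%u%u -> %s %s\n", verb, noun, verbs[verb], nouns[noun]);
    }
    return 0;
}
```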
CPUs use fixed-size sequences of bits to represent their internal vocabulary in just this way; the result is called the instruction set. These machine instructions are usually incredibly simplistic, such as “add” or “divide”.
Typical computer programs are long sequences of these machine instructions, combined using math and algebra to achieve more complex goals. This is called “machine code”.
Very few programmers still work in machine code; we tend to work in more elaborate languages which allow us to express many machine code instructions with a single line of text, and in a slightly less mind-bending way. This is “program code”.
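To get a feel for the ratio, here is a single line of C and, in a comment, the rough shape of the machine instructions a compiler might emit for it. The pseudo-assembly is illustrative only and doesn’t correspond to any particular CPU’s instruction set:

```c
/* One line of "program code"... */
int total_cost(int total, int price, int quantity) {
    total = total + price * quantity;   /* a single line of C */
    return total;
}

/* ...and the single line above might compile down to several machine
   instructions, very roughly (illustrative pseudo-assembly only):

     load  r1, price       ; fetch price
     load  r2, quantity    ; fetch quantity
     mul   r1, r1, r2      ; r1 = price * quantity
     add   r1, r1, total   ; r1 = total + price * quantity
     ...                   ; hand r1 back to the caller
*/
```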
Think of it this way: bits are the letters of the alphabet; machine code instructions are the vocabulary, the words of the language. Computer languages are the grammar, dialect and syntax one uses in order to communicate an idea via the computer.
At first, CPUs got faster, so the time each word took to process got shorter, making programs run faster.
Then that stopped and the manufacturers started slapping in more CPU cores.
But more CPU cores do not automatically equal more speed. In fact, they suffer from chefs-in-the-kitchen syndrome…
When you start a program on a modern, multi-tasking operating system, it seems like that program is running quite independently of any outside interference.
In actuality, your computer is only really running one single program – the operating system. It loads your program’s code into memory, points the CPU at the start of the program, and lets the CPU run those instructions for a little bit. After a brief slice of time (typically microseconds to milliseconds), the CPU jumps back into the operating system code.
This allows the operating system to ensure that all the programs it has started, and things like disk activity etc., get a fair shot at the CPU.
All of this operating system scheduling overhead is itself just program code, running on the same CPU as everything else.
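As a deliberately over-simplified sketch of that juggling act, here is a toy ‘scheduler’ loop in C. The tasks and the one-slice-at-a-time function are made up purely to illustrate time-slicing; a real kernel relies on timer interrupts and context switches, not polite function calls:

```c
#include <stdio.h>

#define NUM_TASKS 3

typedef struct {
    int id;
    int work_left;   /* stand-in for "instructions still to run" */
} task_t;

/* Hypothetical stand-in for "let the CPU run this task until the
   timer interrupt fires", here modelled as doing one unit of work. */
static void run_for_one_time_slice(task_t *t) {
    t->work_left--;
    printf("task %d ran for one slice, %d left\n", t->id, t->work_left);
}

int main(void) {
    task_t tasks[NUM_TASKS] = { {1, 2}, {2, 3}, {3, 1} };
    int remaining = NUM_TASKS;
    int current = 0;

    while (remaining > 0) {                    /* the scheduler's endless loop    */
        task_t *t = &tasks[current];
        if (t->work_left > 0) {
            run_for_one_time_slice(t);         /* program runs "for a little bit" */
            if (t->work_left == 0)
                remaining--;
        }
        current = (current + 1) % NUM_TASKS;   /* then the OS picks the next one  */
    }
    return 0;
}
```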
If programs could just offload themselves – or chunks of their work – to another CPU/core on their own, that would mess with the operating system’s scheduling. Only the operating system can assign work to specific CPUs/cores; if programs could do it directly, the offloaded code would effectively be going rogue, outside the scheduler’s control, given the way current architectures work.
You really don’t want a system where a piece of code can usurp the operating system and take control of the CPU for itself.
To offload work, a program must summon the operating system and ask it to create a thread, which will (hopefully) be transferred to and executed on a different CPU core; if none are available, the operating system simply adds it to its scheduling queue.
The work involved in this is significant. It’s not a few handy machine instructions. It’s more like several pages of long paragraphs.
That sets an entry barrier to the types of work that are worth doing in parallel. Anything under a few thousand CPU instructions isn’t really worth it, which results in the vast majority of code going unparallelized: your CPU cores sit there idle.
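For a sense of what that request looks like from the program’s side, here is a minimal sketch using POSIX threads in C. The single pthread_create call looks cheap, but behind it the C library traps into the kernel, which allocates a stack, sets up scheduling state and, with luck, hands the thread to an idle core. The heavy_work function and its workload are made up for the example:

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical chunk of work we'd like another core to handle. */
static void *heavy_work(void *arg) {
    long n = *(long *)arg;
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;                      /* pretend this is worth offloading */
    printf("worker finished: %ld\n", sum);
    return NULL;
}

int main(void) {
    pthread_t worker;
    long n = 1000000;

    /* One tidy line of program code; underneath it, the operating system
       is summoned to create, schedule and (hopefully) place the thread
       on a free core. */
    if (pthread_create(&worker, NULL, heavy_work, &n) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }

    /* ...the main thread could be doing other work here... */

    pthread_join(worker, NULL);        /* wait for the offloaded work */
    return 0;
}
```

(On most systems this needs to be compiled with the -pthread flag.)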
In fact, if you watch your CPU cores on most modern operating systems, even when the system is multitasking, almost all the work continues to happen on the primary core, because most of the tasks going on just aren’t heavyweight enough to cross that threshold.
It’s a bit like the difference between “Dear Santa, I’d like a bike this year, luv Oli” and having to file a requisition form for it. Asking for code to be executed independently of your main thread of work requires paragraphs, pages even, of machine instructions, no matter how convenient it is to express in your programming language.
The only way around this obstacle, towards better and more efficient use of multiple CPU cores, is for one or more of the manufacturers to introduce hardware support – i.e. on-chip machine instructions – for these scheduling facilities, which operating systems could then build on.
Perhaps, for desktop systems and the like, they might want to look at some kind of FPGA layer on their chips, so that the scheduling systems of the OS/kernel can be converted into pseudo-machine code.