… then it needs to be easier to drive them.
I suspect it’s the age-old feud between hardware and software guys. Simply put, efficiently managing workloads across multiple CPU cores requires work. Back when CPUs were single-core with multiple pipelines, the fancy details of channeling your CPU instruction workload to the appropriate bit of hardware were done for you.
Then we got multiple cores, and dispatching work across them became the program’s responsibility.
More cores were added, meaning even more instructions spent just delegating work.
We have fancy instruction sets for multimedia operations and floating point calculations. But where is the hardware assistance for parallelization? Where is the “thisjmp” CPU instruction that lets you tell the CPU you’re about to branch to object-oriented code at address X associated with instance data at address Y, so it can prefetch both code and data from a single instruction?
When will we get a “task” instruction that follows a branch into non-returning code on a different core if one is available, or executes it on the current core if not?
Many of these changes would allow compilers to start taking advantage of CPU advances and core counts without requiring any actual software changes. They would also lower the barrier to entry for parallelizing workloads and make most typical, day-to-day PC activities better suited to parallelization.
With current, by-hand parallelization, there is usually a documentation caveat that says “don’t do this unless each workload will consist of 1,000 or more CPU instructions.” Which means that the vast majority of code winds up remaining serial.