I'm also going to be a little contrarian and say the future of the high-performance core is questionable. The basic problem is what kinds of workloads are predominantly out there:
- Single-threaded
- Inefficient multithreading that shits itself past maybe 4-8 threads
- Efficient multithreaded with little floating point (databases, serving web sites, virtualized desktops, etc)
- Efficient multithreaded with a lot of floating point (physics & AI)
Core speeds aren't getting much faster and aren't going to. We're deep into diminishing returns with branch prediction, speculative execution, and the other tricks, and it's taken 20 years to go from 1 GHz to sometimes hitting 5 GHz for a few seconds. So if you're writing single-threaded code... sorry, you need to learn how to write multithreaded code.
For multithreaded workloads with little floating point, you don't really need a high-performance core. You need, depending on the specific workload, some combination of a lot of cache, a lot of memory bandwidth, and a lot of hardware threads.
If you have a lot of floating point, a GPU is not just a lot faster than a CPU, it's also a lot easier to program. I can write a pretty simple loop over an array that does nothing but floating-point math and looks like it ought to vectorize, but the compiler will end up barely touching SIMD instructions for some subtle, annoying reason (often because it can't prove the arrays don't overlap), achieving zero speedup. By contrast, if I throw it in a GPU queue, it's guaranteed to use as many GPU cores as it can. It might not be perfectly optimized, but it's actually using the resource, so it's already beating the scalar code.
Most multithreaded code is inefficient. The result is that consumer CPUs mostly don't benefit from more than 8 cores. 16 cores is pretty groovy, but unless you're doing a professional workload, mostly those extra cores just chew on tasks Windows is generating. Your average 64-core high-end CPU would probably be more useful as an 8-core CPU with the rest of the die space spent on compute-only GPU cores.