I've been thinking about the whole ARM transition thing. There is some evidence that instruction set architecture has little effect on performance - it's all about power budgets:
https://www.extremetech.com/extreme...-or-mips-intrinsically-more-power-efficient/2
I.e. performance in billions of instructions per second is more or less proportional to power regardless of the architecture.
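To make that proportionality concrete: under this model BIPS ≈ k × watts, with k roughly independent of ISA, so the only way to buy more absolute performance is a bigger power budget. A trivial sketch in C - the chip list and the BIPS-per-watt constant are invented for illustration, not measurements:

```c
/* Toy model of the "performance scales with power" claim.
 * All figures below are invented for illustration only. */
#include <stdio.h>

int main(void) {
    /* Hypothetical chips and power budgets in watts (made up). */
    const char *chips[] = { "phone SoC", "laptop SoC", "desktop CPU" };
    double watts[] = { 5.0, 25.0, 95.0 };
    double k = 2.0; /* assumed BIPS per watt, same for every ISA in this model */

    for (int i = 0; i < 3; i++)
        printf("%-12s %5.1f W -> ~%6.1f BIPS\n", chips[i], watts[i], k * watts[i]);
    return 0;
}
```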
However, one thing ARM chipsets have traditionally been poor at is the memory subsystem - LPDDRn is optimized for power, not performance. Geekbench isn't all that sensitive to memory subsystem performance because, as Linus Torvalds pointed out:
https://www.realworldtech.com/forum/?threadid=136526&curpostid=136666
https://archive.vn/wip/iCwuj
And quite frankly, it's not even just the crypto ones. Looking at the other GB3 "benchmarks", they are mainly small kernels: not really much different from dhrystone. I suspect most of them have a code footprint that basically fits in a L1I cache.
So it seems like if you want to design an ARM chip with comparable performance outside of synthetic benchmarks with small kernels you need to up the power budget and really work on the memory subsystem.
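As a concrete illustration of the cache-footprint point, here's a sketch (plain C; the buffer sizes and loop counts are arbitrary choices of mine, not anything Geekbench does) that runs the same dependent-load loop over a 4 KB buffer and over a 256 MB one. The first stays in L1 and mostly measures the core; the second misses every level of cache and mostly measures the memory subsystem - the part small-kernel benchmarks never stress:

```c
/* Sketch: why a small working set doesn't exercise the memory subsystem.
 * Buffer sizes and step counts are arbitrary illustrative choices. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SMALL (4 * 1024 / sizeof(size_t))          /* ~4 KB: fits in L1D */
#define LARGE (256 * 1024 * 1024 / sizeof(size_t)) /* 256 MB: far beyond any LLC */

/* Pointer-chase: each load depends on the previous one, so memory latency,
 * not the core's issue width, dominates once we spill out of cache. */
static double chase(size_t *buf, size_t n, size_t steps) {
    /* Build a cyclic permutation; stride 4097 is coprime with both sizes,
     * so every element is visited and large-buffer jumps defeat caching. */
    for (size_t i = 0; i < n; i++)
        buf[i] = (i + 4097) % n;
    size_t idx = 0;
    clock_t t0 = clock();
    for (size_t s = 0; s < steps; s++)
        idx = buf[idx];
    clock_t t1 = clock();
    if (idx == (size_t)-1) puts(""); /* use idx so the loop isn't optimized away */
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    size_t steps = 20 * 1000 * 1000;
    size_t *small = malloc(SMALL * sizeof(size_t));
    size_t *large = malloc(LARGE * sizeof(size_t));
    if (!small || !large) return 1;
    printf("cache-resident: %.2fs for %zu dependent loads\n",
           chase(small, SMALL, steps), steps);
    printf("memory-bound:   %.2fs for %zu dependent loads\n",
           chase(large, LARGE, steps), steps);
    free(small);
    free(large);
    return 0;
}
```

On typical hardware I'd expect the large-buffer run to be one to two orders of magnitude slower per load, entirely down to DRAM latency.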
Now it so happens that there is a standard for high-bandwidth memory called Wide IO. As the name suggests, it's basically a much wider bus than you typically get with DDR:
https://www.extremetech.com/computi...es-between-wide-io-hbm-and-hybrid-memory-cube
https://archive.vn/TuCoq
Wide I/O is designed specifically to stack on top of SoCs and use vertical interconnects to minimize electrical interference and die footprint. This optimizes the package’s size, but also imposes certain thermal limitations, since heat radiated from the SoC has to pass through the entire memory die. Operating frequencies are lower, but a large number of I/O pins increases bandwidth by using a memory bus that’s up to 1024 bits wide.
Wide I/O is the first version of the standard, but it’s Wide I/O 2 that’s expected to actually reach the mass market — though some have argued that true adoption won’t come until Wide I/O 3, which should finally open a gap between itself and LPDDR4. The standard was ratified by JEDEC, but it’s often associated with Samsung due to that company’s extensive work on bringing it to market. Timing is unclear, but no major devices are expected to ship with Wide I/O in the first half of 2015. We may see some limited pickup in the back half of the year, possibly from Samsung’s own foundries.
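The bandwidth claim is simple arithmetic: peak bytes per second = (bus width in bits / 8) × transfer rate. A quick sketch - the width and transfer-rate figures below are rough ballpark numbers for each interface generation as I remember them, not datasheet quotes:

```c
/* Peak-bandwidth arithmetic: width/8 * transfer rate.
 * Interface figures are rough illustrative ballparks, not datasheet values. */
#include <stdio.h>

static double peak_gbps(double bus_bits, double mtps) {
    return bus_bits / 8.0 * mtps / 1000.0; /* GB/s from bus bits and MT/s */
}

int main(void) {
    /* name, total bus width (bits), transfer rate (MT/s) -- all approximate */
    printf("LPDDR4 2ch : %6.1f GB/s\n", peak_gbps(64,   3200)); /* 2 x 32-bit */
    printf("Wide I/O   : %6.1f GB/s\n", peak_gbps(512,   200)); /* wide, slow clock */
    printf("Wide I/O 2 : %6.1f GB/s\n", peak_gbps(512,  1067));
    printf("HBM stack  : %6.1f GB/s\n", peak_gbps(1024, 1000));
    return 0;
}
```

The wide-but-slow approach is how Wide I/O gets its bandwidth at a much lower clock, which is exactly the power trade-off the excerpt describes.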
The linked article also has a nice comparison of DDR, LPDDR, Wide IO, HMC, and HBM.
Wide IO is good for mobile because you can put the DRAM package on top of the SoC in a configuration called 'package-on-package' or PoP.
I reckon if Apple is serious they'll use some exotic DRAM interface in their high-end ARM parts. There is some evidence that they're hiring people for just such a project:
https://jobs.apple.com/en-us/details/200003767/dram-product-engineer
https://archive.vn/8nU7c
Knowledge with state of the art packaging technology (pop, Tsv, etc.) and their relationship to DRAM signal/power integrity
Apple already solders DRAM to the motherboard, so the non-upgradeable nature of Wide IO or HBM shouldn't bother them. And if they make their own ARM SoC they can use any memory interface they want, provided they can get at least one DRAM manufacturer to support it. They could even roll their own completely proprietary interface with more performance than Wide IO or HBM and charge many times the going rate for it.
Also 'We've got the fastest main memory interface in the industry' is just the sort of thing Tim Cook would want in his keynote introducing the new ARM Macs.
All this seems pretty plausible to me - you can attack the traditional weakness of ARM systems in a way Intel can't easily match, because Intel has traditionally gone along with the industry standard (aka the 'herd') on DRAM technology - they just make sure their CPUs support the latest JEDEC standard. Or sometimes the 'latest but one': DDR5 has been a standard for a while and production is ramping up, but very few desktop systems support it.
Mind you, Intel could put some HBM on a multichip module and use it as a massive 4th-level cache in front of off-module DDR4/DDR5 and still have something which works in this paradigm. They actually do something similar for some mobile parts which have eDRAM, though it's not clear how wide the interface to that is.
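Back-of-envelope for why an HBM/eDRAM L4 helps even with standard DDR behind it: the average cost of a last-level-cache miss is hit_time + miss_rate × miss_penalty. A sketch with assumed latencies and hit rates (none of these numbers come from Intel):

```c
/* AMAT sketch: effect of an eDRAM/HBM-style L4 between the LLC and DRAM.
 * All latencies and hit rates below are assumptions for illustration. */
#include <stdio.h>

int main(void) {
    double dram_ns = 100.0; /* assumed DRAM access latency */
    double l4_ns   = 30.0;  /* assumed eDRAM/L4 access latency */
    double l4_hit  = 0.7;   /* assumed fraction of LLC misses the L4 catches */

    /* Average cost of an LLC miss, with and without the L4. */
    double without_l4 = dram_ns;
    double with_l4    = l4_ns + (1.0 - l4_hit) * dram_ns;

    printf("LLC miss cost without L4: %5.1f ns\n", without_l4);
    printf("LLC miss cost with L4:    %5.1f ns\n", with_l4);
    return 0;
}
```

This only models latency; the bigger win from HBM used as a cache would be bandwidth, but the same averaging argument applies.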
Another question about ARM Macs is whether they'll still have Thunderbolt ports. Thunderbolt depends on PCI Express, and there's something to be said for power-efficient systems not using PCI Express at all, relying instead on some internal, mostly on-chip bus - this is how the Windows-on-ARM systems work. Still, you could probably bridge that to PCIe and just power the bridge down when the ports aren't in use. Or maybe they'll drop Thunderbolt and come up with a proprietary, performant and expensive replacement.