LMAO. Yup. Seeing 20+ percent gains in some games from literally swapping to another CPU in the same generation. Somehow not "much gain". You're also the one who then turns around and says "well, anything over X fps doesn't matter".
When the way to realize a gain is to drive your resolution so low that your frame rate exceeds your monitor refresh rate, no, it's not much gain. I am not the one who decides this. I guess it was the Lord, since he didn't bless you with the ability to perceive frames your monitor didn't display.
That's because graphics work doesn't really touch the CPU beyond pushing assets through the IMC to VRAM. Game logic and everything related to it runs on the CPU, and that covers everything about the game except actually drawing the picture on the screen.
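Rough sketch of what I mean, in Python (every function name here is a made-up stand-in, not any real engine's API): everything above the submit call is CPU work, and the GPU only gets involved once the command list is handed off.

```python
# Toy frame loop showing the CPU/GPU division of labor.
# All names are hypothetical stand-ins, not a real engine API.

def process_input(world):         pass  # read pads/keyboard, update player intents
def update_ai(world, dt):         pass  # pathfinding, decision making
def step_physics(world, dt):      pass  # collisions, rigid bodies
def update_animations(world, dt): pass  # skeletal/blend updates
def build_draw_commands(world):   return ["draw calls go here"]  # API/driver overhead lives here too
def submit_to_gpu(commands):      pass  # hand off; the GPU rasterizes in parallel

def run_frame(world, dt):
    # --- CPU side: game logic, i.e. "everything except drawing the picture" ---
    process_input(world)
    update_ai(world, dt)
    step_physics(world, dt)
    update_animations(world, dt)
    commands = build_draw_commands(world)

    # --- GPU side: the only part that is actually graphics work ---
    submit_to_gpu(commands)

run_frame(world={}, dt=1 / 60)
```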
I can't find the benchmark now, but the biggest gains I saw for the X3D were at 640x480, and it seems to cut down on API/driver overhead in a curious way. Good game code is like feeding sausage into a wood chipper.
It's not that something changes about driver overhead. It's that the lower your resolution, the more frames your GPU can draw in a second. Since the GPU and CPU run in parallel, the question is whether the CPU can get its work done before the GPU is ready to draw a new frame. If it can't, the GPU has to wait around for the CPU to finish. Just think about the simple example of a single core running at 3 GHz:
At 30 fps, it has 100m cycles per frame
At 60 fps, it has 50m cycles per frame
At 240 fps, it has 12.5m cycles per frame
At 480 fps, it has 6.25m cycles per frame, roughly half of what the original Xbox's 733 MHz CPU had per frame in a 60 FPS game.

Of course, you have more cores, but you get the idea. When you have fewer clock cycles to throw around, you can't afford to waste them stalled on fetching data. That's why you see the 3D V-Cache having the most impact at low resolutions, where the GPU can draw so fast that the CPU has to work at maximum efficiency to keep up. Lower resolution = less GPU work per frame = higher frame rate = the CPU eventually becomes the bottleneck.
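Putting numbers on it (just back-of-the-envelope: the 3 GHz clock matches the example above, and the per-frame millisecond costs are made-up illustrative figures, not measurements): cycles per frame is clock divided by fps, and since the CPU and GPU overlap, frame time is roughly the max of the two. Shrink the GPU's share by dropping resolution and the CPU side stops hiding behind it.

```python
# Back-of-the-envelope model of the CPU-vs-GPU bottleneck.
# 3 GHz matches the single-core example above; the millisecond
# figures below are hypothetical, not benchmark results.

CLOCK_HZ = 3_000_000_000  # single core at 3 GHz

for fps in (30, 60, 240, 480):
    cycles = CLOCK_HZ / fps
    print(f"{fps:>3} fps -> {cycles / 1e6} million cycles per frame")

# CPU work per frame is (mostly) resolution-independent; GPU work
# scales with resolution. Frame time ~= max(cpu_ms, gpu_ms) because
# the two run in parallel.
cpu_ms = 4.0  # hypothetical game logic + draw-call building per frame
for res, gpu_ms in [("4K", 16.0), ("1440p", 8.0), ("1080p", 4.5), ("640x480", 1.0)]:
    frame_ms = max(cpu_ms, gpu_ms)
    bottleneck = "CPU" if cpu_ms >= gpu_ms else "GPU"
    print(f"{res:>7}: ~{1000 / frame_ms:,.0f} fps, {bottleneck}-bound")
```

With those made-up numbers the GPU is the limiter at 4K and 1440p, and only once the GPU's per-frame cost drops below the CPU's does the frame rate stop scaling with resolution, which is exactly the regime where the extra cache pays off.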
FWIW, the context here is that I do a lot of benchmarking on new-gen hardware, but it's all industrial workloads on high-end EPYC/Xeon/Ampere/Instinct stuff, so I legit did not know what the profile of a gaming workload looks like or what the right-sized hardware for a given use case is.