I'm also going to be a little contrarian and say the future of the high-performance core is questionable. The basic problem is what kinds of workloads are predominantly out there:
- Single-threaded
- Inefficient multithreading that shits itself past maybe 4-8 threads
- Efficient multithreaded with little floating point (databases, serving web sites, virtualized desktops, etc)
- Efficient multithreaded with a lot of floating point (physics & AI)
Core speeds aren't getting much faster and aren't going to. We're deep into diminishing returns with branch prediction, speculative execution, and the other tricks, and it's taken 20 years to go from 1 GHz to sometimes hitting 5 GHz for a few seconds. So if you're writing single-threaded code... sorry, you need to learn how to write multithreaded code.
For multithreaded workloads with little floating point, you don't really need a high-performance core. You need, depending on the specific workload, some combination of a lot of cache, a lot of memory bandwidth, and a lot of hardware threads.
If you have a lot of floating point, a GPU is not just a lot faster than a CPU, it's also a lot easier to program. I can write a pretty simple loop over an array that does nothing but floating-point math and looks like it ought to vectorize, but the compiler will end up barely touching SIMD instructions for some subtle, annoying reason (often because it can't prove the arrays don't overlap), achieving zero speedup. By contrast, if I throw it in a GPU queue, it's guaranteed to use as many GPU cores as it can. It might not be perfectly optimized, but it's actually using the resource, so it's already beating the scalar code.
Most multithreaded code is inefficient. The result is that consumer CPUs mostly don't benefit from more than 8 cores. 16 cores is pretty groovy, but unless you're doing a professional workload, mostly those extra cores just chew on tasks Windows is generating. Your average 64-core high-end CPU would probably be more useful as an 8-core CPU with the rest of the die space spent on compute-only GPU cores.