GPUs & CPUs & Enthusiast hardware: Questions, Discussion and fanboy slap-fights - Nvidia & AMD & Intel - Separate but Equal. Intel rides in the back of the bus.

Anyone know anything about this? Are there any GPUs out there that are built for double precision operations? It seems the AI field has embraced even smaller precision floating point math (~3ish decimals? I'm amazed that even works.), which works for them, but isn't really useable for certain physics simulations. Painting pictures on a screen rarely needs anything larger than 32 bit floats.
All of the consumer grade GPUs have intentionally gimped FP64. Looks like AMD gimps the consumer cards half as much:

RTX 3090:
FP32 (float) 35.58 TFLOPS
FP64 (double) 556.0 GFLOPS (1:64)

RX 7900 XTX:
FP32 (float) 61.39 TFLOPS
FP64 (double) 1.918 TFLOPS (1:32)

RTX 4090:
FP32 (float) 82.58 TFLOPS
FP64 (double) 1,290 GFLOPS (1:64)

The good old Radeon VII is beating all of those, and even it was gimped from a 1:2 professional card:

FP32 (float) 13.44 TFLOPS
FP64 (double) 3.360 TFLOPS (1:4)

But consult someone who actually knows what's up.
 
Even the Intel Arc A310 beats the 3090 at FP64:

FP32 (float) 2.688 TFLOPS
FP64 (double) 672.0 GFLOPS (1:4)

Also, what kind of AI can I run on one that's in an old Debian server that just uses it for transcoding?
 
An upscaler would be the natural thing to pair with your transcoder. I’ve run one on my workstation, since I don’t think there’s a Jellyfin plugin for this yet, and admittedly watching old anime at 4K is really cool.
 
In Intel’s defense, it wasn’t ENTIRELY their fault that their gaming GPUs bombed.

As I recall, much of the Arc effort was based in… Moscow. And when the SMO started, they suddenly had a real clusterfuck on their hands, and spent much of that year relocating staff out of Intel's Russian offices and replacing those who didn’t want to leave.

That was just one factor in why their cards arrived late but I reckon it was a big one.
 
Will an AI-based upscaler ever work well on an AMD card, though? Sure, asking someone like CDPR to go and start optimizing the game engine, models, etc. so they'd play nice with FSR is a lot to ask - but if the market share were there, this could be a possibility.
Why wouldn't it? There's no secret to tensor algebra that only Nvidia knows.


Anyone know anything about this?

Yes. Nvidia crippled FP64 on gaming GPUs after Titan Black ate into their business. You'll need at least a workstation GPU, or maybe look for a used Radeon VII or V100.

You shouldn't need doubles if you scale your problem appropriately. DM and maybe I can help.
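For anyone wondering what "scale your problem appropriately" tends to mean in practice (a generic sketch, nothing specific to your code): nondimensionalize so everything the solver touches sits near 1, which is where FP32 has its best relative resolution. For plain 1-D diffusion, for example:

\partial_t u = \nu\,\partial_{xx} u, \qquad x^{*} = x/L, \quad t^{*} = \nu t / L^{2}, \quad u^{*} = u/U_0 \;\Rightarrow\; \partial_{t^{*}} u^{*} = \partial_{x^{*}x^{*}} u^{*}

Everything is now O(1), so FP32's ~7 significant digits get spent on the physics rather than on large or tiny absolute magnitudes.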
 
Why wouldn't it? There's no secret to tensor algebra that only Nvidia knows.
Well it's mostly because of the Tensor-like cores, so the reasons are architectural. AMD doesn't have anything like it in the consumer grade, while Intel has the XMX.
 
Well it's mostly because of the Tensor-like cores, so the reasons are architectural. AMD doesn't have anything like it in the consumer grade, while Intel has the XMX.

RDNA3's SIMD units can do matrix operations.
 
Never said it can't work at all, but that it won't work well. AMD GPUs just are rasterization optimized, it is what it is.

A 3050 Ti Mobile can crank through frame-wide inferencing just fine with all of 5.3 FP16 TFLOP/s. This is 1/4 the power of, say a Radeon 6600 XT, which can do 21 FP16 TFLOP/s.

I'm not sure why you think AMD cards can't churn through tensor operations fast enough to inference on a frame buffer of a few million pixels. It's just not all that much arithmetic.
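Rough numbers, just to put a scale on it (the per-pixel cost here is a made-up ballpark, not a measured figure for any particular upscaler):

3840 \times 2160\ \text{px} \;\times\; 10^{3}\ \text{FLOP/px} \;\times\; 60\ \text{fps} \;\approx\; 5\times10^{11}\ \text{FLOP/s} = 0.5\ \text{TFLOP/s}

which is a couple of percent of a 6600 XT's 21 FP16 TFLOP/s.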
 
A 3050 Ti Mobile can crank through frame-wide inferencing just fine with all of 5.3 FP16 TFLOP/s. This is 1/4 the power of, say a Radeon 6600 XT, which can do 21 FP16 TFLOP/s.
Talking about apples and oranges...
I'm not sure why you think AMD cards can't churn through tensor operations fast enough to inference on a frame buffer of a few million pixels. It's just not all that much arithmetic.
Mixed-precision performance is important, and it's lackluster on AMD.
 
Talking about apples and oranges...

No, actually, they're both gaming GPUs that are capable of inferencing.

Mixed-precision performance is important, and it's lackluster on AMD.

The 6000 series is good enough to run XeSS with DP4a. And given that the latest 7000 series GPUs smoke 3050s in AI benchmarks, there's no reason they shouldn't be able to run something as lightweight as inferencing-based upscaling.
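For reference, DP4a is nothing exotic: a packed 4-wide int8 dot product accumulated into a 32-bit integer, which XeSS's fallback path leans on. Here's the CUDA spelling of the same operation (Nvidia's __dp4a intrinsic, Pascal/sm_61 or newer), just to show how small the primitive is; AMD and Intel expose the equivalent instruction on their side:

#include <cstdio>
#include <cuda_runtime.h>

// One DP4A: four int8 x int8 products summed into a 32-bit accumulator.
__global__ void dp4a_demo(int *out)
{
    // Pack four signed bytes into each 32-bit operand (low byte first):
    // a = {1, 2, 3, 4}, b = {5, 6, 7, 8}
    int a = (4 << 24) | (3 << 16) | (2 << 8) | 1;
    int b = (8 << 24) | (7 << 16) | (6 << 8) | 5;
    // 1*5 + 2*6 + 3*7 + 4*8 = 70, added to the accumulator (0 here)
    *out = __dp4a(a, b, 0);
}

int main()
{
    int *d = nullptr;
    cudaMallocManaged(&d, sizeof(int));
    dp4a_demo<<<1, 1>>>(d);
    cudaDeviceSynchronize();
    printf("dp4a: %d (expected 70)\n", *d);  // build: nvcc -arch=sm_61 dp4a.cu
    cudaFree(d);
    return 0;
}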
 

Exclusive: How Intel lost the Sony PlayStation business

By Max A. Cherney

 
One thing that I'm running into writing physics simulations on the GPU is that my consumer-grade GPUs (RTX 3090) kind of suck at double precision operations. (Double precision (~15 decimal places) operations are about 16 times slower than single precision (~7 decimal places) operations.) It's still faster than the 20-core CPU on a large enough grid (where the crossover appears to be ~10 million cells/elements/what-have-you), but much slower than I'd like.

Anyway, I'm going to have to figure out how to tune this thing to use single precision, but I'm not sure how yet. The increments are close to the precision limit, especially since my timesteps need to get smaller as my grid gets finer.

Anyone know anything about this? Are there any GPUs out there that are built for double precision operations? It seems the AI field has embraced even smaller precision floating point math (~3ish decimals? I'm amazed that even works.), which works for them, but isn't really useable for certain physics simulations. Painting pictures on a screen rarely needs anything larger than 32 bit floats.

(What would really be cool is a math coprocessor with native 128 bit float operations - then I could forget entirely that I'm not working with continuum numbers. Alas, not needed by the AI or graphics crowd. So if anyone builds one, I'm unlikely to be able to afford it.)
If you really wanted to, you could write your own custom routines for software-level 128-bit floating-point operations. The same thing has been done before for 8-bit, 4-bit, 3-bit, etc. floating-point operations in order to run AI models with quantized weights on hardware that lacks support for these data types.
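The usual software stand-in is "double-double": glue two f64s together as an unevaluated sum, which gets you roughly 31 significant digits (though still only a double's exponent range, so it's not true IEEE binary128). A minimal sketch of the core building block, error-free addition:

#include <cstdio>

// A value represented as the unevaluated sum hi + lo, with |lo| <= ulp(hi)/2.
struct dd { double hi, lo; };

// Knuth's TwoSum: hi + lo == a + b exactly, with hi = fl(a + b).
__host__ __device__ inline dd two_sum(double a, double b)
{
    double s   = a + b;
    double bb  = s - a;
    double err = (a - (s - bb)) + (b - bb);
    return {s, err};
}

// Add two double-doubles (the quick "sloppy" variant, fine for a demo).
__host__ __device__ inline dd dd_add(dd x, dd y)
{
    dd s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return two_sum(s.hi, s.lo);  // renormalize
}

int main()  // compile as a .cu file with nvcc
{
    // 1 + 1e-20 vanishes in a plain double but survives in the pair.
    dd a = two_sum(1.0, 1e-20);
    printf("double-double: hi = %.17g, lo = %.17g\n", a.hi, a.lo);
    printf("plain double:  %.17g\n", 1.0 + 1e-20);
    return 0;
}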
 
If it's true that they are "nerfing" FP64 that hard, then it should be possible to write a bunch of fp64 software emulation that runs faster using fp32 operations. I was thinking about that yesterday.

It may be less of a pain to figure out how to scale (or otherwise stabilize) my problem. It tends to go unstable when the timesteps are larger than a certain von-Neumannish stability limit, but the limit would be uncomfortably close to the precision limit with fp32.
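To put numbers on "uncomfortably close" (assuming a generic explicit diffusion-type scheme, just for illustration):

\Delta t \;\lesssim\; \frac{\Delta x^{2}}{2\nu}, \qquad \varepsilon_{\mathrm{FP32}} = 2^{-23} \approx 1.2\times10^{-7}

Halving \Delta x cuts \Delta t by 4x, and once a single increment falls below roughly 10^-7 of the quantity it's added to (the elapsed time, or a slowly varying field value), FP32 addition starts rounding it away entirely.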
 
I know there are algorithms that let you use two doubles (2 f64s) as a single number; you could try doing the same thing with 2 f32s. But the tuning might be difficult: how much mantissa you want, etc.
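Same trick one level down, a "float-float" pair, which gets you roughly 48 bits of significand out of hardware that only runs FP32 at full speed. A minimal sketch with the quick-and-dirty add/multiply (a real implementation needs more careful renormalization and edge-case handling):

#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// A value represented as the unevaluated sum hi + lo of two floats.
struct ff { float hi, lo; };

// Error-free addition (Knuth TwoSum): hi + lo == a + b exactly.
__host__ __device__ inline ff two_sum(float a, float b)
{
    float s  = a + b;
    float bb = s - a;
    return {s, (a - (s - bb)) + (b - bb)};
}

// Error-free multiplication via FMA: hi + lo == a * b exactly.
__host__ __device__ inline ff two_prod(float a, float b)
{
    float p = a * b;
    return {p, fmaf(a, b, -p)};
}

__host__ __device__ inline ff ff_add(ff x, ff y)
{
    ff s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return two_sum(s.hi, s.lo);          // renormalize
}

__host__ __device__ inline ff ff_mul(ff x, ff y)
{
    ff p = two_prod(x.hi, y.hi);
    p.lo += x.hi * y.lo + x.lo * y.hi;   // cross terms; lo*lo is negligible
    return two_sum(p.hi, p.lo);
}

// Sum 10 million copies of 0.1f: plain FP32 drifts visibly,
// while the float-float accumulator stays close to 1e7 * (double)0.1f.
__global__ void accumulate(double *out)
{
    ff acc = {0.0f, 0.0f};
    float plain = 0.0f;
    for (int i = 0; i < 10000000; ++i) {
        acc = ff_add(acc, ff{0.1f, 0.0f});
        plain += 0.1f;
    }
    out[0] = (double)acc.hi + (double)acc.lo;
    out[1] = (double)plain;
}

int main()
{
    double *d = nullptr;
    cudaMallocManaged(&d, 2 * sizeof(double));
    accumulate<<<1, 1>>>(d);
    cudaDeviceSynchronize();
    printf("float-float sum: %.3f   plain float sum: %.3f\n", d[0], d[1]);
    cudaFree(d);
    return 0;
}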
 
If it's true that they are "nerfing" FP64 that hard, then it should be possible to write a bunch of fp64 software emulation that runs faster using fp32 operations. I was thinking about that yesterday.

That's basically what the gaming cards are already doing, and why it's so slow.
 
Intel and AMD were the final two contenders in the bidding process for the contract.
Who were all the other contenders? I can see Nvidia being one as they make the Switch chips, but where did Broadcom come from? I wasn't aware that Broadcom had meaningful GPU experience.
 
Who were all the other contenders? I can see Nvidia being one as they make the Switch chips, but where did Broadcom come from? I wasn't aware that Broadcom had meaningful GPU experience.
There were reports that Sony was considering switching to Arm. Even AMD is possibly making an Arm-based APU, "Sound Wave". Haven't heard of Broadcom making any big moves, but I'm interested to see if a MediaTek + Nvidia combo slaps the shit out of Qualcomm's Snapdragon X Elite within a year or two.

No One Is Buying AMD Zen 5 CPUs, So What's Going On?

AMD AGESA 1.2.0.2 BIOS Improves Inter-Core Latency For Zen 5 “Ryzen 9000” CPUs, 58% Reduction & Major Performance Uplifts
With the new BIOS, the average cross-CCD latency drops by 58% to 75 ns, while intra-CCD latency stays the same at 18-20 ns.

Zen 5, which had an undeniable mess of a launch, is healing with this BIOS update.
 