> Nah, that would take half an hour, probably.

That sounds like CPU inference speed.
\ComfyUI\comfy\model_sampling.py:92: UserWarning: The operator 'aten::frac.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
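For anyone curious what that warning means in practice, here is a minimal sketch, assuming the torch-directml package is installed on Windows: any operator without a DML kernel (like frac here, per the warning above) gets run on the CPU with a device round trip.

```python
import torch
import torch_directml  # pip install torch-directml

dml = torch_directml.device()  # the DirectML device (shows up as privateuseone:0)

x = torch.rand(8).to(dml)      # tensor lives on the GPU via DirectML

# aten::frac has no DML kernel, so PyTorch copies the tensor to the CPU,
# runs the op there, and copies the result back - that round trip is the
# "performance implications" the warning is talking about.
y = torch.frac(x)
print(y.device)                # still reports the DML device afterwards
```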
> Nah, that would take half an hour, probably.

Yeah on second thought, yeah. Though even on an RX 570, I can get like 10s per iteration (basically 2.5 minutes for a low 15 step image) on a 1024x1024 SDXL image, so a higher-end AMD card with a lot more VRAM being that slow seems odd.
This is still slower than what you both think I should be getting, though.
> Yeah on second thought, yeah. Though even on an RX 570, I can get like 10s per iteration (basically 2.5 minutes for a low 15 step image) on a 1024x1024 SDXL image, so a higher-end AMD card with a lot more VRAM being that slow seems odd. 64 seconds seems more correct, 3 seconds per iteration still isn't the best, but that seems way more believable than being slower than a 4GB card.

Well it's definitely been taking longer than two minutes recently so I don't know what has changed. But I'm getting about 3s/it, correct. For such an expensive card, it would be nice to get more out of it. What's a 570? Seven years old? And it wasn't even the top-end at the time.
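As a sanity check on the numbers being traded here: total sampling time is roughly steps times seconds-per-iteration plus fixed overhead (model load, VAE decode). The 4-second overhead below is an illustrative guess, not a measured figure.

```python
# Back-of-the-envelope timing: steps * s/it + fixed overhead.
def total_seconds(steps: int, sec_per_it: float, overhead: float = 0.0) -> float:
    return steps * sec_per_it + overhead

print(total_seconds(15, 10.0))      # RX 570: 150 s, i.e. the "2.5 minutes"
print(total_seconds(20, 3.0, 4.0))  # ~3 s/it for 20 steps + ~4 s overhead = 64 s
```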
> Well it's definitely been taking longer than two minutes recently so I don't know what has changed. But I'm getting about 3s/it, correct. For such an expensive card, it would be nice to get more out of it. What's a 570? Seven years old? And it wasn't even the top-end at the time.

Yeah I dunno. I can't speak on how much faster ROCM will be for you, I just didn't think DirectML would be that far behind. In my own experience, ROCM on my RX 570 was only like, single digit percentage faster than DirectML. I just stopped using Linux for inference because I couldn't update my packages without also updating ROCM to a version that no longer supports my ancient card. (RIP) So I just mess around from time to time on Windows generating 704x704 images in 30 seconds or so on SD 1.5-based models.
If I had known I would develop an interest in AI, I would have bought an Nvidia card. TBH, it did cross my mind, but I figured VRAM is really important for AI and this has 20GB for far less than a 4090 with similar gaming performance. And probably the AMD side will catch up. Ach, well. Runpod is dirt cheap for someone who just plays around with it casually until ROCm 6.x is released on Windows. When it is, I think that's going to be a significant leap forward for AMD on Windows. Not quite parity with Nvidia, but a big step.
I should probably just free up some disk space and set up a dual boot.
> I run 75 seconds on a 7900XTX Linux ComfyUI SDXL with 50 steps at 832x1216 then 20 more at 1664x2432

Well you've got the next card up from me and another 4GB VRAM. But that seems way faster than mine. 832x1216 is about the same number of pixels as 1024x1024 in my tests (if aspect isn't a factor). And that's 50 steps! I was doing 20 and it came in at 64 seconds. No second step for refiner as I was just doing a quick performance test. If you happen to be on your system now, any chance you could do a quick generation of a simple prompt at 20 steps, 1024x1024? I wouldn't like to impose, but as your system is so blisteringly fast I feel this will probably only take ten seconds!
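The "about the same number of pixels" claim checks out; SDXL's bucketed resolutions are chosen to stay near the 1024x1024 pixel budget:

```python
print(1024 * 1024)                 # 1048576 pixels
print(832 * 1216)                  # 1011712 pixels
print(832 * 1216 / (1024 * 1024))  # ~0.965, only ~3.5% fewer pixels
```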
> Well you've got the next card up from me and another 4GB VRAM. But that seems way faster than mine. 832x1216 is about the same number of pixels as 1024x1024 in my tests (if aspect isn't a factor). And that's 50 steps! I was doing 20 and it came in at 64 seconds. No second step for refiner as I was just doing a quick performance test. If you happen to be on your system now, any chance you could do a quick generation of a simple prompt at 20 steps, 1024x1024? I wouldn't like to impose, but as your system is so blisteringly fast I feel this will probably only take ten seconds! Jokes aside, just if you happen to be able to - I'd be curious to get a like-for-like between my card on Windows and your next one up on Linux.

7900 XTX ComfyUI Linux sd_xl_base_1.0 only, no refiner, no upscale:
100% 20/20 [00:06<00:00, 3.28it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 9.70 seconds
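If anyone wants to script this like-for-like test instead of clicking through a UI, here is a minimal sketch using the diffusers library as a stand-in for ComfyUI; the prompt and output filename are arbitrary, and on ROCm builds of PyTorch the AMD GPU is still addressed as the "cuda" device.

```python
import time
import torch
from diffusers import StableDiffusionXLPipeline

# sd_xl_base_1.0 only, no refiner, no upscale - matching the test above.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")  # on ROCm, AMD cards also appear as "cuda"

# Warm-up run so one-time costs don't skew the measurement.
pipe("a glass bottle on a table", num_inference_steps=5, height=1024, width=1024)

start = time.perf_counter()
image = pipe(
    "a glass bottle on a table",  # simple prompt, 20 steps, 1024x1024
    num_inference_steps=20,
    height=1024,
    width=1024,
).images[0]
print(f"Prompt executed in {time.perf_counter() - start:.2f} seconds")
image.save("benchmark.png")
```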
> 7900 XTX ComfyUI Linux sd_xl_base_1.0 only, no refiner, no upscale
> 100% 20/20 [00:06<00:00, 3.28it/s]
> Requested to load AutoencoderKL
> Loading 1 new model
> Prompt executed in 9.70 seconds

Thanks for that. Well, I'm going to take it as a positive sign. Your GPU is the same generation as mine and not wildly different in specs (+4GB VRAM, +a few hundred MHz), so hopefully the wild difference in outcome is ROCm. I hope they don't take too long to update the Windows version, though you've got me curious now and I'm tempted to go and set up that dual-boot system. Shame this can't be done with WSL2.
As I had just copied the base model to the system, it was probably still in RAM; if it had had to load it off disk, it would probably have been nearly a second slower.
That's my gaming system.
For reference, my desktop 4060 Ti 16GB, also Linux, same parameters:
100% 20/20 [00:08<00:00, 2.47it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 12.65 seconds
Both (different bottle result, obviously):
View attachment 5797638
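Taking the two logs at face value, the XTX comes out roughly 30% ahead of the 4060 Ti in this test, whether you compare iteration rate or end-to-end time:

```python
xtx_its, xtx_total = 3.28, 9.70        # 7900 XTX: it/s and total seconds
t4060_its, t4060_total = 2.47, 12.65   # 4060 Ti 16GB

print(xtx_its / t4060_its)             # ~1.33x faster per sampling step
print(t4060_total / xtx_total)         # ~1.30x faster end to end
```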
SNAIL OF PEACE [PBUH]
View attachment 5798952
The real reason for bollards in Britain.
> I tried to make an @Null with the prompt "Unshowered Romanian forums admin." and I have to admit, the results are pretty accurate. (both from the first generation)

The 'suffer horse', as seen on MATI.
View attachment 5797368
> Sigh, Bing is on to me, "Mario wearing red pants. The red nose of a proboscis monkey is in his lap. No monkey is present only the nose." only gives erroneous results, with the proper ones being censored.
> edit: I GOT ONE I GOT ONE! Eat shit censors, I strike this blow for freedom!

Why Mario, of all people?
> Why Mario, of all people?

There are a billion images of him so he always appears on-model, and he's a fake person so Google doesn't censor him like it does, for example, Trump.
> Sigh, Bing is on to me, "Mario wearing red pants. The red nose of a proboscis monkey is in his lap. No monkey is present only the nose." only gives erroneous results, with the proper ones being censored.

idk man, that's kinda gay to ask it to make fake penises
> Here you go, sir. Your Marios.

It's not really special if you aren't thwarting a censor to do it.