Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

So I grumbled and moved a whole lot of partitions around and installed Ubuntu as a dual boot. Got to be honest, it was kind of nice to be back in Linux land and not have to fight the system to avoid seeing ads in the start menu. I'm half tempted to stick with it as my default. I used to have Gentoo as my default OS so Ubuntu is easy-mode.

Anyway, to the point:

Yeah I dunno. I can't speak on how much faster ROCm will be for you. I just didn't think DirectML would be that far behind. In my own experience, ROCm on my RX 570 was only like, single-digit percentages faster than DirectML. I just stopped using Linux for inference because I couldn't update my packages without also updating ROCm to a version that no longer supports my ancient card. (RIP) So now I just mess around from time to time on Windows, generating 704x704 images in 30 seconds or so on SD 1.5-based models.
I just had to satisfy my curiosity and see what the difference really was. And there may be other factors, but it looks like the difference is big. First off though, this was with ROCm 6.0.2, which I needed for Radeon 7900 XT support. Or at least I had to break with the default version of 5.4 in Ubuntu 22.04 because that didn't support it. So after some merry old fiddling around with libraries, and finding that pip got its knickers in a twist about installing things in the right order, I was up and running, and boy was it fun - from 3 s/it with DirectML (on Windows) up to 2 it/s with ROCm on Linux. Sometimes less, around 1.2 it/s, but either way I was seeing consistent minimums of 3x the speed. I could get a 1024x1024 SDXL image in around 20 seconds. I didn't do serious benchmarking, but I was typically doing around 40 steps. So more steps and less than 1/3rd the time as well.
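A quick back-of-the-envelope check on those numbers (just my own arithmetic, not a benchmark):

```python
# Arithmetic on the figures above: DirectML at ~3 s/it vs ROCm at ~2 it/s.
directml_s_per_it = 3.0   # seconds per iteration on Windows/DirectML
rocm_it_per_s = 2.0       # iterations per second on Linux/ROCm (best case)

rocm_s_per_it = 1.0 / rocm_it_per_s
speedup = directml_s_per_it / rocm_s_per_it
print(f"best-case per-iteration speedup: {speedup:.0f}x")

# Even at the slower ~1.2 it/s I sometimes saw, it's still comfortably over 3x.
worst_case = directml_s_per_it * 1.2
print(f"worst-case speedup: {worst_case:.1f}x")

# 40 steps at 0.5 s/it lines up with the ~20 s SDXL generation quoted above.
print(f"40 steps: {40 * rocm_s_per_it:.0f} s")
```

So the "consistent minimums of 3x" claim is actually on the conservative side; best case it's closer to 6x per iteration.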

I don't doubt your recollection. It may be that there are significant boosts in speed with more recent ROCm versions - it's progressing at pace. And there might be other factors. I still hope they release v6 for Windows soon, if for no other reason than that I'd like to see what the same software does on Windows and whether there's a difference. But I'll probably keep my Linux install around for a while either way - Microsoft have made some big strides with WSL2 in terms of making Windows developer friendly, but it still couldn't compete with how easy it felt getting all the right versions lined up on Linux.
 
It may be that there are significant boosts in speed with more recent ROCm versions - it's progressing at pace. And there might be other factors.
Well, glad to hear there was a substantial difference. I can only assume that, due to architecture differences, ROCm is substantially more effective than DirectML on more-than-capable cards, whereas my experience with ROCm hinged on it just barely supporting my GPU before support was finally dropped.
 
Well, glad to hear there was a substantial difference. I can only assume that, due to architecture differences, ROCm is substantially more effective than DirectML on more-than-capable cards, whereas my experience with ROCm hinged on it just barely supporting my GPU before support was finally dropped.
Very possible. You said your testing was on an RX 570. I would be curious why there's such a difference between DirectML and ROCm, but I suspect the reasons are rather arcane. In theory DirectML should be able to reach parity, but who knows what the intricacies of these libraries entail. Well, @The Ugly One probably does but I don't! :)

Some shallow searching online shows people reporting 4 it/s for similar parameters on a 4080, so still some way to go, but good to know AMD is getting closer. I also realise just now that out of habit I was using the non-GPU SDE sampler rather than the GPU one. I guess with ROCm I might be able to use that without it blowing up now. I'll have to re-test. Regardless, I'm pretty pleased about all this. I could of course go back to SHARK, which ran very quickly but was limited in other ways.
 
Is there a good AI program that can swap faces on a picture?
 
Kiwis, for image-to-image in the Stable Diffusion UI (vladmandic's one, but w/e): how can I use a custom-resolution image so I don't have to resize to 512x512 and distort things?
 
Anyone know what's up with sd-webui-animatediff not detecting models as of a few days ago? Some random commit seems to have broken it and I don't see anything in the diff/patchnotes about models needing to be in a different place. Was working fine beforehand and the models are still present in what I assume is the correct location.

E: Solved it. If anyone else runs into this, make sure it didn't hose your paths in settings, since they're configurable now.
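For anyone else debugging this, a quick way to check is to list what's actually in the directory the extension reads from. The path below is what I believe is the stock default (relative to the webui root) - an assumption, so adjust it if you've repointed the extension in Settings:

```python
from pathlib import Path

# Assumed default motion-module location for sd-webui-animatediff;
# check the extension's path setting if yours differs.
model_dir = Path("extensions/sd-webui-animatediff/model")

found = sorted(p.name for p in model_dir.iterdir()) if model_dir.is_dir() else []
if found:
    print("motion modules present:", found)
else:
    print("nothing found - the extension's model path setting is probably wrong")
```

If the models sit somewhere else on disk, that mismatch is exactly the "hosed path" symptom above.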

Is there a good AI program that can swap faces on a picture?
Get any fork of automatic1111 (I suggest Forge) and install sd-webui-roop from the extensions browser.

Kiwis, for image-to-image in the Stable Diffusion UI (vladmandic's one, but w/e): how can I use a custom-resolution image so I don't have to resize to 512x512 and distort things?
You can just adjust the resolution in the settings under where you supply the image; it gives you a nice little colored outline to show how well your source image fits your selected resolution, too.
 
I've been out of diffusers for a few months, is kohya-ss still the current package of choice for lora training or have we moved on to something better?
I think kohya-ss is still the main one everyone uses, but I'm personally using derrian-distro's LoRA Easy Training Scripts as it takes up considerably less space and the results are satisfactory to me.
 
Sometimes you try and try and it becomes racist without trying.

Chinese Man dressed as woman making medicine in a bath;
(four attached images)

Or man disguised as woman robbing a bank
(two attached images)

And a mixture of both prompts
(two attached images)
 
What's the current go-to setup for generating images on AMD? I haven't touched this stuff since mid 2023 and it seems the meta has changed quite a bit since then.
I've just started playing around with the ZLUDA backend in SD Next. The initial compile took around 20 minutes, but now I get around 3.5 it/s on a 6900 XT. Not groundbreaking, but it's the best performance I've been able to squeeze out of Windows.
 
Anyone have any experience with SD Forge? It's supposed to boost performance.
I save a gig of VRAM and a second or two on basic generations, but some LoRAs will randomly cause VRAM use to return to the old pre-Forge A1111 webui levels.

Forge has also OOM'd and locked up my machine when switching models, something I haven't experienced with A1111 in half a year.

I haven't used any of Forge's bells and whistles, so maybe there's something in there that's a qualitative improvement. I just tested performance.

Try it out if you have very poor VRAM, but if things are already hunky dory it seems like a marginal improvement overall.
 
It's a huge improvement in some areas, especially if you have a recent GPU (I can do 2560x1440 on my 4070 with it where vanilla can barely hit 1080p) but a lot of the 'custom' stuff seems broken/incomplete. AnimateDiff's ControlNet interaction specifically is screwy and doesn't work right. Haven't played with any of the other features.

Other than that, the biggest difference I've seen using it is that Gradio doesn't randomly break and require page reloads to get the buttons to function again.
 
It's a huge improvement in some areas, especially if you have a recent GPU (I can do 2560x1440 on my 4070 with it where vanilla can barely hit 1080p)
Are you using xformers on vanilla? I'm not using SD Forge, and with a GTX 1080 I can render up to 2688x1512. I feel you really shouldn't be having issues rendering 1080p.
 