Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

I've seen people running SDXL on a 3080 just fine
And a 3070 and a 3060 Ti. Although I prefer the 4060 Ti 16GB as I can run SDXL AND Chrome at the same time. It's rather tight on the 8GB cards.

Also, Rockchip in theory makes some nice CPUs, and there are some good boards. But as far as I can tell OS support is very hit or miss, often miss if you want fancy stuff like 'Graphics Acceleration' and 'Modern Kernel Support'.
 
I've seen people running SDXL on a 3080 just fine, no idea about training SD models though. The main limiting factor is that gaming cards rarely go over 12GB of VRAM, with the new $1000 4080 Super having only 16GB. I'm surprised nobody has been modifying these cards with bigger memory chips like they did with some modded Radeons I've seen for sale.
Just for reference, I use around 10.5 GB of VRAM while training SDXL LoRAs with batch size 2 and gradient checkpointing turned on. It's local LLMs that are the worst VRAM hogs. Video models too, probably, although I haven't experimented much with that.
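For anyone wondering where that gradient checkpointing toggle actually lives, here's a minimal sketch of the VRAM-relevant bits in a diffusers-style SDXL training script. Illustrative only; it assumes the diffusers library and the public SDXL base weights, not my exact setup:

[CODE=python]
# Minimal sketch: the two VRAM knobs mentioned above, batch size and gradient checkpointing.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # public SDXL base weights (assumed)
    subfolder="unet",
    torch_dtype=torch.float16,
).to("cuda")

# Recompute activations in the backward pass instead of keeping them all in memory.
# Slower per step, but it's what keeps an SDXL training run around the ~10 GB mark.
unet.enable_gradient_checkpointing()

train_batch_size = 2  # per-step batch size; activation memory scales roughly with this
[/CODE]

Most of the popular LoRA trainers (kohya's sd-scripts, for instance) expose the same thing as a --gradient_checkpointing flag, if I remember right.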
 
It runs... okay.
Depends what your expectations are. These NPUs are more about efficiency than performance, which is not a criticism. But I just timed his "What is the meaning of life?" question to Llama 2 7B: it took just under two minutes to complete. Llama 2 7B here just now, same question, started responding immediately and completed in under 10 seconds. So really, what is your use case? Frequently running small to medium models without draining power? Such as Copilot's needs, which is why the 40 TOPS NPU requirement makes sense for W11. Or something more substantial, probably in a desktop or rack, where you don't much care about the extra power consumption?
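If anyone wants to reproduce that kind of side-by-side timing, here's a rough sketch using llama-cpp-python with a local GGUF quant of Llama 2 7B. The file name is a placeholder, not what either of us is actually running:

[CODE=python]
# Rough timing sketch, not a proper benchmark: one prompt, wall-clock time to completion.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path to a local quantized model
    n_gpu_layers=-1,  # offload everything if llama.cpp was built with GPU support; 0 keeps it on the CPU
)

start = time.perf_counter()
out = llm("What is the meaning of life?", max_tokens=256)
elapsed = time.perf_counter() - start

print(out["choices"][0]["text"])
print(f"Completed in {elapsed:.1f}s")
[/CODE]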

When you ask whether something is good or how long it will take to train a model, those questions themselves need to be specific, because the field is so complex. The same NPU could be brilliant or terrible, depending on your use case. (Hope that doesn't come across as critical of your questions, btw. Questions are great and this is interesting to discuss. But I had a vague feeling you were coming at this from a "these add-on NPUs are cool" perspective, and they are, but it depends what for.)
 
Although I prefer the 4060 Ti 16GB as I can run SDXL AND Chrome at the same time. It's rather tight on the 8GB cards.
Yeah, I heard 12GB is the bare minimum these days. Anyway, do you mean Google Chrome or some other Chrome?
But as far as I can tell OS support is very hit or miss, often miss if you want fancy stuff like 'Graphics Acceleration' and 'Modern Kernel Support'.
That's the problem with these SBCs and why the RPi reigns supreme: basically everything is made for the latter, it's the standard. One old-timer told me it was just like the days of the Apple ][, when everything was made for that one computer (not sure if it was exactly the same situation, but I get it).
Llama 2 7B here just now, same question, started responding immediately and completed in under 10 seconds.
I take it that was on RunPod? Or locally with some GPU? Which one?
So really, what is your use case? Frequently running small to medium models without draining power?
Nah, power isn't really an issue; it's more about being able to run locally and offline on a thin laptop. But again, it seems a chunky gaming one with a decent GPU is the only real solution.
But I had a vague feeling you were coming at this from a "these add-on NPUs are cool"
Not really "cool" but more like interesting, especially, as I said before, as an alternative for retrofitting x86 computers.
 
Locally. Just a Radeon 7900XT, so 20GB VRAM. But you don't need nearly that much for it.
I mean, you're comparing a $700-1000 GPU to a $150-180 computer; at least use a GPU in that price range (which would still need the rest of the computer to work).

On a side note, how's support for Radeons now with ROCm and all that? Because even here in this thread I was told to not bother and get an Nvidia card for AI, because everything is made for CUDA first.
 
I mean, you're comparing a $700-1000 GPU to a $150-180 computer; at least use a GPU in that price range (which would still need the rest of the computer to work).
No, I don't think I will, thanks, because what the fuck point would I be making? That NPU X, which is inadequate for this sort of work, is marginally better or worse than NPU Y, which is also inadequate? My entire point was that the use case for these NPUs isn't really what you're focusing on. Their goal is to run heavily quantized small to medium models efficiently, without impacting the CPU or GPU or causing a significant drain on the battery. And for that, they show good promise. Maybe they have a little utility for other niche applications, such as being able to perform inference (slowly) on a Raspberry Pi. The comparison I made is the one that is appropriate to my point, which is to illustrate what they're intended for and what they're not.
On a side note, how's support for Radeons now with ROCm and all that? Because even here in this thread I was told to not bother and get an Nvidia card for AI, because everything is made for CUDA first.
I'm sure @The Ugly One will be along shortly to say what a nightmare developing for AMD is, and I won't dispute it because he knows his stuff. But as an end user, what I'll say is that in the space of a year I've seen ROCm go from barely adequate to very performant for my use cases. Using it for Stable Diffusion I get pretty good performance, and it's been stable too. If you want to see the pace of the progression, check this out: https://github.com/ROCm/ROCm/releases

It's picking up pace, and though Linux has been ahead of Windows for quite a while, that's not really been a problem for many serious users because they're using Linux anyway. And in the last release they added support (in conjunction with AMD's latest drivers, which provide extensions to WSL2) for running the latest versions on Windows. Performance wasn't the same for me, unfortunately, but it was good that nothing broke.

It's not going to achieve industry acceptance comparable to CUDA any time soon, I'm sure. But AMD are finally putting in the work to bring the software side of things up to speed, which is long overdue.
 
Yeah, I heard 12GB is the bare minimum these days. Anyway, do you mean Google Chrome or some other Chrome?
Just Google Chrome. But really, I found that trying to do anything that uses any significant VRAM while SDXL images are processing would cause out-of-memory errors. Without anything running except a terminal window, my system with 8GB VRAM doesn't seem to have any issues (Linux).
 
Just Google Chrome. But really, I found that trying to do anything that uses any significant VRAM while SDXL images are processing would cause out-of-memory errors. Without anything running except a terminal window, my system with 8GB VRAM doesn't seem to have any issues (Linux).
It doesn't hurt to check what's running with nvidia-smi. That's how I discovered I was losing a few GB of VRAM to a memory leak in xdg-desktop-portal-gnome. They still have not fixed it, but at least now I know to do a 'systemctl restart' once in a while if it has gotten bad.
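If you'd rather script that check than eyeball nvidia-smi output, here's a small sketch with the pynvml bindings (nvidia-ml-py). Nothing in it is specific to my setup:

[CODE=python]
# Sketch: same per-process VRAM info nvidia-smi prints, but scriptable.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB")

# Graphics processes are where desktop stuff like xdg-desktop-portal-gnome shows up;
# compute processes are where PyTorch and friends show up.
for proc in (pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
             + pynvml.nvmlDeviceGetComputeRunningProcesses(handle)):
    name = pynvml.nvmlSystemGetProcessName(proc.pid)
    if isinstance(name, bytes):  # older bindings return bytes, newer return str
        name = name.decode()
    used_mib = (proc.usedGpuMemory or 0) / 2**20  # can be None on some drivers
    print(f"pid {proc.pid}: {name} ({used_mib:.0f} MiB)")

pynvml.nvmlShutdown()
[/CODE]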
 
I'm sure @The Ugly One will be along shortly to say what a nightmare developing for AMD is, and I won't dispute it because he knows his stuff.
Developing AI for AMD or anything for AMD?
Just Google Chrome. But really, I found that trying to do anything that uses any significant VRAM while SDXL images are processing would cause out-of-memory errors. Without anything running except a terminal window, my system with 8GB VRAM doesn't seem to have any issues (Linux).
Can't Chrome just use the iGPU and system RAM instead? What's your setup?
 
Developing AI for AMD or anything for AMD?

Can't Chrome just use the iGPU and system RAM instead? What's your setup?
What iGPU?
But if you really need to use your desktop and do AI at the same time, you can tell Chrome and many other apps not to use the GPU for acceleration, which might save enough VRAM.
 
The techniques used to train Stable Diffusion LoRAs and checkpoints are being improved constantly worldwide by an international community of Coomers and furries. A team of programmers from a mid-tier CS university doesn't stand a chance against the funding behind that, much less the talent. This would have become apparent overnight if they had decided to open-source Glaze, something they could do under the AGPL if they wanted to poison the commercial well.
 
I'm sure @The Ugly One will be along shortly to say what a nightmare developing for AMD is, and I won't dispute it because he knows his stuff. But as an end user, what I'll say is that in the space of a year I've seen ROCm go from barely adequate to very performant for my use cases. Using it for Stable Diffusion I get pretty good performance, and it's been stable too. If you want to see the pace of the progression, check this out: https://github.com/ROCm/ROCm/releases

I was summoned.

There is an infinite gulf of difference between writing C++ code using ROCm and running something somebody else wrote in C++. NVIDIA's been spreading a ton of FUD about ROCm because the MI300 beats the ever-living fuck out of the H100. So they've been trying to generate the impression in the market that somehow, ROCm is unable to draw the kind of performance out of the MI300 that CUDA can draw out of the H100. I've actually had a salesman say to me, "But Instinct doesn't run CUDA, so it's not as fast as an H100." :story:

Where ROCm kicks you in the nuts isn't performance, it's when you have a large-scale project and AMD pushes out an update with broken CMake files and incorrect documentation, and you waste two weeks working around the new bugs AMD put on your plate. But by the time the end-user has a ROCm-based application tied up with a bow and dropped in his lap, the part where the programmer commits seppuku is over.
 
The techniques used to train Stable Diffusion LoRAs and checkpoints are being improved constantly worldwide by an international community of Coomers and furries. A team of programmers from a mid-tier CS university doesn't stand a chance against the funding behind that, much less the talent. This would have become apparent overnight if they had decided to open-source Glaze, something they could do under the AGPL if they wanted to poison the commercial well.
The real problem, which Nicholas Carlini explains in his blog post, is that thousands of tech-illiterate artists are "glazing" their images expecting it to actually do something. Their glazed stuff will get copied, Glaze will get defeated, and they're left in the same position or worse than they would have been if they had taken everything they could down and stopped uploading. Which is why they felt it was important to attack Glaze into the ground ASAP. Carlini compares the situation to the NSA storing encrypted traffic/files so that they can decrypt it all later.

I don't think anything like Glaze that aims to be visually imperceptible to humans can work for more than a year or two. It will always be defeated by an AI model that "sees" more like a human, or throws on a blur or filter to remove the imperceptible patterns. I think it's possible that algorithms could even defeat "deep fried" images, like they can remove watermarks. If a human can look at a set of images and find the patterns of an artist's style even under a bunch of filters, what's stopping the computer from doing it?
 
The real problem, which Nicholas Carlini explains in his blog post, is that thousands of tech-illiterate artists are "glazing" their images expecting it to actually do something. Their glazed stuff will get copied, Glaze will get defeated, and they're left in the same position or worse than they would have been if they had taken everything they could down and stopped uploading. Which is why they felt it was important to attack Glaze into the ground ASAP. Carlini compares the situation to the NSA storing encrypted traffic/files so that they can decrypt it all later.

I don't think anything like Glaze that aims to be visually imperceptible to humans can work for more than a year or two. It will always be defeated by an AI model that "sees" more like a human, or throws on a blur or filter to remove the imperceptible patterns. I think it's possible that algorithms could even defeat "deep fried" images, like they can remove watermarks. If a human can look at a set of images and find the patterns of an artist's style even under a bunch of filters, what's stopping the computer from doing it?
Real talk: when I was in the initial stages of training, I decided which artists to train on based on whether they had a watermark. Those things really fuck up LoRAs.
 
Real talk: when I was in the initial stages of training, I decided which artists to train on based on whether they had a watermark. Those things really fuck up LoRAs.
They do for now, but they won't forever. Especially if the watermark is identical and in the same place in every image in a set, it should be trivial to detect and discard.
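As a toy illustration of the "identical and in the same place" point, here's a sketch of my own (not a real cleanup tool): stack the images and look for pixels that barely change across the set, which is exactly where an opaque, fixed-position watermark ends up.

[CODE=python]
# Sketch: flag the near-constant region across a stack of same-size images,
# which is where an opaque watermark that never moves will sit. Threshold is arbitrary.
import numpy as np

def static_region_mask(images: np.ndarray, threshold: float = 5.0) -> np.ndarray:
    """images: (N, H, W, C) uint8 array of same-size images from one source."""
    per_pixel_std = images.astype(np.float32).std(axis=0).mean(axis=-1)  # (H, W)
    return per_pixel_std < threshold  # True where pixels barely vary, e.g. a fixed logo
[/CODE]

You could then mask that region out of the training loss, or crop/inpaint it before captioning, so the LoRA never sees the watermark in the first place.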
 
I was summoned.
I knew what I was doing! :D

And yeah, that answer is pretty much why I did so. I can give him an opinion as an end user after people like you have done the work with it to produce the end product. But I've never done development with it. I was interested in your comment that "the MI300 beats the ever-living fuck out of the H100" because (correct me if I'm wrong), the H100 costs umpteen times more than the MI300. I guess that's the level of difference the software maturity makes? I really hope AMD continues to mature the software side of things. Nvidia has been ahead of them on that for as long as I can remember.
 
When I started doing more interesting things with LLMs I just started renting server time with heavy-hitting Nvidia server hardware. It's surprisingly cheap. Now there are many API services that are even cheaper. Yes, you basically do not have privacy there, but since I didn't sext the robot or send classified government files, I did not care. I think we are still a way off from good solutions at home that are truly worth the money. What gets me the most about home rigs is that even if you throw a lot of money at them and build a Frankenstein of a system that doubles as a house heating system (and also doubles your electricity usage), it'll *still* be kinda underwhelming for AI stuff. The market isn't really there yet to provide "consumer grade AI hardware" for the truly interesting stuff. Maybe never will be, it's hard to say since it depends on factors nobody even knows about yet.

The techniques used to train Stable Diffusion LoRAs and checkpoints are being improved constantly worldwide by an international community of Coomers and furries.
'Find a job you enjoy doing, and you will never have to work a day in your life.' scnr

Watermarks, you could easily use AI to get rid of *well enough* that they shouldn't fuck up whatever LoRA you are making.
I don't think anything like Glaze that aims to be visually imperceptible to humans can work for more than a year or two. It will always be defeated by an AI model that "sees" more like a human, or throws on a blur or filter to remove the imperceptible patterns.
I wrote all this in the AI Art Seething thread already, but the thing is, this is what this kind of AI just excels at. It's one of its strengths. If we can filter away the noise and see the image for what it is, such a model will be able to, too. This is fundamental. It's like how no protection against movie piracy will ever work 100%, because as long as you can watch the movie on your computer, it can by definition also be pirated.
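To make that concrete, here's a crude sketch of the "filter away the noise" idea. It's mine, not anything published by the Glaze authors or the people attacking it: a slight resample plus a lossy re-encode already destroys much of a pixel-level perturbation while looking identical to a human.

[CODE=python]
# Sketch: naive "scrub" of imperceptible pixel patterns via resample + JPEG re-encode.
from io import BytesIO
from PIL import Image

def scrub(path: str) -> Image.Image:
    img = Image.open(path).convert("RGB")
    w, h = img.size
    img = img.resize((w // 2, h // 2)).resize((w, h))  # lossy down/up resample
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=85)  # lossy re-encode on top
    buf.seek(0)
    out = Image.open(buf)
    out.load()  # force decode before the buffer goes out of scope
    return out
[/CODE]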
 
I was interested in your comment that "the MI300 beats the ever-living fuck out of the H100" because (correct me if I'm wrong), the H100 costs umpteen times more than the MI300. I guess that's the level of difference the software maturity makes?

The MI300 is quite a bit faster than the H100. The reason everyone still wants NVIDIA is
  1. FUD about CUDA
  2. Most people don't realize all that juicy Python stuff has been ported to ROCm already (see the sketch below)
  3. A lot of commercial software is NVIDIA-only, since NVIDIA was good about convincing people to write CUDA instead of RAJA.
It's starting to change, though. Lots of decision-makers aren't as stupid as NVIDIA would like them to be, and as long as the developers can make ROCm work, they're going to be forced to. The reason our AMD release came a couple of years after our NVIDIA one is that, at the time, AMD's software stack did not fucking work at all. Now it works "at all." Shittily, yes, but enough to get product out the door. AMD's price-to-performance advantage is something like 3x right now, so you're insane not to support them.
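On the "juicy Python stuff" point: here's a quick sanity-check sketch for a ROCm build of PyTorch, where the familiar "cuda" API is backed by HIP, so most existing CUDA-targeted Python code runs unmodified. It assumes a supported AMD GPU and the ROCm PyTorch wheel installed.

[CODE=python]
# Sketch: verify a ROCm PyTorch install; the "cuda" device name maps to the AMD GPU via HIP.
import torch

print(torch.cuda.is_available())      # True on a working ROCm install with a supported GPU
print(torch.version.hip)              # HIP/ROCm version string on ROCm builds, None on CUDA builds
print(torch.cuda.get_device_name(0))  # e.g. the Radeon or Instinct card

x = torch.randn(2048, 2048, device="cuda")  # no code changes: "cuda" is the HIP backend here
print((x @ x).sum().item())
[/CODE]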

Maybe never will be, it's hard to say since it depends on factors nobody even knows about yet.

Creating isn't consuming, which is why consumer-priced, consumer-grade tools are never on anybody's roadmap, any more than consumer-grade CAD software. Inferencing looks a lot more like consuming, which is why NPUs are in everybody's consumer CPU roadmap for the next few years.
 