Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

Little tip to get around censorship if you use online generators: type the censored words in a different language.

French works well, which is curious when you think about it; you would assume French words would still get caught, since native English speakers use them in some pretentious circles like cuisine, art or fashion, but the AI will draw them regardless.
 
This reminds me, does anyone know anything about PonyXL drama? I recall hearing that the model's developer obfuscated a bunch of tags, and also won't share those obfuscated tags with people, but I expect I'm getting the technical details wrong.
they specifically obfuscated artist tags. it turns out that the model actually knows quite a lot of artists and styles by default without needing LORAs. not to worry, the dedicated gooners on 4chan have been mining them and there's a spreadsheet: https://lite.framacalc.org/4ttgzvd0rx-a6jf

it's for porn obviously so the sample links are all NSFW
 
Seems kind of bad, but I can understand why. I've trained LORAs that can create work that looks almost indistinguishable from the artists they're based on. That's probably discouraging for artists who are genuinely good, so obfuscating the tags makes sense as a way to encourage artists to keep producing content. I don't know why they wouldn't just remove those tags, though.
 
This honestly can happen in tons of software. Millions of dependencies, very little oversight of what's even in them; all it takes is one GitHub account to be compromised or one developer with a hidden agenda. These are called supply chain attacks. That's not exactly what happened here, but ComfyUI also has a rats nest of dependencies. I would always sandbox applications like Comfy. The open source community is far too trusting about running arbitrary code. The amount of times some anonymous, random literally-who just links his GitHub on reddit, says "hey people run this" and people actually just do it is insane.
Very much this. I'd say there's a further problem with Stable Diffusion in that its user base is so much less technical than most projects with this level of rough and ready code. Obviously not all. There are technical users such as yourself who have good knowledge of the software; there are technical users such as myself who have good knowledge in general or in our own areas but not that much familiarity with the software (takes time to learn stuff even if you have the skills); and then you have the follow-a-guide, post-for-help-on-reddit crowd, which makes up a far larger proportion of this project than most. If someone wrote their own module for Pulumi and stuck it on GitHub, the only people at risk would be people who are equipped to look at it and go "that's not right". It's not 100% guaranteed they will, but that community would be a hell of a lot more resistant than a bunch of enthusiastic guide followers who just want to make booby elf women (mostly).
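If you want to make the rats-nest point concrete for your own setup, here's a quick sketch using only the Python standard library (nothing ComfyUI-specific, it just walks whatever environment you run it in) that shows how much third-party code you're implicitly trusting:

```python
# Rough sketch: list every installed distribution in the current Python
# environment and how many further requirements each one declares.
# Standard library only (Python 3.8+).
from importlib.metadata import distributions

for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    requires = dist.requires or []  # declared dependencies, possibly empty
    print(f"{dist.metadata['Name']} {dist.version}: {len(requires)} declared requirements")
```

Run that inside a ComfyUI venv and the count alone makes the point: any one of those packages getting compromised is enough.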

Thankfully, places like GitHub are developing the tools to detect such malicious code in real time. Using AI! :) We're getting close to the point, I think, where someone uploads some blatantly malicious code and the site itself flags it and raises concerns. It won't be foolproof but it can catch a lot of the low-hanging fruit. (Of course that could just create over-confidence on the part of users who then trust the more sophisticated mal-packages, but hey...)

Anyway, in the "I read reddit so you don't have to" section:

ComfyUI support for SD3 just dropped:

We are (allegedly) one day from the 2B version of SD3 being released.

Someone linked this quite interesting article about colour bias in SDXL.

In short, he says that SDXL has a bias towards yellow due to an absence of blue in its training data, he shows how the colour space it uses falls outside the bounds of a normal colour space, and then he wrote a bunch of code that, so far as I can see, dynamically corrects colours during generation. Lots of comparison images. He has an interactive demo where you can compare the impact of his different colour correction techniques to the unaltered image:

Note, this isn't generating an image in real time. He has made a matrix of 300-something possible combinations of the techniques at different stages, I think. So the images vary slightly in composition but it's enough for you to see the effect of colour correction. I found it pretty cool.
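For anyone who doesn't want to read the whole article, the underlying idea is essentially white balancing. This isn't his code (his corrections happen during generation), but a grey-world rebalance in image space is about the simplest analogue of what the correction is doing:

```python
# Minimal grey-world white balance sketch: the image-space analogue of the
# article's idea. The author's actual corrections run during generation.
import numpy as np

def neutralize_cast(img: np.ndarray) -> np.ndarray:
    """img: float RGB array in [0, 1], shape (H, W, 3)."""
    channel_means = img.reshape(-1, 3).mean(axis=0)  # average R, G, B levels
    grey = channel_means.mean()                      # neutral grey target
    corrected = img * (grey / channel_means)         # scale each channel toward grey
    return np.clip(corrected, 0.0, 1.0)
```

A yellow-biased image has inflated red/green means relative to blue, so the rescale boosts blue and pulls the image back toward neutral.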

Finally, I think I found one of the worst posts on the Stable Diffusion subreddit:

First the guy starts drawing unexplained analogies between the trained resolution of SD3 and Composite vs. S-Video vs. Component cables, saying that something may be the same resolution but better quality. The analogy makes little sense and is just a way of implying that SD3 being trained at 512x512 isn't a bad thing. It is. Then you get guff about how there are over 7,500 papers on Google Scholar that build on the SD model and how "all of this knowledge could be potentially transferred to newer newer architectures [sic]". The majority of that is citation farming and none of it is about SD3 specifically. Then a bunch of cope about how 2B isn't a "skimped model" because "if the 8B model is undertrained a much smaller model can outperform it". Well sure, IS that the case? And are you saying the SD3 8B version IS undertrained?

I don't know why I'm reporting on this post here other than that it annoyed me, and it's perhaps the most perfect example I've ever seen of someone dressing up absolutely no information or insight in a bunch of high-flown language and logical fallacies and getting away with it. It's like the Jabberwocky scene from Better Off Ted in real life. But that's 90% of reddit, I guess.

Anyway, imo, we ARE getting a lesser model by it being the 2B version. I suspect most of the stuff about "we want as many people as possible to be able to run it" is after-the-fact spin on them not being ready yet with the larger models. I base that on the fact that people serious about this should be able to get the hardware to run larger models or already have it. And the 512x512 resolution, if that's finally substantiated, is just piddly and crap.

Nonetheless, I am keen to see SD3. My experience with it via the API shows a lot of potential, though I suspect that's a larger than 2B version.

Finally, apropos of nothing, this image I generated amused me as a great example of how generative AI can go down a wrong path.
supergirl_flying.png

I was playing around with what the ICBINP model was capable of, as it has really impressed me with its realism. I wanted to give it something a little outré, so I asked it to generate an image of Supergirl flying high above a city, viewed from above. As you can see, it started with the essentials but then its generative nature extended things down to the ground and did things like add a shadow that still connects her to the ground. The end result is a weird, forced perspective on a 16m-tall Supergirl that still has realistic detail. Weird in all sorts of subtle ways.
 
SD is just such a waste of time for me. Same with LLMs. My ML setup in the basement hasn't been used for a month or two because I uninstalled all my ML stuff.
Is that because you don't have a specific goal and have just generally lost interest, or because you have a specific goal but it just isn't feasible right now?

Is there an offline model that plays nice with AMD?
I mean, using ROCm I am able to use any of the Stable Diffusion based models very effectively. With 20GB VRAM on my AMD card it blasts out 1024x1024 SDXL images fast enough that I can try something, see the results and then make changes based on them. I use ComfyUI to do so.

The catch is that ROCm on Windows is waaay behind. You need to do it on Linux. But the hardware side works well.
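If anyone wants to check their own setup: the ROCm builds of PyTorch reuse the torch.cuda API, so the usual sanity check works unchanged (a quick sketch, assuming you installed a +rocm wheel):

```python
# Sanity check that a ROCm build of PyTorch can see the AMD GPU.
# PyTorch's ROCm backend is exposed through the torch.cuda API.
import torch

print(torch.__version__)          # a ROCm wheel reports something like "2.x.x+rocm6.x"
print(torch.cuda.is_available())  # True means the AMD card is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_properties(0).total_memory // 2**30, "GiB VRAM")
```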
 
Well, pardon the double post, but nobody else has replied in the meantime and I feel this is significant. SD3 is now publicly available. In theory not for another 15 minutes, but I was already able to download the model weights and sample workflows for ComfyUI. So... back later with some locally produced images and comparisons, hopefully.

I also intend to do a bit of a like-for-like comparison with their API version, which they let slip is in fact the 8B version, once I've got things sufficiently nailed down to ensure I'm not accidentally introducing some difference. So we'll get to see how much difference the missing 6B parameters (man) make.
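For the comparison itself, the main thing is holding every knob fixed on both sides. A sketch of the sort of settings I'll be pinning (shown in diffusers terms for brevity rather than ComfyUI's workflow format; the values are placeholders, not my actual test settings):

```python
# Sketch of the generation settings that must match for a like-for-like
# comparison. Values here are illustrative placeholders.
import torch

generator = torch.Generator(device="cuda").manual_seed(1234)  # fixed seed

settings = dict(
    prompt="photo of a lighthouse at dusk",  # identical prompt on both sides
    num_inference_steps=28,                  # match the step count
    guidance_scale=7.0,                      # match the CFG scale
    width=1024,
    height=1024,
    generator=generator,
)
# image = pipe(**settings).images[0]  # pipe: whichever pipeline is under test
```

Even with all of that pinned, different samplers and implementations mean the outputs won't be pixel-identical; the point is just to stop settings drift from polluting the comparison.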
 
Cool, keep us posted. If you can try some examples of the same prompts on other SD models to compare, that would be neat.
 
My default filters on Civitai have apparently been updated to enable SD3 automatically.
I usually just search for SDXL stuff.
Guess I could try it.
 
Well, I'm glad I don't work for Stability AI today.

Cool, keep us posted. If you can try some examples of the same prompts on other SD models to compare, that would be neat.
I can happily do you some comparisons between models, e.g. SDXL (though not much point in doing base, might as well pick a tuned model) and SD3. But that said, if you've seen Human Centipede then you already have an idea what SD3 is like.

Okay, marginally more seriously: it's okay, but it seems to have a real bias towards anime and cartoonish styles, and it seems to get its understanding of human anatomy from John Carpenter's The Thing. Maybe fine tunes will lead to more impressive results. Thing is, the new commercial licence has already ruffled feathers. The PonyXL people have already said it makes it impossible for them to do a Pony version based on it. I don't use PonyXL because it seems to be largely focused around porn (unless I'm wrong) but if that's indicative of other players it could be a problem.

The API version I was using was and is noticeably better than this. And they confirmed that the API is running the 8B version, not the 2B they've just released.
 
Yeah, people seem very unimpressed with SD 3. Reddit is full of people posting messed-up anatomy and bad gens; it seems to have a significantly worse understanding of anatomy than SDXL.

The PonyXL people have already said it makes it impossible for them to do a Pony version based on it. I don't use PonyXL because it seems to be largely focused around porn (unless I'm wrong) but if that's indicative of other players it could be a problem.
It does porn, but it also has very good danbooru-style tagging and a bunch of baked-in artist styles for 2D, anime, cartoons, etc. It has basically cornered the market on that segment for SDXL, especially since people started training LORAs specifically for Pony and Civitai added it as a base model in their searching/filtering system.

Here's the article discussing the licensing issues: https://civitai.com/articles/5671
 
Yeah, people seem very unimpressed with SD 3. Reddit is full of people posting messed-up anatomy and bad gens; it seems to have a significantly worse understanding of anatomy than SDXL.
My normal view of that subreddit is that it's whiny, entitled people who want everything for free and yesterday. At present, however, I feel they may have a point. I'll still give it time to see how it turns out. It's definitely better in some ways, so it might be a better base for building higher in time. That said, I feel they made a mistake in not releasing the 8B version. Having that out there would go a long way towards deflating some of the criticisms, as I don't think it has anywhere near the same issues.

It does porn, but it also has very good danbooru-style tagging and a bunch of baked-in artist styles for 2D, anime, cartoons, etc. It has basically cornered the market on that segment for SDXL, especially since people started training LORAs specifically for Pony and Civitai added it as a base model in their searching/filtering system.
Maybe I'll grab it then. I've veered away from the more prurient models, but if it's more general purpose than I thought, maybe I will. The thing is, independent of that, it's a very, very successful family of models either way. For them to ditch the possibility of working on SD3 is potentially a big deal in terms of community engagement. Someone at SAI also used the term "Dunning-Kruger" in conversation with one of the PonyXL people. It's not looking good. Basically they'd be limited to 6,000 generated images per month under the licence, which is nowhere near viable.

Here's the article discussing the licensing issues: https://civitai.com/articles/5671
Thanks. Will read - will probably improve my understanding as it's not really something I have followed.
 
So is SD3 supposed to be less resource-intensive than Cascade or XL?
I don't know about Cascade. I only dabbled with that and it went like shit off a shovel (Bong expression meaning very fast). But compared to XL, I was finding SD3 around the same as SDXL models on like hardware and resolutions. I could do some more formal comparisons if people like. The model itself is only 4GB; base SDXL without the refiner is 6GB. But SD3 is structured differently, with parts of the text encoding separated out, and if you roll everything in together it ends up around 10GB. Pure file size or parameter counts aren't a great guide to complexity and performance, though. I think the fact that SD3 is split up into smaller parts can make it quicker. I have 20GB VRAM, so I can just throw everything into that with room to spare. I'd be interested to see if it's more performant when you have more constrained hardware - I suspect it might be.
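As a sketch of what that modular structure buys you on constrained hardware (this assumes the diffusers library rather than ComfyUI; diffusers lets you drop SD3's big T5 text encoder entirely and stream the remaining components to the GPU on demand):

```python
# Hedged sketch (diffusers >= 0.29): load SD3 without the large T5 text
# encoder and offload components to cut peak VRAM usage.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    text_encoder_3=None,   # skip the multi-GB T5 encoder entirely
    tokenizer_3=None,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # keep only the active component in VRAM

image = pipe("a lighthouse at dusk", num_inference_steps=28).images[0]
image.save("sd3_test.png")
```

You lose whatever prompt-following the T5 encoder contributes, but the CLIP encoders alone still work, which is exactly the kind of trade-off a monolithic checkpoint can't offer.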
 
At present, however, I feel they may have a point. I'll still give it time to see how it turns out. It's definitely better in some ways, so it might be a better base for building higher in time. That said, I feel they made a mistake in not releasing the 8B version. Having that out there would go a long way towards deflating some of the criticisms, as I don't think it has anywhere near the same issues.
Looks like a disaster so far, and you could easily imagine the same issues extending to the 8 billion parameter model if the problem is that they attempted to make SD3 "safe". The coomers will probably end up being the ones to salvage it with retraining.
Ridiculed Stable Diffusion 3 release excels at AI-generated body horror (archive)
AI image fans are so far blaming the Stable Diffusion 3's anatomy failures on Stability's insistence on filtering out adult content (often called "NSFW" content) from the SD3 training data that teaches the model how to generate images. "Believe it or not, heavily censoring a model also gets rid of human anatomy, so... that's what happened," wrote one Reddit user in the thread.
 
Anyone have experience with LumaAI and any tips for prompts to make an image move?
 
Seems kind of bad, but I can understand why. I've trained LORAs that can create work that looks almost indistinguishable from the artists they're based on. That's probably discouraging for artists who are genuinely good, so obfuscating the tags makes sense as a way to encourage artists to keep producing content. I don't know why they wouldn't just remove those tags, though.
The fun thing is that with a lot of these checkpoints you'll just get garbage unless you're trying to generate something in a specific artist's style. They've just been loaded up with so much noise that you need to clamp down on a specific tag to get something coherent.
 