Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

So here's something I noticed in the SD3 API documentation.

Now, these images don't seem that much better in quality than SDXL outputs to me. What are your thoughts on the SD3 model so far? Besides the improved text handling and the Search and Replace feature, does it seem to make better images or follow prompts more closely than previous SD versions?
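For anyone who hasn't looked at the docs: Search and Replace is one of the new v2beta REST endpoints. A rough, untested sketch of a call, going off my reading of the documentation; the API key, file names, and prompts are all placeholders:

# Rough, untested sketch of the documented Search and Replace endpoint.
# API key, file names, and prompt values are placeholders.
import requests

resp = requests.post(
    "https://api.stability.ai/v2beta/stable-image/edit/search-and-replace",
    headers={
        "authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "accept": "image/*",
    },
    files={"image": open("input.png", "rb")},
    data={
        "prompt": "golden retriever",  # what to paint in
        "search_prompt": "dog",        # what to find and swap out
        "output_format": "png",
    },
)
with open("edited.png", "wb") as f:
    f.write(resp.content)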
 
Eh, I think the quality is a little better considering this is the base model; the proper comparison would be to base SDXL as it was at release, prior to any community checkpoints. But regardless of that, it is definitely superior at following prompts. Not perfect, but noticeably better.
 
In my experience this is not the case. I get much better results by generating an image at common aspect ratios (e.g. 832x1216, 1024x1024, 768x1344, etc.) and then upscaling. Even base SDXL does not generate coherent images when you go significantly above those sizes. Just as an example, here's the Kim Possible example image, then the same parameters with no upscaling and a base image size of 1216x1792, then the same thing with base SDXL. It brings back the classic stretched torsos, duplicated body parts, etc. Also, if you look at how LoRAs are trained, images are generally normalized to 1024x1024. I have done a few for fun, and the tooling will reduce a 2048x2048 image to 1024x1024, a 1664x2432 to 832x1216, etc.
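If you want to reproduce that gen-then-upscale flow outside a UI, here's a rough sketch with diffusers. The prompt is a placeholder, and the low-strength img2img pass is just one way to do the upscale step, not necessarily what the webui does under the hood:

# Rough sketch: generate at a native SDXL bucket size, then upscale via img2img.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a detailed portrait photo"  # placeholder
base = pipe(prompt, width=832, height=1216).images[0]  # a native bucket size

# Upscale 1.5x in pixel space, then let img2img re-add detail at low strength.
img2img = StableDiffusionXLImg2ImgPipeline(**pipe.components)  # reuse loaded weights
upscaled = base.resize((1248, 1824))
final = img2img(prompt, image=upscaled, strength=0.3).images[0]
final.save("final.png")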
We're both wrong. I misremembered the SDXL lower range, since there's a model I use that strongly recommends 1024 and up; 832 is within the recommended range, but 768 is under it, FYI. However, the training data resolution doesn't matter in the way you think it does, since there's a scaling stage in the pipeline.
In your case I think the torso issue is actually an aspect ratio thing: SDXL is especially optimised for square aspects and gets weirder the further you get from that. Sticking close to square gens is in the recommendations. Off the top of my head I'm not sure why lower-res rectangular gens are more stable, but if it works it works.

I usually gen squares with XL, but I've had some success with strong negatives for multiple girls/deformity/multiple torsos etc. when that issue pops up, btw. There are probably a few negative embeddings for XL on Civit that could help if you haven't tried that.
 
I was talking about the scaling stage. E.g., here's output from the logs for a LoRA I trained the other day with sd-scripts (https://github.com/kohya-ss/sd-scripts):
bucket 0: resolution (768, 1280), count: 6
bucket 1: resolution (832, 1216), count: 75
bucket 2: resolution (896, 1152), count: 12
bucket 3: resolution (960, 1088), count: 12
bucket 4: resolution (1024, 1024), count: 60
bucket 5: resolution (1088, 960), count: 6
bucket 6: resolution (1152, 896), count: 4
bucket 7: resolution (1216, 832), count: 43
bucket 8: resolution (1280, 768), count: 6
It took all my images of various sizes and scaled them relative to 1024x1024, with buckets of (1024 + 64n, 1024 - 64n). Most community models are trained using this set of tools, I believe, so it makes sense that models would perform better at the resolutions they are specifically trained on.
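The bucketing logic itself is nothing fancy, by the way. This isn't sd-scripts' actual code, just a toy version of the idea:

def make_buckets(base=1024, step=64, max_steps=4):
    # Buckets of (base - 64n, base + 64n) in both orientations, plus the square.
    buckets = [(base, base)]
    for n in range(1, max_steps + 1):
        buckets.append((base - step * n, base + step * n))  # tall
        buckets.append((base + step * n, base - step * n))  # wide
    return buckets

def assign_bucket(width, height, buckets):
    # Snap an image to the bucket with the closest aspect ratio.
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))

print(assign_bucket(1664, 2432, make_buckets()))  # -> (832, 1216), as in the log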

Yeah, I'm not sure why lower res rectangular gens are more stable, but it's pretty easy to just add an upscaling step at the end. I often don't bother if I'm just testing stuff out, but it has the added benefit of cleaning up any small imperfections or artifacts.
 
Nah, you're talking about ordinary resolution normalisation during data preprocessing; I was referring to something in the actual SD process (autoencoder decoding).
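To illustrate: the U-Net denoises at 1/8 of the pixel resolution, and the VAE decodes that back up to pixels at the end. A quick sketch with diffusers; the random latent is only there to show the shapes:

import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="vae"
)

# A dummy "denoised" latent for an 832x1216 image: 4 channels at H/8 x W/8.
latents = torch.randn(1, 4, 1216 // 8, 832 // 8)
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample

print(image.shape)  # torch.Size([1, 3, 1216, 832])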
 
Milhouse is just the kid from The Wonder Years, who may or may not be Marilyn Manson's staid alter ego. What kind of rip is that? Toss that AI into the woodchipper; back to the drawing board.
 

I'd love to get my hands on Sora, but there's no public version, let alone a local version where you can do whatever. Does anyone know of any alternatives?
 
The characters are already moving slightly.
That's the new thing in AI videos: have them move slightly so it looks impressive. Like the other guy said, though, a lot of the time the "inspiration" behind certain actors is obvious. The Simpsons one is the best, but there are plenty of videos where they "1950s cinemascope" an IP.
 
As others have stated, the subtle movements and zoom-ins to make the characters seem "alive" are really cheap and stupid. However, I think this highlights a great use for ML image generation, which is concept prototyping. The imagery it spewed out is a pretty neat way to visualize "The Simpsons but real", and it could, for example, be used by movie studios to quickly figure out how a proper live-action Simpsons movie should look.

Of course, with time it might get so good that they'll just auto-pump out slop to sell, but even if it doesn't, this use case is still viable.
 
Anything as good as bing/copilot but not as cucked?

Also, I'm trying (again) to get into AI art. Any good setup guide/training guide? (I've got a better rig now.)
 
What do you want from the chatbot? There are a ton of Llama 2 finetunes with 13B GGUFs that may suit your needs, although Llama 3 just came out and will probably get fine-tunes in the next few months.
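If you go the GGUF route, llama-cpp-python will run them locally. A minimal sketch; the model file name is a placeholder for whatever finetune you grab:

# Minimal sketch of running a GGUF finetune locally with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU if it fits
)
out = llm("Q: What's the difference between SD 1.5 and SDXL? A:", max_tokens=128)
print(out["choices"][0]["text"])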

You can't go wrong with https://github.com/AUTOMATIC1111/stable-diffusion-webui

As for models, that's up to you. SDXL is the latest thing you can run locally until SD 3 comes out. There are also fine-tuned checkpoints on Civitai, like Pony Diffusion v6, which is good for anime and other things... If you want to do training or fine-tuning, start with SD 1.5 and skip SD 2. There are plenty of guides.
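And if you'd ever rather script generations than click around the webui, diffusers can also load single-file Civitai checkpoints. A quick sketch; the checkpoint file name and prompt are placeholders:

import torch
from diffusers import StableDiffusionXLPipeline

# Load a single-file checkpoint downloaded from Civitai (placeholder file name).
pipe = StableDiffusionXLPipeline.from_single_file(
    "./ponyDiffusionV6XL.safetensors", torch_dtype=torch.float16
).to("cuda")

image = pipe("placeholder prompt", width=1024, height=1024).images[0]
image.save("out.png")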
 
That's a good start.

I just want to generate stuff like this without the bing censor going "ERROR! ERROR! NON-AMORPHOUS GENDERBLOB DETECTED! PROMPT BLOCKED!"

 