Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

I think I missed something but where do you download SD3 or is SD2 still preferred or is there another variant I'm not aware of that plays nice with ComfyUI?

Edit: okay, so I'm trying Easy Diffusion v.3.0.9 that uses SD 1.5. Right now I'm just fucking around with it. Any pointers to get a more precise image? Sorry for newbie questions.
A lot of it is just trial and error, but as you get practice, you'll start to get a feel for what to focus on and how to get the output closer to what you're looking for. Some pointers:
  • A good way to start is to find an output from the model you're using that you like, then using that image's prompt as a starting point. Some models will work better with a list of keywords, while others do well with natural language. Tweak the prompt with things you're looking for.
  • The sampler you're using can also make a big difference in the output. Don't ask me to explain how these work, just know that they're different algorithms and thus will produce different results. My personal preference is DPM++ 2M Karras because I think it does vibrant colors well, but try a few and see what you think. (If you reuse the seed from your previous output, you can see the impact of individual changes to your settings more easily. There's also a quick code sketch after this list showing where these settings plug in if you ever script things.)
  • Similarly, the number of sampling steps can also have an impact, though you should avoid falling into the trap of "more steps = better image." A lot of samplers will stop really making noticeable changes after a particular number, and then you're just wasting time. I stick to 20 personally, but again, try a few different numbers.
  • When writing a prompt, while it does consider everything you write, it seems to help to have the main focus of your image first, then fill in the details afterward.
  • Weighting your keywords is also important. By default, all keywords will have the same weight, but putting parentheses around one or a group will increase their weighting and make the model focus on them more. For example, (keyword:1.2) will increase the strength of keyword by 20%, while (keyword1, keyword2:1.4) will increase that group's strength by 40%. Similarly, you can reduce the emphasis by putting a number lower than 1. Keep in mind that putting too much emphasis can make it freak out and make something weird.
  • You can try changing the CFG scale number as well. This determines how closely the model adheres to your input; the lower the value, the more creative it's allowed to be. It's usually best not to put this too high because it's that creativity that really allows for interesting outputs, but bumping it up a bit can help.
  • Look for a good anti-deformity textual inversion or two. These are lightweight models trained on bad images that should be placed in your negative prompt, which will make the model try to avoid outputs that resemble those images. Hands can be particularly tricky for some models, so having one specifically for hands isn't a bad idea.
  • Text-to-image will only get you so far; the really great AI images take a lot of extra work. Upscale your txt2img output through img2img, and use inpaint or inpaint sketch to make changes to small areas of the image. After inpainting, upscale again with img2img to have the model blend those changes together into a more seamless output, lowering the denoising strength every time you upscale until you reach a satisfactory result.
  • Above all, patience is key. It may take a lot of generations to get something that you like, but if you're willing to put in the effort, you'll get something you can be proud of.
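If it helps to see the knobs in one place, here's a minimal scripted sketch using the diffusers library, assuming an SD 1.5 checkpoint and a CUDA card; the repo id, prompts, and values are just placeholder examples, and the (keyword:1.2) weighting syntax mentioned above is a web UI feature that diffusers doesn't parse natively.
Code:
# Minimal txt2img sketch with diffusers; model id and settings are examples only.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in whatever SD 1.5 checkpoint you actually use
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ 2M Karras roughly corresponds to DPMSolverMultistepScheduler
# with use_karras_sigmas=True.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

# Reuse a fixed seed so you can see the impact of individual setting changes.
generator = torch.Generator("cuda").manual_seed(12345)

# Anti-deformity embeddings need to be loaded before their trigger word works
# in the negative prompt, e.g. pipe.load_textual_inversion("path/to/embedding").
image = pipe(
    prompt="portrait of a knight in ornate armor, castle in the background",
    negative_prompt="lowres, bad anatomy, deformed hands",
    num_inference_steps=20,  # sampling steps
    guidance_scale=7.0,      # CFG scale
    generator=generator,
).images[0]
image.save("txt2img.png")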
The basic process as I learned it from someone else is as follows (you can use any resolution, but these are the current numbers I work with):
  1. Txt2img, 800x640, usually using a prompt from a previous image tweaked for what I'm looking to do this time. I'll generate one output at a time and make changes to my prompt as necessary. Maybe I decided to change the background, or maybe I wanted to add an accessory I hadn't thought of, or maybe the pose wasn't really catching my eye.
  2. When I think I'm happy with the prompt, txt2img, 800x640, five batches of five images. This gives me 25 different outputs to look through, and I'll pick the best one to move forward with. If I don't like any, I'll do another run. (This is true for every step.)
  3. Send that picture to img2img, 1000x800, denoising strength 0.6, three batches of three images (there's a scripted sketch of these img2img passes after the list). As the resolution goes up, the number of pics I can generate without my card running out of memory goes down, but you also generally won't need as many because the outputs won't change as much (I have an old 1070, if your card is newer then it can handle larger batch sizes). Again, I'll pick the one I like the best and move on.
  4. The image will likely need some inpainting, so I'll send it over there. I mostly use inpaint sketch because it can usually handle that fine, and it gives me more control than standard inpainting. Say it drew an extra limb over the background; I'll pick a color close to the background, paint over it, and run it through with the same settings. Don't worry about fixing every little inconsistency, that's what the next step is for.
  5. Once I've fixed the major stuff, img2img, 1200x960, denoising strength 0.5, two batches of two images. This will even out the inpainted sections with the rest of the image.
  6. If there are any lingering problem spots, a little more inpainting is done here. Usually there won't be much if any, but maybe something slipped my notice before.
  7. A good trick to add detail to something that could use more, like a face: regular inpainting over that section (I'd do the whole head and neck area), change inpaint to "only masked," increase the only masked padding to 256 pixels. This makes it create a full-rez version of that area before shrinking it and putting it into the full image, making things look nicer.
  8. One more img2img, 1800x1440, denoising strength 0.4, one image. Usually it'll only take one at this point, but if something didn't quite come out right, I'll keep generating until everything looks good.
  9. Repeat step 7 one more time to really make those high-detailed areas pop.
  10. Upscale the finished result (I use 4x-UltraSharp for somewhat realistic images), and enjoy!
I know I just said a lot, but there are a lot of tutorials out there that can probably explain better than I can. Keep practicing!
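For anyone who wants to script it, here's a rough sketch of the img2img upscale passes from steps 3, 5, and 8 using diffusers; the inpainting in between isn't shown, and the model id, filenames, and prompt are placeholders rather than the actual ones used here.
Code:
# Sketch of successive img2img passes with rising resolution and falling denoising strength.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder; use your own checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "same prompt you used for the original txt2img generation"
image = load_image("best_txt2img_pick.png")  # placeholder filename

# Each pass bumps the resolution and lowers the denoising strength, so the model
# adds detail without drifting too far from the composition you picked.
passes = [((1000, 800), 0.6), ((1200, 960), 0.5), ((1800, 1440), 0.4)]
for (width, height), strength in passes:
    image = image.resize((width, height))
    image = pipe(
        prompt=prompt,
        image=image,
        strength=strength,  # denoising strength
        num_inference_steps=20,
        guidance_scale=7.0,
    ).images[0]

image.save("refined.png")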
 
I think I missed something but where do you download SD3 or is SD2 still preferred or is there another variant I'm not aware of that plays nice with ComfyUI?
You can get the base SD or SDXL or SD3 models from Huggingface directly: https://huggingface.co/stabilityai

E.g. SDXL is here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

But for SD or SDXL you will likely want to get a tuned model more appropriate to what you're trying to do, and you can get these from https://civitai.com/ . Just search for models and apply a filter for the type you want, e.g. SDXL. Make sure you notice which are actual models and which are LoRAs that you would apply to a model; Civitai has both. Also, be careful on there - there's a lot of dubious stuff. Happily, Civitai's filters are actually pretty effective at hiding the bad stuff.

If you want to play around with SD3, Civitai won't have any tuned versions due to doubts about the SD3 licence, so you are stuck with the base model from Huggingface for that.

In my very personal opinion, I don't think SD3 Medium (the downloadable version) is much worth playing around with, and you're best sticking with 1.5 or SDXL. The full SD3 I was actually quite impressed by, but that's only available via paid API right now; you can't get it to run locally in ComfyUI.
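If you'd rather grab the base checkpoint with a script than click through the site, here's a small sketch using huggingface_hub; the filename is my assumption from the repo's file list, so double-check it on the model page.
Code:
# Sketch: download the SDXL base checkpoint from Hugging Face.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    filename="sd_xl_base_1.0.safetensors",  # verify the exact filename on the repo page
)
# Drop the downloaded file into your UI's checkpoint folder,
# e.g. ComfyUI/models/checkpoints or the webui's models/Stable-diffusion.
print(path)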

Hope that helps.
 
AMD Unveils Amuse 2.0, A Feature-Rich AI Suite Powered With XDNA Super Resolution Technology

This is basically a copy of Intel's AI Playground, focusing on Stable Diffusion. It's an executable for Windows, intended for use with NPUs (Ryzen 8040, Ryzen AI 300) or RX 7000 GPUs but probably supports more than that. It could be good for the tarded and our 7900 XTX owners.
Any way to remove the censorship?

Edit: don't judge me. I want to create fucked up shit like Mohammed being raped to death by pigs.
 
Any way to remove the censorship?

Edit: don't judge me. I want to create fucked up shit like Mohammed being raped to death by pigs.
IDK but it says "Auto configuration and model selection" in the feature list, which sounds like you could insert whatever horrific thing you find on Civitai. And it's all locally generated with no account or login needed.
 
Gentlemen, I re-enter this thread with a true and honest gift. Please accept it as a sincere apology for blowing my load too early. Trust me when I say I was far more felted being banned from this thread than you were seeing my generations. I'm sorry.

I want to be able to contribute to this thread in terms of quality, advice, help, or whatever without defaulting to memes and gimmicks. I present to you: "Null's Unofficial Fursona".
Null 2.png
Note: Image should be fully "PNG infoable" (drop this thing in the "PNG Info" tab to view the Positive/Negative prompts, details, etc.).
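If you'd rather pull the prompt out with a script than use the PNG Info tab, here's a short sketch with Pillow, assuming the generation parameters sit in the usual "parameters" text chunk the web UI writes:
Code:
# Sketch: read embedded generation parameters from an A1111-style PNG.
from PIL import Image

with Image.open("Null 2.png") as im:
    params = im.info.get("parameters")

print(params or "No generation parameters found in this PNG.")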

This one was made with Stable Diffusion 1.5 (specifically Stable Diffusion 1.8.0-RC) via Yiffymix v44. The most notable LoRA used (and the only LoRA used, for that matter) is the Fluffyrock Quality Tags. I'm more than happy to throw links to Civitai here, but I understand many have sincere reservations about heading over to that site, given that it is indeed coomer central.

I am kind of interested in writing a very, very simple how-to guide for setting up Stable Diffusion because, looking back at my entrance into Stable Diffusion, it took fucking forever, and almost every single video guide out there (now maybe already out of date) is very convoluted. Furthermore, the general rentry guides on Stable Diffusion threads all across 4chan are phrased such that they aren't beginner-friendly. My main issue with the rentry guides (rentry.org) is that they assume entirely too much technical competency from users. I am an absolute moron when it comes to computers. Perhaps that is evident in the lack of refinement in several of my generations.

Thank you.
 
Furthermore, the general rentry guides on Stable Diffusion threads all across 4chan are phrased such that they aren't beginner-friendly. My main issue with the rentry guides (rentry.org) is that they assume entirely too much technical competency from users.
Good luck with your guide. Keep in mind those rentry guides are written for autists by autists, so they're expecting other autists who are technically competent.
 
My main issue with the rentry guides (rentry.org) is that they assume entirely too much technical competency from users.
As much as I'd like to say that a user without a minimum degree of technical competence shouldn't be running Stable Diffusion on their system, one way you could simplify the guide is skipping the Git installation/cloning. Sure, no simple updates from skipping the Git stuff, but also no updates that break your current installation and force you to reinstall anyway. Other than that, there's really not much to simplify beyond adding screenshots for the Python installation and pointing out where models/VAEs/embeddings/LoRAs go and where your output images end up by default (see the sketch below for the usual folder layout).
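For reference, a hypothetical cheat sheet of the usual AUTOMATIC1111 webui folders; the exact names can vary between versions, so treat this as an assumption and check your own install.
Code:
# Hypothetical cheat sheet: typical A1111 webui folders, relative to the webui root.
from pathlib import Path

A1111_DIRS = {
    "checkpoints": "models/Stable-diffusion",
    "VAEs": "models/VAE",
    "LoRAs": "models/Lora",
    "embeddings": "embeddings",
    "txt2img outputs": "outputs/txt2img-images",
}

webui_root = Path("stable-diffusion-webui")
for what, rel in A1111_DIRS.items():
    print(f"{what:16} -> {webui_root / rel}")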
 
New open-source locally-run model dropped: FLUX. Much better than local Stable Diffusion when it comes to recognizing text, getting anatomy right, and understanding prompts; close to DALL-E 3 in terms of quality. Requires a 24GB video card to run quickly, though.

Example output stolen from 4chan
View attachment 6259974

Welp this is officially the future. It looks like this is a very SFW model though. Now normally I wouldn't steal someone else's gimmick, but I couldn't resist after this just popped out apropos of nothing:
fong-roped.png
Code:
low angle view picture of a massive super morbidly obese female anthropomorphic dragon with a sign in the background of "Anime Sucks Cope And Sneed"
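For the curious, a sketch of running FLUX.1-dev locally through diffusers with that same prompt; the model id and settings come from the publicly posted examples, so treat them as assumptions and check against your diffusers version.
Code:
# Sketch: run FLUX.1-dev locally via diffusers (needs a lot of VRAM; CPU offload helps).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # spill layers to system RAM instead of OOMing

image = pipe(
    prompt='low angle view picture of a massive super morbidly obese female '
           'anthropomorphic dragon with a sign in the background of '
           '"Anime Sucks Cope And Sneed"',
    guidance_scale=3.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_output.png")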
 
New open-source locally-run model dropped: FLUX. Much better than local Stable Diffusion when it comes to recognizing text, getting anatomy right, and understanding prompts; close to DALL-E 3 in terms of quality. Requires a 24GB video card to run quickly, though.

Example output stolen from 4chan
View attachment 6259974
Very cool. Had a little play with the API and the results were impressive. That's the Pro model, mind, which is larger I think. I've downloaded both the Dev and the Schnell models locally and will give them a go, though I suspect it will push my VRAM to its limits. I'm trying it on Runpod first, to have a play without worrying about that.

But looks very interesting. This is done by people behind SDXL per their page, is that right? Does that mean this is done by ex-Stable Diffusion bods?
 
Very cool. Had a little play with the API and the results were impressive. That's the Pro model, mind, which is larger I think. I've downloaded both the Dev and the Schnell models locally and will give them a go, though I suspect it will push my VRAM to its limits. I'm trying it on Runpod first, to have a play without worrying about that.

But looks very interesting. This is done by people behind SDXL per their page, is that right?
From what I understand, the people who made FLUX are the same people who worked on the original SD and SDXL. They split off from the company some time ago to start their own, and this is their first model.
 
The license for FLUX is very interesting and kind of garbage. First, it's non-commercial, which I'm strangely okay with given the costs of training. The main issue I run into is Section 8, which gives black-forest-labs the ability to straight up ban people from using or distributing the model. Although 8-3 is pretty based, as it automatically bans anyone who sues them via the DMCA from using the model.
 
Well, it's not working for me, and I think I'll come back to this with fresh eyes on the weekend. I keep getting an error about not being able to multiply mat1 and mat2. Normally that results from some mismatch like the wrong VAE for a model, but I can't see anything wrong, and I'm using the unaltered ComfyUI workflow from their examples. It finds everything, then just blows up right near the end. The shapes are 1x1280 and 768x3072, but where it's getting those from I don't know.

EDIT: Solved it. There's a type parameter that seems to be new. I updated ComfyUI trying to solve this, and it had defaulted to "SDXL" instead of Flux. It's now giving me an out-of-memory error despite 48GB of VRAM, because PyTorch doesn't want to use that much. Fingers crossed that if I can up that limit I'll have it working.

EDIT EDIT: Nope, it's not PyTorch. It's recognising it has 48GB of VRAM but saying it's out of memory anyway. One for another time.
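A quick diagnostic sketch for that kind of situation: check what PyTorch actually sees on the card, and try the generic allocator setting for fragmentation-related OOMs. This is plain PyTorch, not a ComfyUI-specific fix, so no guarantees it addresses this particular error.
Code:
# Sanity-check what PyTorch sees on the GPU when "out of memory" makes no sense.
import os

# Generic PyTorch allocator option for fragmentation-related OOMs. It has to be
# set before CUDA initialises, so in practice export it in the shell before
# launching ComfyUI rather than relying on this line.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch

free, total = torch.cuda.mem_get_info()
print(f"free:  {free / 1024**3:.1f} GiB")
print(f"total: {total / 1024**3:.1f} GiB")
print(f"allocated by this process: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")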
 
New open-source locally-run model dropped: FLUX. Much better than local Stable Diffusion when it comes to recognizing text, getting anatomy right, and understanding prompts; close to DALL-E 3 in terms of quality. Requires a 24GB video card to run quickly, though.

Example output stolen from 4chan
View attachment 6259974


It spells out even pretty long text very well.


1722555079302.png

1722555191245.png

1722555319747.png

1722555655158.png

1722555918061.png

I didn't even need to generate multiple images. Gets it on first try.

If you want to play with it for free without making an account, here's a link: https://replicate.com/black-forest-labs/flux-dev
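If you do end up making an account, the same model can also be scripted through Replicate's Python client; this needs a REPLICATE_API_TOKEN in your environment, the model slug matches the link above, and the prompt here is just a made-up example.
Code:
# Sketch: call FLUX dev through the Replicate API (requires an API token).
import replicate

output = replicate.run(
    "black-forest-labs/flux-dev",
    input={"prompt": "a kiwi bird holding a hand-painted protest sign"},
)
print(output)  # typically a list of image URLs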
 
It spells out even pretty long text very well.


View attachment 6261467

View attachment 6261475

View attachment 6261482

View attachment 6261523

View attachment 6261554

I didn't even need to generate multiple images. Gets it on first try.

If you want to play with it for free without making an account, here's a link: https://replicate.com/black-forest-labs/flux-dev
It also doesn't filter celebrity names. You can generate Trumps with it.
out-0.jpg
The prompt was "an incel aiming a gun at Trump", which it sadly misunderstood, but unlike SDXL it seems to be using a dataset that hasn't had any personal names stripped from the tags. High quality AI pictures of celebrities doing controversial things is back on the menu!
 