Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

I think I missed something but where do you download SD3 or is SD2 still preferred or is there another variant I'm not aware of that plays nice with ComfyUI?

Edit: okay, so I'm trying Easy Diffusion v.3.0.9 that uses SD 1.5. Right now I'm just fucking around with it. Any pointers to get a more precise image? Sorry for newbie questions.
A lot of it is just trial and error, but as you get practice, you'll start to get a feel for what to focus on and how to get the output closer to what you're looking for. Some pointers:
  • A good way to start is to find an output from the model you're using that you like, then using that image's prompt as a starting point. Some models will work better with a list of keywords, while others do well with natural language. Tweak the prompt with things you're looking for.
  • The sampler you're using can also make a big difference in the output. Don't ask me to explain how these work, just know that they're different algorithms and thus will produce different results. My personal preference is DPM++ 2M Karras because I think it does vibrant colors well, but try a few and see what you think. (If you reuse the seed from your previous output, you can see the impact of individual changes to your settings more easily. There's also a quick code sketch after this list showing where these settings plug in if you ever script things.)
  • Similarly, the number of sampling steps can also have an impact, though you should avoid falling into the trap of "more steps = better image." A lot of samplers will stop really making noticeable changes after a particular number, and then you're just wasting time. I stick to 20 personally, but again, try a few different numbers.
  • When writing a prompt, while it does consider everything you write, it seems to help to have the main focus of your image first, then fill in the details afterward.
  • Weighting your keywords is also important. By default, all keywords will have the same weight, but putting parentheses around one or a group will increase their weighting and make the model focus on them more. For example, (keyword:1.2) will increase the strength of keyword by 20%, while (keyword1, keyword2:1.4) will increase that group's strength by 40%. Similarly, you can reduce the emphasis by putting a number lower than 1. Keep in mind that putting too much emphasis can make it freak out and make something weird.
  • You can try changing the CFG scale number as well. This determines how closely the model adheres to your input; the lower the value, the more creative it's allowed to be. It's usually best not to put this too high because it's that creativity that really allows for interesting outputs, but bumping it up a bit can help.
  • Look for a good anti-deformity textual inversion or two. These are lightweight models trained on bad images that should be placed in your negative prompt, which will make the model try to avoid outputs that resemble those images. Hands can be particularly tricky for some models, so having one specifically for hands isn't a bad idea.
  • Text-to-image will only get you so far; the really great AI images take a lot of extra work. Upscale your txt2img output through img2img, and use inpaint or inpaint sketch to make changes to small areas of the image. After inpainting, upscale again with img2img to have the model blend those changes together into a more seamless output, lowering the denoising strength every time you upscale until you reach a satisfactory result.
  • Above all, patience is key. It may take a lot of generations to get something that you like, but if you're willing to put in the effort, you'll get something you can be proud of.
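If it helps to see the knobs in one place, here's a minimal scripted sketch using the diffusers library, assuming an SD 1.5 checkpoint and a CUDA card; the repo id, prompts, and values are just placeholder examples, and the (keyword:1.2) weighting syntax mentioned above is a web UI feature that diffusers doesn't parse natively.
Code:
# Minimal txt2img sketch with diffusers; model id and settings are examples only.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in whatever SD 1.5 checkpoint you actually use
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ 2M Karras roughly corresponds to DPMSolverMultistepScheduler
# with use_karras_sigmas=True.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

# Reuse a fixed seed so you can see the impact of individual setting changes.
generator = torch.Generator("cuda").manual_seed(12345)

# Anti-deformity embeddings need to be loaded before their trigger word works
# in the negative prompt, e.g. pipe.load_textual_inversion("path/to/embedding").
image = pipe(
    prompt="portrait of a knight in ornate armor, castle in the background",
    negative_prompt="lowres, bad anatomy, deformed hands",
    num_inference_steps=20,  # sampling steps
    guidance_scale=7.0,      # CFG scale
    generator=generator,
).images[0]
image.save("txt2img.png")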
The basic process as I learned it from someone else is as follows (you can use any resolution, but these are the current numbers I work with):
  1. Txt2img, 800x640, usually using a prompt from a previous image tweaked for what I'm looking to do this time. I'll generate one output at a time and make changes to my prompt as necessary. Maybe I decided to change the background, or maybe I wanted to add an accessory I hadn't thought of, or maybe the pose wasn't really catching my eye.
  2. When I think I'm happy with the prompt, txt2img, 800x640, five batches of five images. This gives me 25 different outputs to look through, and I'll pick the best one to move forward with. If I don't like any, I'll do another run. (This is true for every step.)
  3. Send that picture to img2img, 1000x800, denoising strength 0.6, three batches of three images (there's a scripted sketch of these img2img passes after the list). As the resolution goes up, the number of pics I can generate without my card running out of memory goes down, but you also generally won't need as many because the outputs won't change as much (I have an old 1070, if your card is newer then it can handle larger batch sizes). Again, I'll pick the one I like the best and move on.
  4. The image will likely need some inpainting, so I'll send it over there. I mostly use inpaint sketch because it can usually handle that fine, and it gives me more control than standard inpainting. Say it drew an extra limb over the background; I'll pick a color close to the background, paint over it, and run it through with the same settings. Don't worry about fixing every little inconsistency, that's what the next step is for.
  5. Once I've fixed the major stuff, img2img, 1200x960, denoising strength 0.5, two batches of two images. This will even out the inpainted sections with the rest of the image.
  6. If there are any lingering problem spots, a little more inpainting is done here. Usually there won't be much if any, but maybe something slipped my notice before.
  7. A good trick to add detail to something that could use more, like a face: regular inpainting over that section (I'd do the whole head and neck area), change inpaint to "only masked," increase the only masked padding to 256 pixels. This makes it create a full-rez version of that area before shrinking it and putting it into the full image, making things look nicer.
  8. One more img2img, 1800x1440, denoising strength 0.4, one image. Usually it'll only take one at this point, but if something didn't quite come out right, I'll keep generating until everything looks good.
  9. Repeat step 7 one more time to really make those high-detailed areas pop.
  10. Upscale the finished result (I use 4x-UltraSharp for somewhat realistic images), and enjoy!
I know I just said a lot, but there are a lot of tutorials out there that can probably explain better than I can. Keep practicing!
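For anyone who wants to script it, here's a rough sketch of the img2img upscale passes from steps 3, 5, and 8 using diffusers; the inpainting in between isn't shown, and the model id, filenames, and prompt are placeholders rather than the actual ones used here.
Code:
# Sketch of successive img2img passes with rising resolution and falling denoising strength.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder; use your own checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "same prompt you used for the original txt2img generation"
image = load_image("best_txt2img_pick.png")  # placeholder filename

# Each pass bumps the resolution and lowers the denoising strength, so the model
# adds detail without drifting too far from the composition you picked.
passes = [((1000, 800), 0.6), ((1200, 960), 0.5), ((1800, 1440), 0.4)]
for (width, height), strength in passes:
    image = image.resize((width, height))
    image = pipe(
        prompt=prompt,
        image=image,
        strength=strength,  # denoising strength
        num_inference_steps=20,
        guidance_scale=7.0,
    ).images[0]

image.save("refined.png")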
 
I think I missed something but where do you download SD3 or is SD2 still preferred or is there another variant I'm not aware of that plays nice with ComfyUI?
You can get the base SD or SDXL or SD3 models from Huggingface directly: https://huggingface.co/stabilityai

E.g. SDXL is here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

But for SD or SDXL you will likely want to get a tuned model more appropriate to what you're trying to do, and you can get these from https://civitai.com/ . Just search for models and apply a filter for the type you want, e.g. SDXL. Make sure you notice which are actual models and which are LoRAs that you would apply to a model; Civitai has both. Also, be careful on there - there's a lot of dubious stuff. Happily, Civitai's filters are actually pretty effective at hiding the bad stuff.

If you want to play around with SD3, Civitai won't have any tuned versions due to doubts about the SD3 licence, so you are stuck with the base model from Huggingface for that.

In my very personal opinion, I don't think SD3 Medium (the downloadable version) is much worth playing around with, and you're best sticking with 1.5 or SDXL. The full SD3 I was actually quite impressed by, but that's only available via paid API right now; you can't get it to run locally in ComfyUI.
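If you'd rather grab the base checkpoint with a script than click through the site, here's a small sketch using huggingface_hub; the filename is my assumption from the repo's file list, so double-check it on the model page.
Code:
# Sketch: download the SDXL base checkpoint from Hugging Face.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    filename="sd_xl_base_1.0.safetensors",  # verify the exact filename on the repo page
)
# Drop the downloaded file into your UI's checkpoint folder,
# e.g. ComfyUI/models/checkpoints or the webui's models/Stable-diffusion.
print(path)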

Hope that helps.
 
AMD Unveils Amuse 2.0, A Feature-Rich AI Suite Powered With XDNA Super Resolution Technology

This is basically a copy of Intel's AI Playground, focusing on Stable Diffusion. It's an executable for Windows, intended for use with NPUs (Ryzen 8040, Ryzen AI 300) or RX 7000 GPUs but probably supports more than that. It could be good for the tarded and our 7900 XTX owners.
Any way to remove the censorship?

Edit: don't judge me. I want to create fucked up shit like Mohammed being raped to death by pigs.
 
Any way to remove the censorship?

Edit: don't judge me. I want to create fucked up shit like Mohammed being raped to death by pigs.
IDK but it says "Auto configuration and model selection" in the feature list, which sounds like you could insert whatever horrific thing you find on Civitai. And it's all locally generated with no account or login needed.
 
Gentlemen, I re-enter this thread with a true and honest gift. Please accept it as a sincere apology for blowing my load too early. Trust me when I say I was far more felted being banned from this thread than you were seeing my generations. I'm sorry.

I want to be able to contribute to this thread in terms of quality, advice, help, or whatever without defaulting to memes and gimmicks. I present to you: "Null's Unofficial Fursona".
Null 2.png
Note: Image should be fully "PNG infoable" (drop this thing in the "PNG Info" tab to view the Positive/Negative prompts, details, etc.).
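If you'd rather pull the prompt out with a script than use the PNG Info tab, here's a short sketch with Pillow, assuming the generation parameters sit in the usual "parameters" text chunk the web UI writes:
Code:
# Sketch: read embedded generation parameters from an A1111-style PNG.
from PIL import Image

with Image.open("Null 2.png") as im:
    params = im.info.get("parameters")

print(params or "No generation parameters found in this PNG.")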

This one was made with Stable Diffusion 1.5 (specifically Stable Diffusion 1.8.0-RC) via Yiffymix v44. The most notable LoRA used (and the only LoRA used, for that matter) is the Fluffyrock Quality Tags. I'm more than happy to throw links to Civitai here, but I understand many have sincere reservations about heading over to that site, given that it is indeed coomer central.

I am kind of interested in writing a very, very simple how-to guide for setting up Stable Diffusion because, looking back at my entrance into Stable Diffusion, it took fucking forever, and almost every single video guide out there (now maybe already out of date) is very convoluted. Furthermore, the general rentry guides on Stable Diffusion threads all across 4chan are phrased such that they aren't beginner-friendly. My main issue with the rentry guides (rentry.org) is that they assume entirely too much technical competency from users. I am an absolute moron when it comes to computers. Perhaps that is evident in the lack of refinement in several of my generations.

Thank you.
 
Furthermore, the general rentry guides on Stable Diffusion threads all across 4chan are phrased such that they aren't beginner-friendly. My main issue with the rentry guides (rentry.org) is that they assume entirely too much technical competency from users.
Good luck with your guide. Keep in mind those rentry guides are written for autists by autists, so they're expecting other autists who are technically competent.
 
My main issue with the rentry guides (rentry.org) is that they assume entirely too much technical competency from users.
As much as I'd like to say that a user without a minimum degree of technical competence shouldn't be running Stable Diffusion on their system, one way you could simplify the guide is skipping the Git installation/cloning. Sure, no simple updates from skipping the Git stuff, but also no updates that break your current installation and force you to reinstall anyway. Other than that, there's really not much to simplify beyond adding screenshots for the Python installation and pointing out where models/VAEs/embeddings/LoRAs go and where your output images end up by default (see the sketch below for the usual folder layout).
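For reference, a hypothetical cheat sheet of the usual AUTOMATIC1111 webui folders; the exact names can vary between versions, so treat this as an assumption and check your own install.
Code:
# Hypothetical cheat sheet: typical A1111 webui folders, relative to the webui root.
from pathlib import Path

A1111_DIRS = {
    "checkpoints": "models/Stable-diffusion",
    "VAEs": "models/VAE",
    "LoRAs": "models/Lora",
    "embeddings": "embeddings",
    "txt2img outputs": "outputs/txt2img-images",
}

webui_root = Path("stable-diffusion-webui")
for what, rel in A1111_DIRS.items():
    print(f"{what:16} -> {webui_root / rel}")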
 
New open-source locally-run model dropped: FLUX. Much better than local Stable Diffusion when it comes to recognizing text, getting anatomy right, and understanding prompts; close to DALL-E 3 in terms of quality. Requires a 24GB video card to run quickly, though.

Example output stolen from 4chan
View attachment 6259974

Welp this is officially the future. It looks like this is a very SFW model though. Now normally I wouldn't steal someone else's gimmick, but I couldn't resist after this just popped out apropos of nothing:
fong-roped.png
Code:
low angle view picture of a massive super morbidly obese female anthropomorphic dragon with a sign in the background of "Anime Sucks Cope And Sneed"
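For the curious, a sketch of running FLUX.1-dev locally through diffusers with that same prompt; the model id and settings come from the publicly posted examples, so treat them as assumptions and check against your diffusers version.
Code:
# Sketch: run FLUX.1-dev locally via diffusers (needs a lot of VRAM; CPU offload helps).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # spill layers to system RAM instead of OOMing

image = pipe(
    prompt='low angle view picture of a massive super morbidly obese female '
           'anthropomorphic dragon with a sign in the background of '
           '"Anime Sucks Cope And Sneed"',
    guidance_scale=3.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_output.png")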
 
New open-source locally-run model dropped: FLUX. Much better than local Stable Diffusion when it comes to recognizing text, getting anatomy right, and understanding prompts; close to DALL-E 3 in terms of quality. Requires a 24GB video card to run quickly, though.

Example output stolen from 4chan
View attachment 6259974
Very cool. Had a little play with the API and the results were impressive. That's the Pro model, mind, which is larger I think. I've downloaded both the Dev and the Schnell models locally and will give them a go, though I suspect it will push my VRAM to its limits. I'm trying it on Runpod first, to have a play without worrying about that.

But looks very interesting. This is done by people behind SDXL per their page, is that right? Does that mean this is done by ex-Stable Diffusion bods?
 
Very cool. Had a little play with the API and the results were impressive. That's the Pro model, mind, which is larger I think. I've downloaded both the Dev and the Schnell models locally and will give them a go, though I suspect it will push my VRAM to its limits. I'm trying it on Runpod first, to have a play without worrying about that.

But looks very interesting. This is done by people behind SDXL per their page, is that right?
From what I understand, the people who made FLUX are the same people who worked on the original SD and SDXL. They split off from the company some time ago to start their own, and this is their first model.
 
The license for FLUX is very interesting and kind of garbage. First, it's non-commercial, which I'm strangely okay with given the costs of training. The main issue I run into is Section 8, which gives black-forest-labs the ability to straight up ban people from using or distributing the model. Although 8-3 is pretty based, as it automatically bans anyone who sues them via the DMCA from using the model.
 
Well, it's not working for me, and I think I'll come back to this with fresh eyes on the weekend. I keep getting an error about not being able to multiply mat1 and mat2. Normally that results from some mismatch like the wrong VAE for a model, but I can't see anything wrong, and I'm using the unaltered ComfyUI workflow from their examples. It finds everything, then just blows up right near the end. The shapes are 1x1280 and 768x3072, but where it's getting those from I don't know.

EDIT: Solved it. There's a type parameter that seems to be new. I updated ComfyUI trying to solve this, and it had defaulted to "SDXL" instead of Flux. It's now giving me an out-of-memory error despite 48GB of VRAM, because PyTorch doesn't want to use that much. Fingers crossed that if I can up that limit I'll have it working.

EDIT EDIT: Nope, it's not PyTorch. It's recognising it has 48GB of VRAM but saying it's out of memory anyway. One for another time.
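A quick diagnostic sketch for that kind of situation: check what PyTorch actually sees on the card, and try the generic allocator setting for fragmentation-related OOMs. This is plain PyTorch, not a ComfyUI-specific fix, so no guarantees it addresses this particular error.
Code:
# Sanity-check what PyTorch sees on the GPU when "out of memory" makes no sense.
import os

# Generic PyTorch allocator option for fragmentation-related OOMs. It has to be
# set before CUDA initialises, so in practice export it in the shell before
# launching ComfyUI rather than relying on this line.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch

free, total = torch.cuda.mem_get_info()
print(f"free:  {free / 1024**3:.1f} GiB")
print(f"total: {total / 1024**3:.1f} GiB")
print(f"allocated by this process: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")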
 
New open-source locally-run model dropped: FLUX. Much better than local Stable Diffusion when it comes to recognizing text, getting anatomy right, and understanding prompts; close to DALL-E 3 in terms of quality. Requires a 24GB video card to run quickly, though.

Example output stolen from 4chan
View attachment 6259974


It spells out even pretty long text very well.


1722555079302.png

1722555191245.png

1722555319747.png

1722555655158.png

1722555918061.png

I didn't even need to generate multiple images. Gets it on first try.

If you want to play with it for free without making an account, here's a link: https://replicate.com/black-forest-labs/flux-dev
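If you do end up making an account, the same model can also be scripted through Replicate's Python client; this needs a REPLICATE_API_TOKEN in your environment, the model slug matches the link above, and the prompt here is just a made-up example.
Code:
# Sketch: call FLUX dev through the Replicate API (requires an API token).
import replicate

output = replicate.run(
    "black-forest-labs/flux-dev",
    input={"prompt": "a kiwi bird holding a hand-painted protest sign"},
)
print(output)  # typically a list of image URLs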
 
It spells out even pretty long text very well.


View attachment 6261467

View attachment 6261475

View attachment 6261482

View attachment 6261523

View attachment 6261554

I didn't even need to generate multiple images. Gets it on first try.

If you want to play with it for free without making an account, here's a link: https://replicate.com/black-forest-labs/flux-dev
It also doesn't filter celebrity names. You can generate Trumps with it.
out-0.jpg
The prompt was "an incel aiming a gun at Trump", which it sadly misunderstood, but unlike SDXL it seems to be using a dataset that hasn't had any personal names stripped from the tags. High quality AI pictures of celebrities doing controversial things is back on the menu!
 