I think I missed something, but where do you download SD3? Or is SD2 still preferred, or is there another variant I'm not aware of that plays nice with ComfyUI?
Edit: okay, so I'm trying Easy Diffusion v3.0.9, which uses SD 1.5. Right now I'm just fucking around with it. Any pointers to get a more precise image? Sorry for the newbie questions.

A lot of it is just trial and error, but as you get practice, you'll start to get a feel for what to focus on and how to get the output closer to what you're looking for. Some pointers:
- A good way to start is to find an output from the model you're using that you like, then use that image's prompt as a starting point. Some models work better with a list of keywords, while others do well with natural language. Tweak the prompt toward the things you're looking for.
- The sampler you're using can also make a big difference to the output. Don't ask me to explain how these work; just know that they're different algorithms and will thus produce different results. My personal preference is DPM++ 2M Karras because I think it does vibrant colors well, but try a few and see what you think. (If you reuse the seed from your previous output, you can see the impact of individual changes to your settings more easily. The first sketch after this list shows these settings in code.)
- Similarly, the number of sampling steps can have an impact, though you should avoid falling into the trap of "more steps = better image." A lot of samplers stop making noticeable changes after a certain number of steps, and past that you're just wasting time. I stick to 20 personally, but again, try a few different numbers.
- When writing a prompt, the model does consider everything you write, but it seems to help to put the main focus of your image first and fill in the details afterward.
- Weighting your keywords is also important. By default, all keywords have the same weight, but putting parentheses around a keyword or a group of keywords increases their weighting and makes the model focus on them more. For example, (keyword:1.2) increases the strength of keyword by 20%, while (keyword1, keyword2:1.4) increases that group's strength by 40%. Similarly, you can reduce the emphasis by using a number lower than 1. Keep in mind that too much emphasis can make the model freak out and produce something weird.
- You can try changing the CFG scale number as well. This determines how closely the model adheres to your input; the lower the value, the more creative it's allowed to be. It's usually best not to put this too high because it's that creativity that really allows for interesting outputs, but bumping it up a bit can help.
- Look for a good anti-deformity textual inversion or two. These are lightweight embeddings trained on bad images; put their trigger keywords in your negative prompt, and the model will try to avoid outputs that resemble those images. Hands can be particularly tricky for some models, so having one specifically for hands isn't a bad idea. (See the second sketch after this list.)
- Text-to-image will only get you so far; the really great AI images take a lot of extra work. Upscale your txt2img output through img2img, and use inpaint or inpaint sketch to make changes to small areas of the image. After inpainting, upscale again with img2img to have the model blend those changes together into a more seamless output, lowering the denoising strength with every upscale until you reach a satisfactory result. (I sketch this loop in code after the workflow steps below.)
- Above all, patience is key. It may take a lot of generations to get something that you like, but if you're willing to put in the effort, you'll get something you can be proud of.
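If you'd rather script these settings than click through a UI, here's a minimal txt2img sketch using Hugging Face's diffusers library. This is just my illustration, not Easy Diffusion's internals, and the checkpoint name and prompt are placeholders:

```python
# Minimal txt2img sketch with diffusers (assumed setup; the same knobs
# appear as UI fields in Easy Diffusion and similar frontends).
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder SD 1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ 2M Karras = the multistep DPM-Solver++ scheduler with Karras sigmas.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

# Fixing the seed lets you change one setting at a time and compare outputs.
generator = torch.Generator("cuda").manual_seed(1234)

image = pipe(
    prompt="a lighthouse on a cliff at sunset, dramatic sky",  # placeholder
    negative_prompt="blurry, deformed",
    width=800,
    height=640,
    num_inference_steps=20,  # past a certain point, more steps stop helping
    guidance_scale=7.0,      # CFG: lower = more creative, higher = more literal
    generator=generator,
).images[0]
image.save("txt2img.png")
```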
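Same idea for the anti-deformity embeddings: diffusers can load a textual inversion file directly. The file path and trigger token below are hypothetical, so substitute whichever embedding you actually download:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # placeholder
).to("cuda")

# File path and trigger token are hypothetical placeholders.
pipe.load_textual_inversion("./embeddings/bad-hands.pt", token="bad-hands")

image = pipe(
    prompt="portrait of a violinist, detailed hands",  # placeholder
    negative_prompt="bad-hands, blurry, deformed",     # trigger token goes here
    num_inference_steps=20,
).images[0]
image.save("with_negative_embedding.png")
```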
My usual workflow, step by step:
1. Txt2img, 800x640, usually using a prompt from a previous image tweaked for what I'm looking to do this time. I'll generate one output at a time and make changes to my prompt as necessary. Maybe I decided to change the background, or maybe I wanted to add an accessory I hadn't thought of, or maybe the pose wasn't really catching my eye.
2. When I think I'm happy with the prompt, txt2img, 800x640, five batches of five images. This gives me 25 different outputs to look through, and I'll pick the best one to move forward with. If I don't like any, I'll do another run. (That goes for every step.)
3. Send that picture to img2img, 1000x800, denoising strength 0.6, three batches of three images. As the resolution goes up, the number of pics I can generate without my card running out of memory goes down, but you also generally won't need as many because the outputs won't change as much. (I have an old 1070; if your card is newer, it can handle larger batch sizes.) Again, I'll pick the one I like best and move on. (Steps 3, 5, and 8 are sketched in code after this list.)
4. The image will likely need some inpainting, so I'll send it over there. I mostly use inpaint sketch because it can usually handle these fixes fine, and it gives me more control than standard inpainting. Say it drew an extra limb over the background: I'll pick a color close to the background, paint over the limb, and run it through with the same settings. Don't worry about fixing every little inconsistency; that's what the next step is for.
5. Once I've fixed the major stuff, img2img, 1200x960, denoising strength 0.5, two batches of two images. This evens out the inpainted sections with the rest of the image.
6. If there are any lingering problem spots, a little more inpainting is done here. Usually there won't be much, if any, but maybe something slipped my notice before.
7. A good trick to add detail to something that could use more, like a face: regular inpainting over that section (I'd do the whole head and neck area), set the inpaint area to "only masked," and increase the only-masked padding to 256 pixels. This makes it render a full-resolution version of that area before shrinking it back into the full image, which makes things look much nicer. (Also sketched in code after this list.)
8. One more img2img, 1800x1440, denoising strength 0.4, one image. Usually it'll only take one at this point, but if something didn't quite come out right, I'll keep generating until everything looks good.
9. Repeat step 7 one more time to really make those high-detail areas pop.
10. Upscale the finished result (I use 4x-UltraSharp for somewhat realistic images), and enjoy!
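If you want to script it, steps 3, 5, and 8 boil down to one img2img loop: raise the resolution and lower the denoising strength on each pass. A rough diffusers sketch, where the checkpoint, file names, and prompt are placeholders, and batch generation plus cherry-picking are left out:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("txt2img_pick.png")  # the 800x640 pick from step 2

# Each pass: bump the resolution, drop the denoising strength.
for (width, height), strength in [
    ((1000, 800), 0.6),   # step 3
    ((1200, 960), 0.5),   # step 5
    ((1800, 1440), 0.4),  # step 8
]:
    image = pipe(
        prompt="same prompt as the txt2img pass",  # placeholder
        image=image.resize((width, height)),
        strength=strength,
        num_inference_steps=20,
    ).images[0]

image.save("upscaled.png")
```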
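And as far as I can tell, the "only masked" trick from step 7 maps to the padding_mask_crop argument on diffusers' inpainting pipeline: it crops to the mask plus padding, inpaints that crop at full resolution, and pastes it back. A sketch with placeholder files:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # placeholder inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init = Image.open("upscaled.png")
mask = Image.open("head_mask.png")  # white over the head/neck area, black elsewhere

result = pipe(
    prompt="detailed face, sharp focus",  # placeholder
    image=init,
    mask_image=mask,
    # Crop to the mask plus 256px of padding, inpaint that region at full
    # resolution, then paste it back (the "only masked" behavior in step 7).
    padding_mask_crop=256,
).images[0]
result.save("detailed.png")
```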