Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

This silly program keeps removing the space car in my retrofuturism poster.

This stuff is infuriating; I can't ever get it to do what I really want.
Why does it remove the hovercar in the image-to-image? I even made it a Jewish one with triple parentheses.

I give up, back to using paint.
I assume you're using img2img; try increasing the CFG scale, or maybe lowering denoising strength?
 
I assume you're using img2img; try increasing the CFG scale, or maybe lowering denoising strength?
CFG does fuck all right now, and denoising is already painfully low, yet I still don't have a cool hovercar.
I am tired of it.

I might revisit it with the new ControlNet thing in the future. If I can lock in the woman and stop Stable Diffusion from giving her extra limbs, I might be able to up the denoising strength and achieve a cool hovercar.
 
This silly program keeps removing the space car in my retrofuturism poster.

This stuff is infuriating; I can't ever get it to do what I really want.
Why does it remove the hovercar in the image-to-image? I even made it a Jewish one with triple parentheses.

I give up, back to using paint.
Triple parentheses are just (hovercar:1.3); you can always go (hovercar:1.5) or even :2 for a much stronger effect. You could also lower the denoising to get a less random result, closer to what you had. Or use inpainting and only mask what you want changed (so it leaves the car alone).
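For the curious: in the AUTOMATIC1111 web UI, each layer of parentheses multiplies a token's attention by 1.1, so triple parentheses actually land slightly above 1.3. A quick sketch of the math, nothing more:

```python
# Each paren layer in AUTOMATIC1111's prompt syntax multiplies attention by 1.1.
def paren_weight(depth: int) -> float:
    return 1.1 ** depth

for depth in range(1, 5):
    token = "(" * depth + "hovercar" + ")" * depth
    print(f"{token:<16} ~ (hovercar:{paren_weight(depth):.3f})")
# (hovercar)       ~ (hovercar:1.100)
# ((hovercar))     ~ (hovercar:1.210)
# (((hovercar)))   ~ (hovercar:1.331)
# ((((hovercar)))) ~ (hovercar:1.464)
```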

As for an inpainting guide, here are two:


PS: the first thing you should do when using Stable Diffusion is use the X/Y plot tool to create comparison grids of images so you can learn what prompts and settings do, like comparing models, CFG settings, denoising values, etc.
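If your UI lacks that script, or you want to grid up images you've already generated, here's a minimal Pillow sketch; the folder path, grid width, and tile size are my assumptions:

```python
# Minimal sketch: tile already-generated images into one comparison grid
# with Pillow. Folder path, column count, and tile size are assumptions;
# adjust to however your outputs are organized.
from pathlib import Path
from PIL import Image

files = sorted(Path("outputs/cfg_sweep").glob("*.png"))  # hypothetical folder
cols, tile = 4, 256
rows = -(-len(files) // cols)  # ceiling division

grid = Image.new("RGB", (cols * tile, rows * tile), "white")
for i, f in enumerate(files):
    img = Image.open(f).resize((tile, tile))
    grid.paste(img, ((i % cols) * tile, (i // cols) * tile))
grid.save("cfg_grid.png")
```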
 
CFG does fuck all right now denoising is already painfully low since I still don't have a cool hovercar.
I am tired of it.

I might revisit it with the new control net thing in the future. If I can lock in the woman and to stop stable diffusion from giving her extra limbs I might be able to up denoising strength an achieve a cool hovercar.
Definitely try inpainting. Think of it like a layer mask in image editing: you mask the area you want changed, and Stable Diffusion regenerates only that region.

Given an input of this:
[image: 00249-3327717591-before-highres-fix.jpg]
I give it the same prompt as I used for the original image, then tweak the parameters for what I want. In this case, just quickly replacing the sky by swapping my (clouds), (day sky:1.5) prompts for (night), (city:1.5).
[image: snip.jpg]
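If anyone wants to drive that same inpainting step through the AUTOMATIC1111 web UI's API (launch it with --api) instead of the browser, here's a rough sketch; the file names, prompt, and settings are placeholders:

```python
# Rough sketch of inpainting via the AUTOMATIC1111 web UI API (--api flag).
# File names and prompt are placeholders; fields follow the /sdapi/v1/img2img
# schema documented on the server's /docs page.
import base64
import requests

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "init_images": [b64("input.png")],  # original image
    "mask": b64("sky_mask.png"),        # white = regenerate, black = keep
    "prompt": "(night), (city:1.5)",
    "denoising_strength": 0.75,         # how far from the original to stray
    "inpainting_fill": 1,               # 1 = start from the original pixels
    "inpaint_full_res": True,           # work at full res on the masked area
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
with open("output.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```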
 
I get they can't all be winners, but I don't know what's supposed to be impressive about a fast generator if the quality of its work is what DALL-E Mini was doing, and what was generally possible 5 or more years ago.

I've generally been unimpressed by Nvidia's AI work on the software side, though. I don't even see why they feel the need to do this. Just sit back and let people use what you're good at: your hardware. They are literally doing it for free, Nvidia.

Edit: I do see that its strong suit is supposed to be consistent image-to-image gens, but the quality still leaves a lot to be desired. Its version of a "smooth" transition also makes me want to barf. Even if Stable Diffusion supposedly isn't as consistent between images, it's easier on the eyes.
 
I've generally been unimpressed by Nvidia's AI work on the software side, though. I don't even see why they feel the need to do this. Just sit back and let people use what you're good at: your hardware. They are literally doing it for free, Nvidia.
Any time you buy an Nvidia card, it seems to come with a suite of absolute crap I can't imagine anyone actually uses. They're all shovelware versions of things that have vastly superior alternatives, often for free.
 
I'm getting a good workflow going thanks to reading guides from others, working my way up to some very nice large-res images that I then upscale further to finish off. There are sometimes some minor errors, mostly hands and feet, but I get to a point where I'm pleased with it and move on to something else. I have an old card (1070) so it's not the fastest, but I can let it run a few batches while I'm doing other stuff and come back to sort out the results.

In a nutshell, I generate a bunch of txt2img results at a low res and look for one that's close to what I have in mind, tweaking prompts as necessary. Then I'll send it to img2img and generate a bunch more images upscaled to a medium res. I'll inpaint as needed, especially for major adjustments, then upscale and generate again. By this point it's getting close to what I'm looking for, so I'll make minor tweaks with inpainting, even sending it to GIMP for edits before throwing it back in. Then I'll do one more upscale generation to even out any weirdness, make sure everything's good, and do a 4x upscale in the Extras tab to finish it off.
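If you'd rather script that ladder than click through the UI each time, here's a rough sketch of the same low-res -> img2img -> Extras flow against the AUTOMATIC1111 web UI API (launched with --api); the prompt, sizes, and upscaler choice are placeholders:

```python
# Rough sketch of the low-res -> img2img upscale -> 4x Extras ladder via the
# AUTOMATIC1111 web UI API. Prompt, sizes, and settings are placeholders.
import base64
import requests

API = "http://127.0.0.1:7860/sdapi/v1"
prompt = "fantasy portrait, highly detailed"  # placeholder prompt

# 1) cheap low-res txt2img pass to find a promising composition
low = requests.post(f"{API}/txt2img", json={
    "prompt": prompt, "width": 512, "height": 512, "steps": 25,
}).json()["images"][0]

# 2) img2img at a higher resolution, keeping the composition
mid = requests.post(f"{API}/img2img", json={
    "init_images": [low], "prompt": prompt,
    "width": 1024, "height": 1024, "denoising_strength": 0.4,
}).json()["images"][0]

# 3) final 4x upscale with a GAN upscaler via the extras endpoint
final = requests.post(f"{API}/extra-single-image", json={
    "image": mid, "upscaling_resize": 4, "upscaler_1": "R-ESRGAN 4x+",
}).json()["image"]

with open("final.png", "wb") as f:
    f.write(base64.b64decode(final))
```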

Now, a question: I'm thinking about making AI portraits of my D&D party, but with all those models floating around, there's a lot to dig through. Anyone have favorite fantasy models/prompts they use? Something that can generate both martials and casters well would be ideal, since we have a healthy mix of those, as well as more exotic races like dragonborn (my wizard). I don't feel like going to the trouble of making multiple Discord accounts to abuse Midjourney, and I'd much rather generate images myself anyway.
 
I've been messing with this since late September. This tech is moving fast. I thought I'd contribute some things that might help others.

First off, I personally do not want to mess with any of the online services for several reasons. If you have a good GPU, you can use these projects below to set up a local instance and generate images locally.

This one is the easiest to install and has some good features and ease-of-use functionality to help you.

This one is only slightly more involved to install, but still easy. It has some more advanced features, too.

On my PC with two GTX Titan XP cards, I can generate an image, upscale it and apply face correction in about 20 seconds.

You can find other models to try out here: https://rentry.org/sdmodels

Be warned, this can be very addictive. It's got a lot of randomness to it, so it's like pulling the lever on a slot machine hoping to hit a jackpot over and over. Can easily pass hours messing with it.
Maybe I need to install a new model or something first, but cmdr2 pretty much generates whatever it wants regardless of the input image, text entered, or tags used. I literally used an image of my car (no tags, only text) and it output a gothic cave with candles and a cauldron. I'm downloading SD 2.0, 2.5, and a couple of other models; maybe that will help.
 
Maybe I need to install a new model or something first, but cmdr2 pretty much generates whatever it wants regardless of the input image, text entered, or tags used. I literally used an image of my car (no tags, only text) and it output a gothic cave with candles and a cauldron. I'm downloading SD 2.0, 2.5, and a couple of other models; maybe that will help.
When I posted originally, I found you need to give it a text prompt or you will get something random. Describe your input image and what you want it to look like. I've given it a selfie with the prompt "This image but black" and got back a version of me as a black man.

I've done this with 1.4 and 1.5 SD models.
I did this a few months ago, so maybe something has changed. There are different "samplers" that interpret the text input and such. I'm not an expert, but I don't think the model is the problem. I haven't messed with it much lately. Please report back with your findings.

I'll update my software, try it again, and see what I get. It gets updated all the time.
 
When I posted originally, I found you need to give it a text prompt or you will get something random. Describe your input image and what you want it to look like. I've given it a selfie with the prompt "This image but black" and got back a version of me as a black man.

I've done this with 1.4 and 1.5 SD models.
I did this a few months ago, so maybe something has changed. There are different "samplers" that interpret the text input and such. I'm not an expert, but I don't think the model is the problem. I haven't messed with it much lately. Please report back with your findings.

I'll update my software, try it again, and see what I get. It gets updated all the time.
Since I posted, I've been learning quite a bit and have discovered a ton of different models on sites like https://civitai.com and https://huggingface.co/, and produced some amazing results. The software you posted turns out to be a good introduction to AI image generation. Once I understood the difference between a hypernetwork, a VAE, and the samplers, I was able to match them up the right way to produce some amazing images.

Of course, understanding how many steps each sampler should be set to, and using both positive and negative prompts with as much detail as possible, certainly helped in the end. My initial mistake was expecting to type a basic prompt like you would in Midjourney, not realizing Midjourney runs another AI on top of several models to help determine which samplers and other settings to use. It was like driving an automatic, then switching to a stick shift expecting the same results and wondering why the transmission was grinding.

So now I'm driving stick a bit better, but I see there's more to learn. This is me several hours of video and website tutorials later.
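For reference, the knobs mentioned above (sampler, step count, positive and negative prompts) map onto fields like these if you ever script the AUTOMATIC1111 web UI; the values are placeholders, not recommendations:

```python
# Placeholder txt2img payload: where sampler, steps, and the positive and
# negative prompts live in the AUTOMATIC1111 /sdapi/v1/txt2img schema.
payload = {
    "prompt": "portrait of a knight, intricate armor, sharp focus",
    "negative_prompt": "blurry, extra limbs, watermark",
    "sampler_name": "Euler a",  # samplers differ in how many steps they need
    "steps": 25,                # Euler a is commonly run at ~20-30 steps
    "cfg_scale": 7,             # how strongly the prompt is enforced
}
```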
 
How do you guys rank in terms of Tom's Hardware's SD GPU benchmarks? I followed their methodology to the best of my ability (except for doing only one run instead of averaging 10) and got 7.92 s/it. I'm a cputard, and unless I read something wrong, I guess I really underestimated my gear.
[images: Graph 1, Graph 2]
postapocalyptic steampunk city, exploration, cinematic, realistic, hyper detailed, photorealistic maximum detail, volumetric light, (((focus))), wide-angle, (((brightly lit))),
(((vegetation))), lightning, vines, destruction, devastation, wartorn, ruins
Negative prompt: (((blurry))), ((foggy)), (((dark))), ((monochrome)), sun, (((depth of field)))
Steps: 100, Sampler: Euler a, CFG scale: 7, Seed: 3563678826, Size: 512x512, Model hash: fe4efff1e1, Model: sd-v1-4
Saved: 1677788596035-3563678826.png
 
How do you guys rank in terms of Tom's Hardware's SD GPU benchmarks? I followed their standards to the best of my ability (except for doing only one run and not bothering to average 10 of them) and got 7.92s/it. I'm a cputard and unless I read something wrong I guess I really underestimated my gear.
About 2.2 it/s with a GTX 1080 using xformers. This is an area where older GPUs will absolutely crush powerful, expensive CPUs, as the ability to do shitloads of calculations in parallel reigns supreme. Even if you're using a $3,500+ server processor, a GPU from 2016 is more suited to this task. From the CPU setup guide:
--SPEED PER RESULT--
Intel Core i5-8279U:           7.4 s/it    3.59 min
AMD Ryzen Threadripper 1900X:  5.34 s/it   2.58 min
Intel Xeon Gold 6154:          1 s/it      33 s
An older, far cheaper GPU is better suited to this task.
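Side note, since the two posts quote speed in opposite units (s/it vs it/s), here's a quick conversion using the numbers above and the 100-step setting from the benchmark prompt:

```python
# Normalize s/it vs it/s and estimate wall time for a 100-step generation
# (100 steps matches the settings quoted above).
def seconds_per_iter(value: float, unit: str) -> float:
    return value if unit == "s/it" else 1.0 / value

for name, value, unit in [
    ("CPU (7.92 s/it)", 7.92, "s/it"),
    ("GTX 1080 (2.2 it/s)", 2.2, "it/s"),
]:
    spi = seconds_per_iter(value, unit)
    print(f"{name}: {spi:.2f} s/it -> 100 steps in {100 * spi / 60:.1f} min")
# CPU (7.92 s/it): 7.92 s/it -> 100 steps in 13.2 min
# GTX 1080 (2.2 it/s): 0.45 s/it -> 100 steps in 0.8 min
```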
 