Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

Misty looks fine, but Pikachu looks like he's beginning to melt.
I will admit I picked the cutest girl rather than the most electric rat.
It'd be easy to fix but y'know, pay me

I don't have any tips since I just threw it into the last prompt I'd been messing with, but it'd be something heavily dependent on the checkpoint. I'd guess that in this case being more specific ("pikachu with some attributes doing some thing" rather than just "pikachu") would give both the model and you more threads to pull. I'd try some examples but I'm sleepy.
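
Untested since I'm half asleep, but the shape I mean is something like:

"pikachu" → "chubby pikachu sitting on a tree stump holding a berry, forest background, soft lighting"

The vague version leaves the checkpoint to average over every pikachu image it's ever seen; the specific one hands it a pose, a prop, and a setting to anchor on.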
 
Spent the past couple days of downtime experimenting with SD and NAI. A few observations

1.) When building the dataset for your hypernetwork, a couple sentences of prose will work far better than most organized taglists (see the example captions after this list).

2.) Training goes a lot smoother when the images in your dataset have backgrounds, and the results are more accurate to the source material. This is true even in cases where the background is just simple splashes of color breaking up an otherwise uniform surface.

3.) For art styles, SD 1.5 works best for semi-realistic styles in my experiments, while NAI works better for styles with more exaggerated proportions (anime, cartoon, tumblr askblog).
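
To illustrate point 1 (made-up captions, not from an actual dataset):

Taglist: "1girl, green hair, long dress, forest, flowers, smiling"
Prose: "A young woman with green hair stands in a sunlit forest clearing, smiling. She wears a long dress, and wildflowers grow around her feet."

Same information either way, but the prose version gives the text encoder natural-language context, presumably closer to what it saw during its own training.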

mara 1.png
The result of 2000 steps of training at a learning rate of 5e-6. Dataset was a collection of 19 images with the descriptions autogenerated by BLIP. Prompt was something along the lines of "Woodland Dryad, acrylic and watercolor"
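
For anyone who wants to reproduce the captioning step outside the webui, here's a rough sketch using the transformers library. The model id is the public Salesforce BLIP base model and the dataset folder name is made up; the webui's built-in BLIP interrogator does effectively the same thing.

```python
# Rough sketch of auto-captioning a dataset folder with BLIP.
# Assumes: pip install transformers torch pillow
# The webui's own BLIP interrogator will give slightly different captions.
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

dataset = Path("dataset")  # hypothetical folder of training images
for img_path in sorted(dataset.glob("*.png")):
    image = Image.open(img_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50)
    caption = processor.decode(out[0], skip_special_tokens=True)
    # The webui can read per-image captions from a .txt next to each image.
    img_path.with_suffix(".txt").write_text(caption)
    print(img_path.name, "->", caption)
```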

Although imperfect, I'm not disappointed with the output. The results could likely be refined by running the training for longer and at varying learning rates; however, for the sake of argument, let's change up the dataset inputs.

mara.png
The result of 2000 steps of training at a learning rate of 5e-5. The dataset was the same as the above, but with 4-5 sentences written for each image that clearly describe what is going on.

As you can see, the shading and facial structure on our Dryad is much more defined. I'll note that SD seems to have incorporated a number of horn-shaped hair accessories, which I believe may be the result of including this illustration in the dataset.
00018-0-seredibdjinn.png

Overall though, the past few experiments are giving me a good idea of where I can keep pushing in future experiments. If anyone else wants to experiment with the dataset, I've included it for their convenience.

I tried to generate Pokémon starting with Pikachu but gave up pretty quickly. Potentially good Halloween avatar material.
View attachment 3766784 View attachment 3766788 View attachment 3766832
I'm so glad The Cheat is not dead.
Screen-Shot-2014-07-09-at-2.44.01-PM.png
 

Reminder that checkpoint files can be malicious, so if you run a model/hypernet posted by anyone with a pink regdate you're a fucking moron.

There's an unpickler project for validating/cleaning them, but it hadn't been vetted when I looked (a week or more ago?), so I'm not sure of the current status.
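
For reference, the core idea those projects build on is Python's restricted-unpickler pattern. A minimal sketch of it below; this is NOT the vetted tool, just an illustration, and allowlist scanners like this can still be bypassed, so the advice above stands:

```python
# Minimal sketch of the "restricted unpickler" idea for .ckpt files:
# walk the embedded pickle and refuse any import outside a small
# allowlist, without materializing real tensors. Illustrative only --
# pickle is fundamentally unsafe and this is not a vetted scanner.
import pickle
import zipfile
from collections import OrderedDict

STUBBED = {
    ("torch._utils", "_rebuild_tensor_v2"),
    ("torch", "FloatStorage"),
    ("torch", "HalfStorage"),
}

class ScanUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) == ("collections", "OrderedDict"):
            return OrderedDict  # harmless container, allow for real
        if (module, name) in STUBBED:
            return lambda *a, **k: None  # never rebuild real tensors
        raise pickle.UnpicklingError(f"suspicious import: {module}.{name}")

    def persistent_load(self, pid):
        return None  # tensor data lives outside the pickle; skip it

# torch.save() writes a zip container; the pickle sits at */data.pkl
with zipfile.ZipFile("model.ckpt") as zf:  # hypothetical filename
    entry = next(n for n in zf.namelist() if n.endswith("data.pkl"))
    with zf.open(entry) as f:
        ScanUnpickler(f).load()
print("no disallowed imports found")
```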
 
I woke up, here's some tests (same seed):

View attachment 3774592 View attachment 3774616

View attachment 3774580 look at this chubby fuck View attachment 3774576

View attachment 3774716 View attachment 3774804 View attachment 3774832
View attachment 3774872 View attachment 3774940 View attachment 3774952

And then I got bored. It mostly seems to struggle with the tail (but that'd take about five seconds to fix with inpaint).
the last one is the best one on there im yoinkin it
 
the last one is the best one on there im yoinkin it
Word. It is ripe for Virgin vs Chad memes.
View attachment 3775136
Fiddled with txt2img a bit today on "a chubby Pikachu eating a strawberry on a table", and the blue gleam in its eyes makes it look soulless. Like my previous attempt, it seems the AI doesn't do well with verbs like eating or biting.
This may be due to a lack of references to characters/people actually biting down on something. Since photos of someone stuffing their face are usually less aesthetically pleasing than photos of people merely holding the food, there were likely fewer examples in the original dataset.

Thought: Run some training with images where characters are putting the food into their mouth instead of just holding it in front of their faces. Alternatively, see how txt2img interprets "a chubby Pikachu biting into a strawberry on a table".

As for the eyes, that looks consistent with how glossy black objects reflect light, but in the worst case you can just pull a Dream and photoshop the light reflection.
 
It also does not handle putting clothing on animals. Every attempt I have made to put clothing on an animal has resulted in either the AI turning them into a furry or some kind of deformed human-animal hybrid abomination, like a wolf wearing a Santa hat but with human hands >.>
 
It also does not handle putting clothing on animals. Every attempt I have made to put clothing on an animal has resulted in either the AI turning them into a furry or some kind of deformed human-animal hybrid abomination, like a wolf wearing a Santa hat but with human hands >.>
Try to beat it to death with negative prompts. I'll give it a go.
Edit: Bro, what are you doing to make that happen? My generic prompts plus "wolf wearing a santa hat" got a 4/6 hit rate for non-abominations (maybe 3, one is borderline). I'm sure you can get that number up with a bit of work.
grid-0020.png
This needs exploring.
 
Negative prompts really are the way to go. They reduced the "deformed nightmares from the sixth circle of hell" to around a tenth. With the right prompts you can even get the AI to generate pictures which look nearly as realistic as a photograph. Creepy stuff.
I find it best to nip something in the bud the moment you notice it.

Ugly looking face? "Bad face"

Autistic looking pose? "Bad posture"

Saggy titties? "Bad breasts"

You have to get even more specific sometimes, but strike at that AI the moment it does something you don't like.

Not that this ever fixes hands being bad, sadly.
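
For a concrete starter set in that spirit (just the usual suspects, season to taste): "lowres, bad anatomy, bad face, bad posture, bad hands, missing fingers, extra digit, fewer digits, bad breasts, cropped, worst quality, low quality, jpeg artifacts".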
 
It also does not handle putting clothing on animals. Every attempt I have made to put clothing on an animal has resulted in either the AI turning them into a furry or some kind of deformed human-animal hybrid abomination, like a wolf wearing a Santa hat but with human hands >.>
"Pig wearing a crown and royal gown, oil on canvas", on vanilla SD 1.5. Negative prompts were "Furry, anthro, anime"
well shit.png

Like Puff said, the more negative prompts you use, the better mileage you get. Stable Diffusion is really good at what it does, but you need to reduce ambiguity through your parameters, otherwise it'll take the path of least resistance.
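
If anyone's scripting this instead of clicking around the webui, negative prompts map directly onto the negative_prompt argument in the diffusers library. A minimal sketch, assuming the public runwayml SD 1.5 weights and a CUDA card (seed and step count are arbitrary):

```python
# Minimal sketch: the same pig prompt via diffusers.
# Assumes: pip install diffusers transformers torch, plus a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="Pig wearing a crown and royal gown, oil on canvas",
    negative_prompt="furry, anthro, anime",
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed
).images[0]
image.save("pig.png")
```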
 
Negative prompts really are the way to go. They reduced the "deformed nightmares from the sixth circle of hell" to around a tenth. With the right prompts you can even get the AI to generate pictures which look nearly as realistic as a photograph. Creepy stuff.
I still keep getting freaky "realistic people with anime faces"... I can't fix it... I'm downloading other checkpoints to try, but my internet is dogshit. Also, Null broke uploading images. It hangs on upload, forcing me to reload and THEN insert the thumbnail. Probably something to do with a jihad on non-thumbnailed images breaking the editor.

Negative: hands, furry, santa suit, jacket
grid-0021.png
That plus "photo" in the positive prompts
grid-0022.png
I find it best to nip something in the bud the moment you notice it.

Ugly looking face? "Bad face"

Autistic looking pose? "Bad posture"

Saggy titties? "Bad breasts"

You have to get even more specific sometimes, but strike at that AI the moment it does something you don't like.

Not that this ever fixes hands being bad, sadly.
Build a generic negative prompt set as you go; it speeds things up. Start with the one from the retard guide in the OP and work from there.
Also Inpaint fixes hands eventually.
Edit: Final wolf grid... added "wide shot". Note that these are all very low-iteration images.
grid-0023.jpg
 
Not that this ever fixes hands being bad, sadly.
It can help a bit: "bad hands, missing fingers, extra digit, fewer digits, poorly drawn hands," etc.

Also StabilityAI released this VAE that seems to help a lot: https://huggingface.co/stabilityai/sd-vae-ft-mse-original or https://transfer.pcloud.com/download.html?code=5ZBEpuVZDkqH4LW8BsuZgbYIZ2CyEs8OBcbFMiueYNodTRVKtVVpk

Just rename it from .ckpt to .vae.pt and drop it in your models folder. Give it the same name as the model you want to use it with, or add --vae-path "models\Stable-diffusion\vae-ft-mse-840000-ema-pruned.vae.pt" (or whatever/wherever you have it) to your webui-user batch args and it'll apply to all models, which I've been doing with no problems.
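
In other words, the relevant line in webui-user.bat ends up looking something like this (a sketch assuming the default folder layout; adjust the path to wherever you actually put the file):

```
rem webui-user.bat -- apply the VAE to every model
set COMMANDLINE_ARGS=--vae-path "models\Stable-diffusion\vae-ft-mse-840000-ema-pruned.vae.pt"
```
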
It feels minor but I'm sure I've noticed much better limbs/hands even with the basic Euler samplers; like they come out okay 40% of the time rather than 10% now.

Edit: also if you use --no-half you might need to add --no-half-vae too (but I forgot to do this on my old lappy and it worked fine so idk, ymmv).
 
It can help a bit: "bad hands, missing fingers, extra digit, fewer digits, poorly drawn hands," etc.

Also StabilityAI released this VAE that seems to help a lot: https://huggingface.co/stabilityai/sd-vae-ft-mse-original or https://transfer.pcloud.com/download.html?code=5ZBEpuVZDkqH4LW8BsuZgbYIZ2CyEs8OBcbFMiueYNodTRVKtVVpk

Just rename it from .ckpt to .vae.pt and drop it in your models folder. Give it the same name as the model you want to use it with, or add --vae-path "models\Stable-diffusion\vae-ft-mse-840000-ema-pruned.vae.pt" (or whatever/wherever you have it) to your webui-user batch args and it'll apply to all models, which I've been doing with no problems.
It feels minor but I'm sure I've noticed much better limbs/hands even with the basic Euler samplers; like they come out okay 40% of the time rather than 10% now.
It's what I've been using and my prompts have been way more consistently good since.
 
This is 100% being done already for instrumental music. IIRC some lawyer was trying to use an algorithm to copyright-claim every melody conceivable as a PR stunt to end copyright sharks. Some guy on YouTube has trained a machine to generate djent songs and didn't even need a neural net. Music is all patterns with pretty clear-cut rules, and it's all highly iterative/branching, which is the stuff the AI is actually good at. If anyone knows of any open-source stuff for doing this I would be interested in checking it out. Or even just good sound libraries/software; I've been thinking of getting into programmatic jam-track creation so I can work on composition more and then get baked and solo over top.
Jukebox my man. The songs with voices are hilarious. Like Prisencolinensinainciusol.

Example:
And this website has everything they generate (i.e. lots of garbage)

I love the results but then again I like gupi.

I used RTVC to do voice acting for some game samples, and the recipient commented on the audio quality but didn't realize the samples were computer-generated.
 