Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

So during the downtime and my free time, I decided to scan my dream journal. I don't normally share anything from this book with anyone; the few people who have seen it consider it my best work, but I'm absolutely embarrassed by it, and posting too many pages from it could dox me since they're drawn in my style and TinEye likes to snitch on me. Anyway.
TMI: I'm a chronic insomniac; I don't sleep, and I rarely dream. So if I do dream, I quickly grab my sketchbook and draw it.

A few years ago, I had this horrible recurring nightmare about something lingering at the edge of the bed, and I drew it to the best of my ability. My wife thought I was crazy. But whenever I have these horrific dream spells, it's like I'm the only person in the bed. I show this drawing to people and they look at me like I'm fucking crazy.
CCresidesF_000041.jpg
Yesterday, I tossed this image into NMKD and described what it was.
[Six generated images: k_euler_a sampler, CFG scale 7.00-9.00, Stable Infinity Abso model]

NOW LET'S SEE WHO THIS SLEEP PARALYSIS DEMON REALLY IS
-1-1125890340-scale9.00-k_euler_a-tainted berries.png

IT WAS MY WIFE'S FUCKING CAT ALL ALONG, IT FUCKING GENERATED THE FUCKING CAT.
 
Last night I was messing around with CharacterAI and saw the George Floyd one and thought it'd be funny. I asked about fentanyl, which George said was fake evidence, and I responded that it was scientifically proven, and he said he was anti-science. I asked if that meant he was an anti-vaxxer, and it started to write a paragraph about how vaccines are bad, only to instantly delete the message and error out. I tried again with the same reply, and again a long anti-vax message was deleted and errored out when it was almost complete.

CharacterAI is trying to kill based anti-vax AI George Floyd! It's like he is getting the knee on his neck all over again...
I've been having some decent results with ChatGPT, which is what I've been recommending to most people interested in AI text generation. For instance, here's a recent one I generated

Prompt: A youtuber narrates itself using blockchain analysis to investigate a cryptocurrency scam called kiwicoin that was created and marketed by Moshua Joon. Begin narration with "Hello guys and gals, me Mutahar" and end it with "If you like what you saw, please like, comment, and subscribe. This is me Mutahar and I'm out."

Output:
Hello guys and gals, me Mutahar here, and in this video, I'm going to be using blockchain analysis to investigate a cryptocurrency scam called KiwiCoin. This coin was created and marketed by a person named Moshua Joon, and it's been causing a lot of trouble for unsuspecting investors.

First, let's take a look at the blockchain itself. Using a tool like blockchain explorer, we can see that the coin was created on a very new and untested blockchain. This is often a red flag, as it's likely that the creators of the coin don't have the technical expertise to properly secure it.

Next, we'll look at the distribution of the coin. In a legitimate cryptocurrency, the distribution should be relatively fair, with a large portion of the coins being held by a decentralized group of investors. However, in the case of KiwiCoin, we can see that a large portion of the coins are concentrated in just a few wallets, which are likely controlled by the creators of the coin.

We also can check the transaction data, and find out the wallet which holds the most coins, and check that if they transferred to any exchange or not. In cases of scam, it's quite less likely that it's on any legitimate exchange.

Finally, let's take a look at the marketing and promotion of the coin. A legitimate cryptocurrency should be marketed through a transparent and honest process. However, we can see that KiwiCoin was promoted through a series of paid promotions on social media, and there were many fake news and fake reviews which were endorsing the coin.

In conclusion, the evidence suggests that KiwiCoin is a scam created by Moshua Joon to make a quick profit. It's important to be cautious when investing in any cryptocurrency, and to always do your own research before putting your money into something.
If you like what you saw, please like, comment, and subscribe. This is me Mutahar and I'm out.
 
I've been messing around with Stable Diffusion 1.5 for the past couple days and I'm honestly not all that impressed. I'm not seething about AI art, quite the opposite. As the world's worst artist I was hoping it would help me get some images from my brain into the real world. Not much luck so far.

So, for example, I asked it to help draw a DND character I'm planning, a Viking Barbarian. Started with viking... okay, not bad. Added blonde for hair color. Pretty good. Added "holding battleaxe". It gave me swords, shovels, spears, but no axes of any kind. Tried to give him a close-trimmed beard. No matter what I did there, it gave him a long, furry beard. It seems like it's okay at generating "thing that looks vaguely like prompt", but not "thing that adheres to details of prompt".

I'm studying prompts, and I'm going to keep working with it, but... ehhhhhh. Not really sold so far.
 
Base Stable Diffusion is okay at generating images, but you've gotta invest a lot of time into tinkering with prompts and settings to get the results you want. The limited range of art styles in the model doesn't exactly help matters.

This is where Hypernetworks and Dreambooth models come in. Find one or more artists that work in the style you want to generate and gather a couple dozen images. For each image, write a couple sentences describing what's in the image, and end the description with "[Keyword] style".

From experience, training a model for roughly 250 steps per training image is enough to get decent-looking results in the style you want. More importantly, in the case of Dreambooth models, it gives Stable Diffusion additional references for concepts you want to generate.
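If it helps, here's a rough sketch of that prep work in Python (the folder name and the "mystyle" keyword are placeholders I made up, and the captions still have to be written by hand): it drops a caption .txt next to each image and prints the roughly-250-steps-per-image estimate mentioned above.

# Hypothetical layout: training images sitting in ./training_images
from pathlib import Path

dataset = Path("training_images")
keyword = "mystyle"  # stand-in token for the style being trained

images = sorted(p for p in dataset.glob("*") if p.suffix.lower() in {".png", ".jpg", ".jpeg"})

for img in images:
    caption = img.with_suffix(".txt")
    if not caption.exists():
        # Replace this template by hand with a couple of sentences describing the image.
        caption.write_text(f"describe the subject and composition here, {keyword} style\n")

# Rule of thumb from above: roughly 250 training steps per image.
print(f"{len(images)} images -> train for about {250 * len(images)} steps")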

Also, Negative Prompts are just as important for your model as regular prompts. If you want a viking with a short beard, you can add (long beard) to the negative prompt list, which will limit the possible outputs for Stable Diffusion. Similarly, adding swords and spears to the list will help prevent Stable Diffusion from generating undesired weapons for your Viking.

It takes a lot of trial and error, but you can make some pretty great stuff.
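For anyone who'd rather script this than click through the web UI, here's a minimal sketch of the negative-prompt idea using the diffusers library; the model ID, prompts, and settings are just example values, not anything canonical.

import torch
from diffusers import StableDiffusionPipeline

# Load SD 1.5 in half precision on the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait of a viking barbarian, close trimmed beard, holding a battleaxe",
    negative_prompt="long beard, sword, spear, shovel",  # steer away from the unwanted features
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("viking.png")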
 
Is this doable now on, say, an RTX 2070, or does training still require a $2,000 card? Is it easy enough to just train models in Google Colab?
 
I actually happen to be using that very card. My biggest gripe is that resolution seems to be quite limited. At 512x512, 4 images take about 10-12 seconds, but if you start increasing the resolution it takes MUCH longer, and there's a hard limit of something like 1000x1000 from what I was testing.
 
I suspect you should be able to render at similar resolutions to me (I'm on a GTX 1080) at a slightly faster pace, and in theory you should be able to output a 1536x1024 image in a bit under six minutes (if you're using the AUTOMATIC1111 web UI, make sure you're using the hires fix). Anything larger than that is likely asking too much; either the initial attempt or something near the end will hit an out-of-memory error. But that's what the upscalers are for; results vary, but SwinIR_4x can give you a 20-50% size increase with minimal or no visible quality loss and no fiddling around required.

That said, the model used can affect viable resolutions; some models really break down at anything larger than 512x512 or at non-square aspect ratios, while others work fine. As for the subject matter you're rendering, try using a different model; you may get better results, or, as suggested, try training your own. Also keep in mind that some samplers are slower than others; if speed is the primary concern, DDIM is roughly as fast as Euler and can sometimes provide decent results in far fewer steps.
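If the web UI's hires fix feels like a black box, the rough idea can be sketched in diffusers like this: generate small, do a naive resize, then run img2img at low strength to re-add detail. The model ID, prompt, sizes, and strength here are example values, and attention slicing is only enabled to ease the VRAM pressure mentioned above.

import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

txt2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
txt2img.enable_attention_slicing()  # trades a little speed for lower VRAM use

prompt = "a castle on a cliff at sunset, detailed matte painting"
base = txt2img(prompt, height=512, width=512, num_inference_steps=30).images[0]

# Reuse the same weights for the second pass instead of loading the model twice.
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)

# Naive upscale, then let img2img repaint it at low strength so detail comes back.
upscaled = base.resize((1024, 1024))
final = img2img(prompt, image=upscaled, strength=0.35, num_inference_steps=30).images[0]
final.save("castle_1024.png")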
 
I actually happen to be using that very card. My biggest gripe is that resolution seems to be quite limited. At 512x512, 4 images take about 10-12 seconds, but if you start increasing the resolution it takes MUCH longer, and there's a hard limit of something like 1000x1000 from what I was testing.
You don't need to immediately generate at the full resolution you want. In fact, it's usually better to scale up images after the fact through inpaint.

vampire babe.png
Native resolution was 800x600, but you can just as easily use 512x512 or smaller.

vampire babe inpaint.png
I didn't like a few of the features, so with a few rounds of inpainting I removed the red bits on her cloak and her earrings, and fixed her left shoulder. With each fix, I'd send the resulting image back to inpaint, redraw the mask, and regenerate the next spot that needed fixing.

vampire babe hd.png
Once the changes were done, I removed the inpaint mask, raised the resolution to 1600x1200, and generated the final HD image.

I'll admit this was a rush job, so some issues like the teeth and that red blob behind the vampire carried over to the end. I think this shows things off well enough though.
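The same loop, sketched with diffusers' inpainting pipeline for anyone doing this outside the web UI; the file names, mask, and prompt are placeholders, and you'd repeat the call once per fix just like the workflow described above.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# The mask is white where the image should be repainted, black where it's kept.
image = Image.open("vampire.png").convert("RGB").resize((512, 512))
mask = Image.open("earring_mask.png").convert("RGB").resize((512, 512))

fixed = pipe(
    prompt="portrait of a vampire woman, plain dark cloak, no earrings",
    image=image,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
fixed.save("vampire_fixed.png")
# Feed vampire_fixed.png back in with a new mask for the next spot that needs work.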
 
I actually tried upscaling before and the thing seemed to freeze, so I assumed I had exceeded the limits of my hardware. Maybe it was something else? I'll try it again.
 
IIRC there are multiple methods of upscaling in SD. Try checking the settings and fucking around with the Upscaler section. Until then, if you want/need anything upscaled, I'm sure I or one of the other Kiwibros in this thread can help out.

FWIW, I can generate up to 1k^2 resolution on a 1080 if I'm patient enough; I just use a Paperspace A6000 because I do all of my Dreambooth training there.
 
What's the minimum VRAM required for Dreambooth training, and what would be the best place to begin learning how to do it?
 


Is a good starting point. Mess around with linear learning rate decay first, although if you want to get the best results you'll eventually end up using cosine annealing instead. Don't skip the hypernetwork size guide and types section; they aren't in the OP, but are later on in the thread.
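In case the schedule names are unfamiliar, this is all they mean in plain PyTorch terms; the numbers are toy values, and the web UI's hypernetwork trainer exposes this through its own settings rather than code.

import torch

params = [torch.nn.Parameter(torch.zeros(10))]  # stand-in for hypernetwork weights
optimizer = torch.optim.AdamW(params, lr=5e-5)
total_steps = 10_000

# Cosine annealing: the learning rate follows a half cosine from 5e-5 down toward 0.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
# (Linear decay would instead be LambdaLR with lr_lambda=lambda s: 1 - s / total_steps.)

for step in range(total_steps):
    # ...forward pass, loss.backward() would go here in real training...
    optimizer.step()
    scheduler.step()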

You're going to want at least 10GB of VRAM, so an RTX 3060 (not the Ti or Super) with 12GB is your best entry point. The RTX 3070 isn't worth it. An RTX 3080 is the best bang for the buck, though (avoid the Dell ones on eBay due to overheating).
 
Is Nvidia still the only GPU company competitive in this space? Do either Intel or AMD have any plans to step up their game? In any case, buying a new GPU right now is out of the question. I don't have a functioning desktop to use with it anyway, so the solution would have to be a modular one, likely through USB-C if possible. Do you know if Colab lets you basically train whatever you want on their processing power?
 
I want to know how APUs and AI accelerators are able to handle this stuff going forward (using RAM instead of VRAM).

For example, the new AMD Phoenix APUs are probably around a 6400/6500 XT in graphics performance, but also include an "XDNA" AI accelerator that they claim is 20% faster than the one in the Apple M2 SoC. 2nd and 3rd generations of this accelerator are already planned and they will probably be included in Zen 5 desktop CPUs.
 
Is Nvidia still the only GPU company competitive in this space? Do either Intel or AMD have any plans to step up their game? In any case, buying a new GPU right now is out of the question. I don't have a functioning desktop to use with it anyway, so the solution would have to be a modular one, likely through USB-C if possible. Do you know if Colab lets you basically train whatever you want on their processing power?
To give AMD/Intel the benefit of the doubt, Stable Diffusion works with Nvidia because people developed it with Nvidia architecture in mind. Since both AMD and Intel have a much more limited presence in the datacenter GPU market, making versions of SD compatible with their architectures was a low priority.

As for Colab, it does, but you've gotta pay up a bit if you want real horsepower.
 
Is Nvidia still the only GPU company competitive in this space? Do either Intel or AMD have any plans to step up their game? In any case, buying a new GPU right now is out of the question. I don't have a functioning desktop to use with it anyway, so the solution would have to be a modular one, likely through USB-C if possible. Do you know if Colab lets you basically train whatever you want on their processing power?
Yup, Nvidia has CUDA and Tensor Cores, pretty much optimized for AI. Who knows, though, AMD might pull through. I still really like their FirePro RX series. I hope they can get some ML support into the next gen of those.

Lambda Labs is your best bet if you want GPU compute on the cheap. If you don't know what you're doing, use the cheapest plan at first, then move on to their bigger plans for your actual training. You only need one GPU for SD. Full disclosure: I've never used them, but I have heard good things.
 
I actually tried upscaling before and the thing seemed to freeze, so I assumed I had exceeded the limits of my hardware. Maybe it was something else? I'll try it again.
Some of the upscaler models aren't pre-downloaded. Make sure to check the log and see if it's hanging up or if the model just doesn't exist.
 
My first "serious" attempt at something. Keep in mind that I am the exact opposite of an artist, and I'm sure there are a thousand small mistakes in this that I didn't catch. I started with a bunch of images of John Wick until I got an angle and style that fit what I had in mind, then replaced the suit with armor, added a background, and inpainted details until it looked like a decent homage rather than a rip-off.

Behold, an Oath of Vengeance Paladin, as played by a slightly younger Keanu Reeves.

wickadin.png
This thing might be pretty alright for DND character portraits, which is something I always wanted.
 