Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

@verymuchawful Well, you were right about what was possible. I was surprised at how easily I could run Flux locally.
It's a bit slow but doable. I'm getting around 1 iteration/sec with Flux Dev, fp16, on a 4090. It's using about 22 GB of VRAM though.
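
I'm running it through ComfyUI, but for anyone who'd rather script it, here's roughly the equivalent in diffusers. This is just a minimal sketch: it assumes the FluxPipeline class and the official FLUX.1-dev weights, and the step/guidance values are illustrative, not my exact settings.

```python
# Minimal local Flux Dev generation via Hugging Face diffusers (sketch).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,  # half precision; still ~22 GB of VRAM
).to("cuda")
# pipe.enable_model_cpu_offload()  # fallback if you have less VRAM

image = pipe(
    "a detailed photo of a fox reading a newspaper in a library",
    num_inference_steps=30,
    guidance_scale=3.5,
).images[0]
image.save("flux_test.png")
```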

Can someone who has played around with Flux tell me if it can copy artists' styles, and does it know specific people?
Could it create a drawing of George Floyd punching Elizabeth Olsen in the stomach in the style of Todd McFarlane?
It seems like we will need LoRAs for specific people and styles. It's awful at celebrities, likely intentionally. It's okay at generic styles, but not specific artist styles. E.g., putting in your prompt, this is what it returned:
[attached image: ComfyUI_00099_.png]

The composition and relation between objects is a huge improvement though. It's much closer to the Microsoft image generator in terms of being able to accurately depict how things in the image relate to each other. Once we get some fine-tunes and LoRAs it's going to be incredible.

E.g.:
draw a cartoon of a clown riding on a unicycle, juggling a torch, a sword, and a bowling ball.
[attached image: ComfyUI_00100_.png]
First try and it's dead-on.
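
And once community fine-tunes and LoRAs show up, hooking one into a script like the sketch above should be a couple of lines in diffusers. To be clear, the repo id, file name, and scale below are made-up placeholders, not a real LoRA:

```python
# Hypothetical: load a community style LoRA on top of the pipeline above.
pipe.load_lora_weights(
    "some-user/flux-todd-mcfarlane-style",      # placeholder repo id
    weight_name="mcfarlane_style.safetensors",  # placeholder file name
)
pipe.fuse_lora(lora_scale=0.8)  # bake the style in at 80% strength
```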
 
Mine gave me the same Bob Vila guy for Floyd; I wonder if something weird was in the images it trained on.

Maybe saying "Family Guy style" would have worked better than "Seth McFarlane style".

Edit: I am now realizing that Todd is a very different McFarlane, so generic comic book style is acceptable, if not perfect.
 
Artists claim “big” win in copyright suit fighting AI image generators (archive)
In an order on Monday, US district judge William Orrick denied key parts of motions to dismiss from Stability AI, Midjourney, Runway AI, and DeviantArt. The court will now allow artists to proceed with discovery on claims that AI image generators relying on Stable Diffusion violate both the Copyright Act and the Lanham Act, which protects artists from commercial misuse of their names and unique styles.
 
Kill copyright. Behead copyright. Roundhouse kick copyright into the concrete. Slam dunk copyright lovers into the trashcan. Crucify the filthy Lanham Act. Defecate in copyright's food. Launch copyright into the sun. Stir fry copyright in a wok. Toss copyright into active volcanoes. Urinate into a copyright lover's gas tank. Judo throw copyright into a wood chipper. Twist copyright's head off. Report copyright to the IRS, wait not that one. Karate chop copyright in half. Curb stomp the Copyright Act. Trap copyright lovers in quicksand. Crush copyright in the trash compactor. Liquefy copyright in a vat of acid. Eat copyright. Dissect copyright. Exterminate copyright in the gas chamber. Stomp copyright's skull with steel-toed boots. Cremate copyright in the oven. Lobotomize copyright. Mandatory abolition of copyright. Grind copyright in the garbage disposal. Drown copyright in fried chicken grease. Vaporize copyright with a ray gun. Kick old copyright down the stairs. Feed copyright to alligators. Slice copyright with a katana.
 
Ok, I played around a bit with the fast version someone here posted a website for, and what really jumps out at you right away is how complex your text prompt can be. You can just heap on detail (within reason) and it will try to do it.

[attached images: cube.jpg, cube2.jpg]
It grasps abstract stuff really, really well; you can build all sorts of weird things and they will remain fairly consistent, really showing off its good conceptual understanding. What I found amazing is that you just kinda describe something to it and, that way, build it. This creature is not a thing (that I am aware of); I "invented" it purely by describing it. It even transfers seamlessly into other styles.

[attached images: creature1.jpg, creature2.jpg]

[attached image: woman.jpg]
It's also a good wallpaper generator. Watermark included.

[attached image: comic1.jpg]
Everything that is even remotely cartoon style tends to slip into big-breasted anime girls really quickly. A negative prompt would probably help a lot here.


[attached images: statue1.jpg, statue2.jpg]

Another example of mixing things that usually don't belong together, made possible by its clean conceptual understanding. It needed a bit of wrangling to not turn the statue into a normal woman, but it worked really well. Verbose prompts are the key.

I'm very impressed, and looking forward to having time to play with the dev version. This really lifted image generation to another level. It's funny considering that artists are still malding about SD and are probably not even aware that this exists.
 
If I hear one more retard claim that image diffusion models are "dangerous" I swear to god. If people look at the image below and think for a single fucking second that it could be real, that demonstrates a much worse problem with society at large being fucking retarded. Fucking stupid meme images aren't the thing people should worry about. It's what you see everyday that you actually can't determine as being generative AI or not that's actually scary.
[attached image: 1723908887016.png]
 
Do you think those are believable? I've played with using fastflux and putting "shaky, blurry photo" into the prompt. Do you think one could start hoaxes with these?
[attached images, prompts in the filenames: "donald trump in the motion of sitting down into a black car near white house, shot from behind..."; "missile shot down by aa above white house, shaky, blurry photo of the sky"; "small explosion in the middle of the sky very high above white house, shaky, blurry photo of t..."]
 
So, what I noticed with Flux is that simple language can actually be somewhat detrimental in an indirect way: if you repeat yourself a lot (which increases the length of the instruction), it leads to confusion, which expresses itself very similarly to how it does in "normal" LLMs (ignoring parts of the instruction), and also in distorted, "confused" images, like SD with repetition. I would not discourage repetition altogether; sometimes it does help to "drive a point home". Generally, though, it seems better to restructure the prompt.

An interesting thing to attempt is to let an AI rewrite the prompt. I discounted automatic prompting with SD largely because, in my experimenting, it simply did not lead to good results unless you fed the AI all the right keywords, and at that point you might as well write the prompt yourself. It seems to work well with Flux, though. If you consider that the training images were probably captioned by an AI in a conversational way, as theorized earlier in this thread, it makes sense that another AI would find the "right language" (perhaps GPTisms?) to get exactly what you asked for.
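
If you want to automate that, something like the sketch below is all it takes. It points at a local OpenAI-compatible endpoint (ollama in this case); the port is ollama's default, but the model name and system prompt are my own placeholders, nothing official:

```python
# Sketch: have a local LLM expand a rough idea into a verbose Flux prompt.
import requests

def rewrite_prompt(rough_idea: str) -> str:
    """Expand a rough idea into the detailed, conversational description
    that Flux seems to respond to."""
    resp = requests.post(
        "http://localhost:11434/v1/chat/completions",  # ollama's default
        json={
            "model": "llama3",  # placeholder; use whatever you run locally
            "messages": [
                {"role": "system",
                 "content": "Rewrite the user's idea as one detailed, "
                            "concrete image description in plain prose. "
                            "No keyword lists. Under 120 words."},
                {"role": "user", "content": rough_idea},
            ],
        },
        timeout=120,
    )
    return resp.json()["choices"][0]["message"]["content"]

print(rewrite_prompt("a statue of a woman, mixed with stained glass"))
```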

Prompting Flux is very different from prompting SD and all the other models that came out (and perhaps MJ and DALL-E, I never used those). For optimal results, instead of looking for the right keywords (which often simply do not exist), it makes much more sense to just describe what you want. I know this has already been said on this very page; I'm repeating it for the sake of completeness.

For example:
[attached image: coke.jpg]
(video game adverts soon be like)

This is a comic-style artwork of a generic fantasy harpy. When using the words "Harpy", "D&D Harpy", etc., Flux usually generated a harpy eagle, or some confused mashup of a bird and a woman, a woman holding a bird, etc., and sometimes some anime abomination, which seems to be Flux's fallback (SD's fallback was 00s-style 3D renders). D&D-style humanoid monsters are sort of my personal benchmark because most models really struggle with them, especially if they are half-something. So I just described the character, down to having a four-fingered, claw-like hand. The word harpy, or even bird, was not used once. After writing my verbose prompt I had it summarized by an LLM, and it seemed to bring the perplexity down; I felt there were a lot fewer "failed" generations than with my handwritten prompt. Might be placebo, might be my ESL (I did not test it nearly long enough), but interesting if true.


[attached images: dark1.jpg, dark2.jpg]
(broken-off arm prompt! It worked only about 1-2 times out of 10, though)
[attached image: dark3.jpg]
I think this is what the kids call having an aesthetic. (I love that this thing can actually do dark scenes.) In Stable Diffusion models, some things are *always* lit from the same direction, which makes some scenes look so unnatural. If you see it once, you cannot unsee it.

This concludes my post. Thanks for reading my blog!

[attached image: cartoonyellow.png]
 
The word harpy, or even bird, was not used once. After writing my verbose prompt I had it summarized by an LLM, and it seemed to bring the perplexity down; I felt there were a lot fewer "failed" generations than with my handwritten prompt. Might be placebo, might be my ESL (I did not test it nearly long enough), but interesting if true.
What were both the LLM summarized and original prompts for the harpy?
 