Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

verymuchawful · Aug 1, 2024

For what it's worth, on a 4GB GPU and only 16GB of RAM, I can load the Schnell version of Flux and get an image in 1 step in a little over 2 minutes. Had to load EVERYTHING as FP8. The 20 minutes on the first gen was my system dying as it pushed everything into swap for a good 10-15 minutes. 1 step vs 4 steps isn't very different composition-wise and seems to have diminishing returns on detail.

Soggy Floppa · Aug 1, 2024

Here are some more of the Dall-E style images I was able to make. It still needs prompt finagling and some luck to get good results, but I assume Dall-E/Bing is doing a lot of stuff behind the scenes to your prompt anyway. It doesn't seem too censored, more like it's just missing a lot of information rather than purposely gimped. I feel like finetunes and loras will be big for this once it gets more optimized and further refined, like how sdxl took a while to become the main model people use.

Train Operator · Aug 2, 2024

Kola said:
If you want to play with it for free without making an account, here's a link: https://replicate.com/black-forest-labs/flux-dev

not bad

anustart76 · Aug 2, 2024

I wonder if you could generate other things that this FLUX model isn't good at, then use FLUX to inpaint whatever else you want to add.

AmpleApricots · Aug 2, 2024

It's pretty good at pixel art but some things it just doesn't know. I need to play around with it when I have more time.

NSA MIND CONTROL DEVICE · Aug 2, 2024

You guys should try PonyDiffusion, on anything RTX it runs great, probably the best model out there.
The prompting is weird (you only use rule34/danbooru tags) but the end result is great. Loras for it are great too.

Overly Serious · Aug 2, 2024

There are a few people on the Github for it demanding to know why they would make a model that requires so much VRAM and a reply that is basically "LOL - poor!". Honestly, I think Flux shows what happens when you stop trying to include everybody. The fact it eats up 20+ GB of VRAM is a factor in why it's so good. I haven't tried running it locally - only in Runpod, where I'm paying around $0.88 per hour for the privilege. In return I get 48GB of VRAM and the ability to generate images with fp16 in around 8 seconds per image. Here are my first impressions:

My quick random female assassin with a crossbow. No thought, no particular fanciness. And wow - impressive realism and atmosphere. Keep in mind this is a pure base model.

Okay, let's see how it does with multiple people. Asked for male and female runners standing side by side (incidentally, prompts should be in the image metadata if you want the full text).

Great - understands how to do two people. It made the man gay though. Let's see if we can restore his heterosexuality and test its ability to understand associating directives with one particular figure. I told it to put a white t-shirt on the man only.

Voila - birth rate restored and Flux proving it can isolate prompt details.

Let's try particular poses. I asked for the Archangel Michael. He was to be holding aloft a flaming sword, his wings were to be unfurled and the view was to be from a low angle. I'd asked for high detail and it pretty much delivered. I continue to be very impressed.

What about different aspect ratios? I couldn't find any guidelines from them on aspect ratios and resolutions but I took a stab at using the same ones as SDXL. Same prompt, different aspect. Wow - even better. And one for the artists - Flux brought in a little contrapposto. Flame on, Michael!
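For anyone who wants to try other aspect ratios the same way, here is a rough sketch of how SDXL-style sizes can be worked out. It assumes a budget of about one megapixel and dimensions snapped to multiples of 64, which is the SDXL convention; whether Flux prefers the same sizes is not something this thread establishes. The function name `resolution_for_aspect` and its parameters are made up for illustration.

```python
import math


def resolution_for_aspect(aspect_w, aspect_h, target_pixels=1024 * 1024, multiple=64):
    """Return (width, height) near target_pixels for the given aspect ratio.

    Assumptions (not from any Flux documentation): ~1 megapixel total area
    and both sides rounded to the nearest multiple of 64, as with SDXL.
    """
    ratio = aspect_w / aspect_h
    # Solve width * height = target_pixels with width = ratio * height.
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


print(resolution_for_aspect(1, 1))   # (1024, 1024)
print(resolution_for_aspect(16, 9))  # (1344, 768)
print(resolution_for_aspect(4, 3))   # (1152, 896)
```

Those three outputs line up with sizes commonly used for SDXL. Swap the arguments (for example 9 and 16) to get the portrait version.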

So there's no reference pic for Archangel Michael, let's try for some known real-world people. Someone already posted a Trump above, but was their Trump riding a dinosaur? I think not!

Let's broaden the political figures out. Can we get a Kamala Harris in there?

What about a cyborg Putin with "half his face covered with metal and electronics"? Yep - followed the prompt very well understanding what I meant and creating this foreboding Phantom of the Kremlin. Z-Man beware!

A smattering of fictional and comic book characters. Does it know who Supergirl and Batgirl are with no Lora or fine-tuning? Yep - it's comics aware. (Nice detail btw, I asked for sunrise and sunset, iirc).

What about WH40K? Surely it won't know what a Space Marine is? By the Emperor, it does (more or less). Also, nice text insert. I haven't been posting experiments with that only because other people are focusing on it already. Got to remark on how good it is though. Word for word and exactly where I asked it to be.

Alright - situational awareness. I asked for her to be in the cockpit of a fighter plane, sky behind her and facing the viewer. For a quick off-the-cuff attempt, no img-to-img, just a casual text prompt, this is very impressive. Seriously, it knows what a cockpit is and gave me the background and positioning of her that I wanted.

Let's give it a little clothing test and see how it follows direction there. One 18th Century vampire for @Susanna coming up. Specifically asked for 18th Century dress.

No, no, no - more vampire! That doesn't scare me at all. I added details for red velvet and iirc gold embroidery and redder eyes. Ah - now that's someone I do ~~don't~~ do don't want to meet coming home at night. I was asking for soft-focus / blurred background and it seemed to understand that.

Final test for prompt following with clothes and style and background - expensive blue armour, braided hair, interior with stone columns, out of focus, posed with head turned towards viewer - nailed every aspect of the prompt.

Quick DEI check - ancient Egyptians and ancient Greek philosophers. Google thinks both of these are sub-Saharan Africans. What does Flux think? Both look pretty spot on to me. Also love its clothing choices for both of these as I didn't specify anything. More men should wear robes and have gold headdresses.

I've been doing all people. Quick sanity test for a landscape - Mountain view looking downwards with snow and rocks:

Oh, and give me a dragon, specifically flying, specifically in the distance:

Love it - can you make it a pencil or charcoal sketch please?

Nice. But what about oil paintings? And how about we test TASTEFUL nudity at the same time.

Wow - pretty much exactly what I asked for in the prompt - reclining on a bed, Renaissance style oil painting, long red hair, nude. It picked out an artistic pose all by itself - wouldn't want to have been the model having to hold that pose for Rembrandt for a couple months! Nipples a little odd but again - base model. And I really like the effect it places at the edge of the picture to show it's a canvas.

Okay, challenge mode - multiple figures in a particular pose interacting with varying emotions. I specifically wanted Batman to be annoyed / angry and Supergirl to be smiling / happy, I needed them seated and arm-wrestling so hands clasped. Two attempts at different realism levels:

Holy crap it did it! First off it accurately attributed different facial expressions to the requested characters. Historically that's been quite tricky. Batman looks really frustrated in the first one. So much so that he's cheating and using an extra finger, but he still can't win. I felt I wanted a greater discrepancy in their body sizes so I specified to make Batman more muscular and Supergirl smaller and skinnier. It worked (that is the second of the more realistic versions; in the first one they were closer in size). Batman appears to be using two hands in the second one but I'll allow it.

It took multiple attempts to get the two. It did keep wanting to put a batman mask on Supergirl but not always. Facial expressions were variable. I picked out the best.

Coming to the end now and just a few odd experiments that I wanted to try. Something I could never properly get out of anything was spines or a crest. I made a lizardman and specified "a yellow crest". Nailed it first time:

(all the other details, colour of scales, holding spear, savage clothing style, muscles, emerging from a swamp, all perfectly followed the prompt as well)

A few innocent little children's book illustrations. Mixed results but I'm sure I could get closer to what I wanted with actual effort.

I am a guy so of course I did try my hand at making a beautiful woman. Thankfully I have wholesome tastes.

and interestingly you can contrast that with a previous one made with Stable Diffusion (can't remember which checkpoint exactly but was a fine-tuned SDXL)

Ehhh, okay - two slightly more cheesecake ones but mainly just to play around with some fantasy art and facial expressions. What actually got me though was how well it interpreted "blood covered". Look at the way it drips from the barbarian's axe:

Flux is a spectacular success, imo. And a testament to what you can achieve when you don't restrict yourself to the lowest common denominator. I hope people found this interesting.

Baraadmirer · Aug 2, 2024

NSA MIND CONTROL DEVICE said:
You guys should try PonyDiffusion, on anything RTX it runs great, probably the best model out there.
The prompting is weird (you only use rule34/danbooru tags) but the end result is great. Loras for it are great too.

Might come back to XL eventually when I make XL versions of LoRAs I've made, but 1.5 is still pretty decent. I hear AutismMix is a pretty nifty derivative of Pony Diffusion, though.

whatever I feel like · Aug 2, 2024

Flux is going very well for me, but my Mr. Popos are being stymied by it not really doing "DBZ style", I just get actual obese black men with red lipstick. Ahh well.

Mikoyan · Aug 2, 2024

Overly Serious said:
Alright - situational awareness. I asked for her to be in the cockpit of a fighter plane, sky behind her and facing the viewer. For a quick off-the-cuff attempt, no img-to-img, just a casual text prompt, this is very impressive. Seriously, it knows what a cockpit is and gave me the background and positioning of her that I wanted.

Wes Anderson's Top Gun.

(Top Gun is too happy for Wes though, he'd remake The Great Santini instead.)

Overly Serious · Aug 2, 2024

Mikoyan said:
Wes Anderson's Top Gun.

(Top Gun is too happy for Wes though, he'd remake The Great Santini instead.)

I would make a pithy comment but I've never seen Top Gun and know only that it is about fighter pilots. And there might be something to do with making cocktails. However, if Wes Anderson did make a Top Gun movie I suspect it would be extremely weird. Tonnes of slow dialogue, probably a child would accidentally end up flying the jet whilst her mum and dad tell her over the radio that it's okay and she can do this. The love interest would be played by Willem Dafoe.

Okay, this image is a little catty but I asked Flux to make me an image of a woman lying on the grass, just to poke fun at SD3.

macrodegenerate · Aug 2, 2024

Overly Serious said:
There are a few people on the Github for it demanding to know why they would make a model that requires so much VRAM and a reply that is basically "LOL - poor!". Honestly, I think Flux shows what happens when you stop trying to include everybody. The fact it eats up 20+ GB of VRAM is a factor in why it's so good.

I'm not sure if there's a thread for LLMs, but the same adage holds, and I think that's the reason Meta axed any Llama models other than the smallest and largest. There's simply no comparing a properly tuned 70B with a properly tuned 8B, and that's even with Llama3 pushing 8K context on the 8B. Quantized down to 4-bit, the 70B still needs more VRAM than any single consumer card can provide. You are looking at a dual card system at that point, or going all in and running an ESC4000 in your basement, which will make it sound like a jet hangar.
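The arithmetic behind that is easy to check. Below is a back-of-the-envelope sketch that counts the weights only (parameters times bits per weight), ignoring context cache, activations and framework overhead, so real usage will be higher. The 24 GB ceiling for a single consumer card is an assumption about hardware at the time of posting, and the helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: the largest single consumer card has 24 GB of VRAM.
CONSUMER_CARD_GB = 24

for name, params in [("8B", 8), ("70B", 70)]:
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        verdict = "fits" if size <= CONSUMER_CARD_GB else "does not fit"
        print(f"{name} at {bits}-bit: {size:.0f} GB of weights, {verdict} in {CONSUMER_CARD_GB} GB")
```

A 70B model at 4-bit comes out at 35 GB of weights, which is already over a 24 GB card before anything else is loaded, while the 8B at 4-bit is about 4 GB.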

BrunoMattei · Aug 2, 2024

Flux is fucking impressive.

The Mass Shooter Ron Soye · Aug 2, 2024

DRAM and NAND revenue could swell to $90 billion and $67 billion respectively in 2024, thanks to AI-driven demand (archive)

One is average DRAM prices, which increased by 53% in 2024 and could rise another 35% in 2025.

If you see a good opportunity to upgrade to 32-64 GB of RAM, take it. I got 64 GB that I don't even need earlier this year.

FLUX: This new AI image generator is eerily good at creating human hands (archive)

Proompt: "A beautiful queen of the universe holding up her hands, face in the background."

"A close-up photo of a pair of hands holding a plate full of pickles."

"An Ars Technica reader sitting in front of a computer monitor. The screen shows the Ars Technica website."

"a boxer posing with fists raised, no gloves."

"An advertisement for 'Frosted Prick' cereal."

"An advertisement for 'Marshmallow Menace' cereal."

"A handsome Asian influencer on top of the Empire State Building, instagram"

"a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting."

"Will Smith eating spaghetti."

"a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. The screen reads 'Ars Technica.'"

"An advertisement for 'Burt's Grenades' cereal."

"A close-up photo of a pair of hands holding a plate that contains a portrait of the queen of the universe"

Overly Serious said:
The fact it eats up 20+ GB of VRAM is a factor in why it's so good.

I hope we see 32 GB consoomer GPUs soon. Also, if it wasn't mentioned in this thread, the RTX 5090 is expected to bring 28 GB (448-bit) instead of 24 GB (384-bit).
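The bus widths and capacities quoted above go together, and a quick sketch shows why. It assumes each memory chip has a 32-bit interface and holds 2 GB, with one chip per 32-bit slice of the bus; those are assumptions about typical GDDR configurations, not anything confirmed for a specific card, and `vram_gb` is a made-up helper name.

```python
def vram_gb(bus_width_bits, chip_interface_bits=32, chip_capacity_gb=2):
    """Total VRAM implied by a memory bus width.

    Assumptions: one memory chip per 32-bit slice of the bus, 2 GB per chip.
    """
    if bus_width_bits % chip_interface_bits != 0:
        raise ValueError("bus width must be a multiple of the chip interface width")
    chips = bus_width_bits // chip_interface_bits
    return chips * chip_capacity_gb


print(vram_gb(384))  # 24
print(vram_gb(448))  # 28
print(vram_gb(512))  # 32
```

Under the same assumptions, a 32 GB card would need either a 512-bit bus or denser memory chips.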

whatever I feel like · Aug 2, 2024

I am shocked that this thing does writing so well. I bow before our gangster communist computer God.

BrunoMattei · Aug 2, 2024

I'm using the web version of Flux and I'm having trouble generating exactly what I want. Any pointers?

Prompt:

Night time moody lighting, a woman in all black, eyes blindfolded with a black scarf, using both hands holding an ancient broadsword pointing up, with both hands on the hilt holding it up, surrounded by darkness, there is no forest, there is no beach, she is skipping atop water in a pond with waves dispersing at each graceful ballerina step from left to right, the water is clear and beautiful, atmospheric dark shot, camera is looking down at her.

Result:

replicate-prediction-rz0wcmj7kdrm60ch2hctzgq9g0 (1).jpg

I can't get her to hold the sword exactly as described and the background to be nothingness. Any advice for lighting would be helpful. Thanks.

whatever I feel like · Aug 2, 2024

Look up the proper name for the sword pose you want and say that instead. Should work as long as there are enough images of that in the dataset.

Mikoyan · Aug 2, 2024

Overly Serious said:
I would make a pithy comment but I've never seen Top Gun and know only that it is about fighter pilots.

For the purposes of the joke that's all you need to know. That's someone's estranged daughter sitting rigidly in a very symmetrically framed cockpit.

And damn that woman in grass flex. This thing really cleans up a lot of stuff.

AnimeSucksCopeAndSneed · Aug 2, 2024

Flux is the future.

Monolith · Aug 3, 2024

Kola said:
It spells out even pretty long text very well.

I didn't even need to generate multiple images. Gets it on first try.

If you want to play with it for free without making an account, here's a link: https://replicate.com/black-forest-labs/flux-dev

Moderately long text, anyways.
