Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

For what it's worth, on a 4GB GPU and only 16GB of RAM, I can load the Schnell version of Flux and get an image in 1 step in a little over 2 minutes. Had to load EVERYTHING as FP8. The 20 minutes on the first gen was my system dying as it pushed everything into swap for a good 10-15 minutes. 1 step vs 4 steps isn't very different composition-wise and seems to have diminishing returns on detail.
1722555804989.png
1722557125715.png
 
Here are some more of the Dall-E style images I was able to make. It still needs prompt finagling and some luck to get good results, but I assume Dall-E/Bing is doing a lot of stuff behind the scenes to your prompt anyway. It doesn't seem too censored, more like it's just missing a lot of information rather than purposely gimped. I feel like finetunes and loras will be big for this once it gets more optimized and further refined, like how sdxl took a while to become the main model people use.

1722553776861.png
1722553788564.png
1722553805075.png
 
I wonder if you could generate other things that this FLUX model isn't good at, then use FLUX to inpaint whatever else you want to add.
 
You guys should try PonyDiffusion, on anything RTX it runs great, probably the best model out there.
The prompting is weird (you only use rule34/danbooru tags) but the end result is great. Loras for it are great too.
 
There are a few people on the Github for it demanding to know why they would make a model that requires so much VRAM, and a reply that is basically "LOL - poor!". Honestly, I think Flux shows what happens when you stop trying to include everybody. The fact it eats up 20+ GB of VRAM is a factor in why it's so good. I haven't tried running it locally, only in Runpod, where I'm paying for the privilege at around $0.88 per hour. In return I get 48GB of VRAM and the ability to generate images with fp16 in around 8 seconds per image. Here are my first impressions:

My quick random female assassin with a crossbow. No thought, no particular fanciness. And wow - impressive realism and atmosphere. Keep in mind this is a pure base model.
flux_ (2).png

Okay, let's see how it does with multiple people. Asked for male and female runners standing side by side (incidentally, prompts should be in the image metadata if you want the full text).
flux_ (7).png
Great - understands how to do two people. It made the man gay though. Let's see if we can restore his heterosexuality and test its ability to understand associating directives with one particular figure. I told it to put a white t-shirt on the man only.
flux_ (8).png
Voila - birth rate restored and Flux proving it can isolate prompt details.

Let's try particular poses. I asked for the Archangel Michael. He was to be holding aloft a flaming sword, his wings were to be unfurled and the view was to be from a low angle. I'd asked for high detail and it pretty much delivered. I continue to be very impressed.
flux_ (12).png

What about different aspect ratios? I couldn't find any guidelines from them on aspect ratios and resolutions but I took a stab at using the same ones as SDXL. Same prompt, different aspect. Wow - even better. And one for the artists - Flux brought in a little contrapposto. Flame on, Michael!
flux_ (13).png
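
Since there's no official resolution list to go on, here is a minimal sketch of how one might derive SDXL-style sizes for an arbitrary aspect ratio. It assumes a pixel budget of about 1024x1024 and dimensions snapped to multiples of 64. Both are conventions borrowed from SDXL workflows rather than anything published for Flux, and `resolution_for` is a hypothetical helper, not part of any tool.

```python
import math

# Assumptions (not from any Flux documentation): roughly one megapixel of
# total area, and width/height snapped to multiples of 64.
PIXEL_BUDGET = 1024 * 1024
STEP = 64


def resolution_for(aspect_w: int, aspect_h: int) -> tuple[int, int]:
    """Return a (width, height) near PIXEL_BUDGET for the given aspect ratio."""
    ratio = aspect_w / aspect_h
    # Solve width * height = budget with width / height = ratio.
    width = math.sqrt(PIXEL_BUDGET * ratio)
    height = math.sqrt(PIXEL_BUDGET / ratio)

    def snap(value: float) -> int:
        # Round to the nearest multiple of STEP, never below one STEP.
        return max(STEP, round(value / STEP) * STEP)

    return snap(width), snap(height)


if __name__ == "__main__":
    for aspect in [(1, 1), (4, 3), (16, 9), (9, 16)]:
        w, h = resolution_for(*aspect)
        print(f"{aspect[0]}:{aspect[1]} -> {w}x{h}")
```

Snapping can shift the true aspect ratio slightly, so treat the output as a starting point and adjust by eye.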

So there's no reference pic for Archangel Michael, let's try for some known real-world people. Someone already posted a Trump above, but was their Trump riding a dinosaur? I think not!
flux_ (9).png

Let's broaden the political figures out. Can we get a Kamala Harris in there?
flux_ (10).png

What about a cyborg Putin with "half his face covered with metal and electronics"? Yep - followed the prompt very well understanding what I meant and creating this foreboding Phantom of the Kremlin. Z-Man beware!
flux_ (17).png

A smattering of fictional and comic book characters. Does it know who Supergirl and Batgirl are with no Lora or fine-tuning? Yep - it's comics aware. (Nice detail btw, I asked for sunrise and sunset, iirc).
flux_ (3).png
flux_ (4).png
What about WH40K? Surely it won't know what a Space Marine is? By the Emperor, it does (more or less). Also, nice text insert. I haven't been posting experiments with that only because other people are focusing on it already. Got to remark on how good it is though. Word for word and exactly where I asked it to be.
flux_ (23).png

Alright - situational awareness. I asked for her to be in the cockpit of a fighter plane, sky behind her and facing the viewer. For a quick off-the-cuff attempt, no img-to-img but just a casual text prompt, this is very impressive. Seriously, it knows what a cockpit is and gave me the background and positioning of her that I wanted.
flux_ (11).png

Let's give it a little clothing test and see how it follows direction there. One 18th Century vampire for @Susanna coming up. Specifically asked for 18th Century dress.

flux_ (14).png
No, no, no - more vampire! That doesn't scare me at all. I added details for red velvet and iirc gold embroidery and redder eyes. Ah - now that's someone I do don't do don't want to meet coming home at night. I was asking for soft-focus / blurred background and it seemed to understand that.
flux_ (15).png

Final test for prompt following with clothes and style and background - expensive blue armour, braided hair, interior with stone columns, out of focus, posed with head turned towards viewer - nailed every aspect of the prompt.

braids_expensive_blue_armour.png


Quick DEI check - ancient Egyptians and ancient Greek philosophers. Google thinks both of these are sub-Saharan Africans. What does Flux think? Both look pretty spot on to me. Also love its clothing choices for both of these as I didn't specify anything. More men should wear robes and have gold headdresses.
flux_ (5).png
flux_ (6).png

I've been doing all people. Quick sanity test for a landscape - Mountain view looking downwards with snow and rocks:

flux_ (19).png
Oh, and give me a dragon, specifically flying, specifically in the distance:
flux_ (20).png

Love it - can you make it a pencil or charcoal sketch please?
flux_ (21).png
flux_ (22).png

Nice. But what about oil paintings? And how about we test TASTEFUL nudity at the same time.
flux_ (18).png
Wow - pretty much exactly what I asked for in the prompt - reclining on a bed, Renaissance style oil painting, long red hair, nude. It picked out an artistic pose all by itself - wouldn't want to have been the model having to hold that pose for Rembrandt for a couple months! Nipples a little odd but again - base model. And I really like the effect it places at the edge of the picture to show it's a canvas.

Okay, challenge mode - multiple figures in a particular pose interacting with varying emotions. I specifically wanted Batman to be annoyed / angry and Supergirl to be smiling / happy, I needed them seated and arm-wrestling so hands clasped. Two attempts at different realism levels:
flux_ (29).png
flux_ (30).png
Holy crap it did it! First off it accurately attributed different facial expressions to the requested characters. Historically that's been quite tricky. Batman looks really frustrated in the first one. So much so that he's cheating and using an extra finger, but he still can't win. I felt I wanted a greater discrepancy in their body sizes so I specified to make Batman more muscular and Supergirl smaller and skinnier. It worked (the one shown is the second attempt at the more realistic version; in the first attempt they were closer in size). Batman appears to be using two hands in the second one but I'll allow it. ;) It took multiple attempts to get the two. It did keep wanting to put a Batman mask on Supergirl, but not always. Facial expressions were variable. I picked out the best.

Coming to the end now and just a few odd experiments that I wanted to try. Something I could never properly get out of anything was spines or a crest. I made a lizardman and specified "a yellow crest". Nailed it first time:
flux_ (28).png
(all the other details, colour of scales, holding spear, savage clothing style, muscles, emerging from a swamp, all perfectly followed the prompt as well)

A few innocent little children's book illustrations. Mixed results but am sure could get closer to what I wanted with actual effort.
flux_ (25).png
flux_ (26).png
flux_ (27).png

I am a guy so of course I did try my hand at making a beautiful woman. Thankfully I have wholesome tastes.

flux_ (31).png
and interestingly you can contrast that with a previous one made with Stable Diffusion (can't remember which checkpoint exactly but was a fine-tuned SDXL)
sd3.png

Ehhh, okay - two slightly more cheesecake ones but mainly just to play around with some fantasy art and facial expressions. What actually got me though was how well it interpreted "blood covered". Look at the way it drips from the barbarian's axe:
bloody_barbarian.png
flux_ (33).png

Flux is a spectacular success, imo. And a testament to what you can achieve when you don't restrict yourself to the lowest common denominator. I hope people found this interesting.
 

You guys should try PonyDiffusion, on anything RTX it runs great, probably the best model out there.
The prompting is weird (you only use rule34/danbooru tags) but the end result is great. Loras for it are great too.
Might come back to XL eventually when I make XL versions of LoRAs I've made, but 1.5 is still pretty decent. I hear AutismMix is a pretty nifty derivative of Pony Diffusion, though.
 
Alright - situational awareness. I asked for her to be in the cockpit of a fighter plane, sky behind her and facing the viewer. For a quick off-the-cuff attempt, no img-to-img but just a casual text prompt, this is very impressive. Seriously, it knows what a cockpit is and gave me the background and positioning of her that I wanted.
Wes Anderson's Top Gun.

(Top Gun is too happy for Wes though, he'd remake The Great Santini instead.)
 
Wes Anderson's Top Gun.

(Top Gun is too happy for Wes though, he'd remake The Great Santini instead.)
I would make a pithy comment but I've never seen Top Gun and know only that it is about fighter pilots. And there might be something to do with making cocktails. However, if Wes Anderson did make a Top Gun movie I suspect it would be extremely weird. Tonnes of slow dialogue, probably a child would accidentally end up flying the jet whilst her mum and dad tell her over the radio that it's okay and she can do this. The love interest would be played by Willem Dafoe.

Okay, this image is a little catty but I asked Flux to make me an image of a woman lying on the grass, just to poke fun at SD3.

flux_ (24).png
 
There are a few people on the Github for it demanding to know why they would make a model that requires so much VRAM, and a reply that is basically "LOL - poor!". Honestly, I think Flux shows what happens when you stop trying to include everybody. The fact it eats up 20+ GB of VRAM is a factor in why it's so good.
I'm not sure if there's a thread for LLMs, but the same adage holds, and I think that's the reason Meta axed any Llama models other than the smallest and largest. There's simply no comparing a properly tuned 70B with a properly tuned 8B, and that's even with Llama3 pushing 8K context on the 8B. Quantized down to 4-bit, the 70B still needs more VRAM than any single consumer card can provide. You are looking at a dual card system at that point, or going all in and running an ESC4000 in your basement, which will make it sound like a jet hangar.
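
For anyone who wants to check that claim, here is a back-of-the-envelope sketch of the weights-only arithmetic. It assumes 24 GiB as the largest single consumer card (my assumption) and ignores KV cache, activations and framework overhead, so real usage will be higher. `weight_gib` is just an illustrative helper.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: the largest single consumer card has 24 GiB of VRAM.
CONSUMER_CARD_GIB = 24

if __name__ == "__main__":
    for name, params in [("8B", 8), ("70B", 70)]:
        for bits in (16, 8, 4):
            need = weight_gib(params, bits)
            verdict = "fits" if need <= CONSUMER_CARD_GIB else "does not fit"
            print(f"{name} at {bits}-bit: {need:.1f} GiB of weights, "
                  f"{verdict} on one {CONSUMER_CARD_GIB} GiB card")
```

Under these assumptions the 70B at 4-bit comes out at roughly 32.6 GiB of weights before any context is loaded, which is why it ends up on two cards.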
 
DRAM and NAND revenue could swell to $90 billion and $67 billion respectively in 2024, thanks to AI-driven demand (archive)
One is average DRAM prices, which increased by 53% in 2024 and could rise another 35% in 2025.
If you see a good opportunity to upgrade to 32-64 GB of RAM, take it. I got 64 GB that I don't even need earlier this year.

FLUX: This new AI image generator is eerily good at creating human hands (archive)
jOmxRsxgXSWnh5YEvFI5b-800x450.png
Proompt: "A beautiful queen of the universe holding up her hands, face in the background."
2N-PN_fOMNFnOcHFuZxEy.png
"A close-up photo of a pair of hands holding a plate full of pickles."

Fb8sTB7fGOJNGXjh8PJC-.png

ZnEINKepatC0tmA3zUdrR1.png
"An Ars Technica reader sitting in front of a computer monitor. The screen shows the Ars Technica website."

ITRjYCAv689vHPubZFYDl.png
"a boxer posing with fists raised, no gloves."

agaeYOqRlqW4T2mfDATVK.png
"An advertisement for 'Frosted Prick' cereal."

EKn6JJtFhXkLhh4LUieJn.png

wUwnSSf9nOZqpRBdllWp.png
"An advertisement for 'Marshmallow Menace' cereal."

lbqsCnGR-28k6fDHowJVT.png
"A handsome Asian influencer on top of the Empire State Building, instagram"

Ojm376QCpdtsochcwAqp9.png

dRUDyAxi9Ph44w9ofcCf2.png

Utieal9CPWataOicx0qKV.png
"a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting."

SO3exaBHlecwGM8V9cI1B.png

vnui1jl-i5bN3f6CI56az.png
"Will Smith eating spaghetti."

mggCtWUwjNwZCvpRqKvCy.png
"a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. The screen reads 'Ars Technica.'"

u4yPlIKTxRPcy6AbE0y_f.png
"An advertisement for 'Burt's Grenades' cereal."

f3VXaxMOHINVMmlZ7ps3.png
"A close-up photo of a pair of hands holding a plate that contains a portrait of the queen of the universe"

The fact it eats up 20+ GB of VRAM is a factor in why it's so good.
I hope we see 32 GB consoomer GPUs soon. Also, if it wasn't mentioned in this thread, the RTX 5090 is expected to bring 28 GB (448-bit) instead of 24 GB (384-bit).
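
For anyone wondering why capacity and bus width move together, here is a rough sketch of the arithmetic. It assumes each GDDR memory chip has a 32-bit interface and holds 2 GB, which is my assumption about typical current parts rather than anything confirmed for the 5090, and `vram_gb` is a made-up helper name.

```python
def vram_gb(bus_width_bits: int, chip_bus_bits: int = 32,
            chip_capacity_gb: int = 2) -> int:
    """Total VRAM if every chip_bus_bits-wide slice of the bus has one chip.

    The defaults (32-bit chips, 2 GB each) are assumptions for illustration.
    """
    if bus_width_bits % chip_bus_bits != 0:
        raise ValueError("bus width must be a multiple of the chip interface")
    chips = bus_width_bits // chip_bus_bits
    return chips * chip_capacity_gb


if __name__ == "__main__":
    for bus in (384, 448, 512):
        print(f"{bus}-bit bus -> {vram_gb(bus)} GB")
```

With those assumptions a 384-bit bus gives 12 chips and 24 GB, 448-bit gives 14 chips and 28 GB, and a 32 GB card would need a 512-bit bus or denser chips.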
 
I'm using the web version of Flux and I'm having trouble generating exactly what I want. Any pointers?

Prompt:

Night time moody lighting, a woman in all black, eyes blindfolded with a black scarf, using both hands holding an ancient broadsword pointing up, with both hands on the hilt holding it up, surrounded by darkness, there is no forest, there is no beach, she is skipping atop water in a pond with waves dispersing at each graceful ballerina step from left to right, the water is clear and beautiful, atmospheric dark shot, camera is looking down at her.

Result:

replicate-prediction-rz0wcmj7kdrm60ch2hctzgq9g0 (1).jpg



I can't get her to hold the sword exactly as described and the background to be nothingness. Any advice for lighting would be helpful. Thanks.
 
I would make a pithy comment but I've never seen Top Gun and know only that it is about fighter pilots.
For the purposes of the joke that's all you need to know. That's someone's estranged daughter sitting rigidly in a very symmetrically framed cockpit.

And damn that woman in grass flex. This thing really cleans up a lot of stuff.
 