Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

FLUX.2: Frontier Visual Intelligence

The dev model is 64GB, so you'll have to hope for some quants to come out down the line.

EDIT: There are GGUF quants for it out already lol: 5-bit for 24GB havers and 6-bit for 32GB havers.
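If you're wondering how those bit-widths map to those cards, the weight-size arithmetic is simple. A rough sketch below, pure napkin math: nominal bits per weight, ignoring the text encoder, VAE, and activation overhead (which is why the quants still leave some headroom), and real GGUF K-quants run slightly above their nominal bpw:

```python
# The 64GB bf16 checkpoint at 2 bytes/param implies roughly 32B parameters.
params = 64e9 / 2

# Nominal bits per weight; actual GGUF K-quants run a bit higher.
for bits, note in [(16, "full bf16"), (8, "Q8"), (6, "Q6, the 32GB option"),
                   (5, "Q5, the 24GB option"), (4, "Q4")]:
    weight_gb = params * bits / 8 / 1e9
    print(f"{bits}-bit ({note}): ~{weight_gb:.0f} GB of weights")
```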

Z Image Turbo was released yesterday. It's a new Chinese model with 6 billion parameters (for comparison, Flux 2 has 32 billion). It uses a small Qwen LLM for text encoding, allowing natural-language prompts with surprisingly good understanding. Text, prompt adherence, aesthetics, hands: everything is really, really good for a model that will run easily on 16GB of VRAM. It doesn't know what a kiwi bird is, though.
HuggingFace model page
ComfyUI workflow

A base (i.e. non-turbo) and edit model are still to be released.
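If you'd rather script it than use the ComfyUI workflow, something along these lines should work from Python. A minimal sketch, assuming the HuggingFace repo id from the model page above and typical turbo-model settings; the step count and guidance value are guesses, not confirmed defaults:

```python
import torch
from diffusers import DiffusionPipeline

# Repo id assumed from the HuggingFace model page linked above.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")  # ~12 GB of weights for 6B params in bf16, hence the 16GB VRAM claim

image = pipe(
    prompt="a kiwi bird standing on mossy ground, photorealistic",
    num_inference_steps=8,  # distilled "turbo" models run in few steps (assumed value)
    guidance_scale=1.0,     # distilled models usually want little or no CFG (assumed)
).images[0]
image.save("kiwi.png")
```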

I'll take six billion over 32-64 billion any day. I tested it on my 8GB 3060 and it works nicely; average generation time is under a minute.

Though I feel it'll become the same song and dance with the whole "Look how REALISTIC™ it can make this 1girl shot! It beats (model everyone previously glazed); no competition!" shtick, and then it's run-of-the-mill prompts or a vaguely attractive woman staring at you. I love messing with realism myself, but there's experimenting and testing with what a model is capable of, and then there's glazing it because it does basic prompts in a slightly more visually appealing way.
 
flux2-dev-z.png
1764264578024.png
1764264650962.png
Chairman Xi did it again 🔥
 
What would be the most interesting models to play with? My focus is mostly realistic photos (I'm not talking about porn) and maybe some supernatural elements. I thought I might make illustrations for some RPG stuff, character portraits, things like that, but also just for fun, to push the boundaries of what these models are capable of now. It would be fun to do some hyper-realistic character portraits as short animations as well.
Both Flux 2 and the new Z Image model would be suitable for this, but Flux 2 is very heavy on system resources; I wouldn't bother with it unless you have at least 24GB of VRAM, and even then you'll need to use a quantized version. In the realm of slightly older models, Qwen Image and Flux Krea (a finetune of Flux 1 Dev specifically designed to minimize the "AI look") should also work well. WAN 2.2 seems to be the video model everyone's using if you want to dabble with that; I haven't used it much myself.

As far as closed-source services go, Nano Banana Pro seems to be the ruler of the roost right now.
Also, am I right in surmising that LoRAs and fine-tunes are less of a thing now? All the big discussion seems to be around huge models like Flux 2, and I don't think there's the same community modding/LoRA/fine-tune scene around those due to their size? But maybe I've just not been looking.
Every model that's been out for a while has some amount of finetunes and LoRAs available. The bigger models don't have quite as many as SDXL and its derivatives, but there's still plenty around. Z Image Turbo hasn't even been out a week and there are already dozens of LoRAs for it on Civitai.

Though it is true that with more capable base models, there is less of a need for LoRAs. My unscientific, vibes-based opinion is that you see a lot more style LoRAs now, whereas before you'd have LoRAs for poses or things like five-fingered hands, and this reflects how models have progressed from fucking up human anatomy in a wide variety of artistic styles to mostly getting hands and proportions correct in a few basic styles.
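And if you haven't touched LoRAs since the SDXL days: applying one in diffusers is still basically a one-liner. A minimal sketch below, assuming an SDXL base; the LoRA repo and file names are placeholders, not real uploads:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Placeholder names -- swap in any style LoRA from Civitai or the Hub.
pipe.load_lora_weights("some-user/watercolor-style", weight_name="watercolor.safetensors")

# Dial the LoRA's influence down if the style overpowers the prompt.
image = pipe(
    "portrait of an elf ranger, watercolor style",
    cross_attention_kwargs={"scale": 0.8},
).images[0]
image.save("ranger.png")
```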
Though I feel it'll become the same song and dance with the whole "Look how REALISTIC™ it can make this 1girl shot! It beats (model everyone previously glazed); no competition!" shtick, and then it's run-of-the-mill prompts or a vaguely attractive woman staring at you. I love messing with realism myself, but there's experimenting and testing with what a model is capable of, and then there's glazing it because it does basic prompts in a slightly more visually appealing way.
The devs have said themselves that Z Image Turbo is essentially an aesthetic finetune of their base model. And right now it's getting a lot of credit just because it mostly doesn't give people plastic skin, a problem that has continued to dog the base Flux models into the era of Flux 2. I've found it to be pretty good with fairly complex prompts featuring text, multiple distinct characters, different poses, five-fingered hands, coherent backgrounds, etc., and it handles them a lot faster than other models that have only slightly better prompt understanding.
 
Both Flux 2 and the new Z Image model would be suitable for this, but Flux 2 is very heavy on system resources; I wouldn't bother with it unless you have at least 24GB of VRAM, and even then you'll need to use a quantized version. In the realm of slightly older models, Qwen Image and Flux Krea (a finetune of Flux 1 Dev specifically designed to minimize the "AI look") should also work well. WAN 2.2 seems to be the video model everyone's using if you want to dabble with that; I haven't used it much myself.

As far as closed-source services go, Nano Banana Pro seems to be the ruler of the roost right now.
Thanks for that. I have managed to run Flux 1 Dev locally. It certainly wasn't instant, but it completed in about 7 mins iirc. Good enough to try out just to see it work, but not enough for blasting out concepts and seeing the effect of changes quickly. I also managed to get WAN 2.2 working locally, but again, it was quite a wait for a short clip. Impressive that it's possible at all, though. What I was actually thinking was to rent some time on a site like Runpod and just have at it with some higher-end hardware and lots of VRAM. Given that it's paid time, and that it takes a while to upload the models to the storage (which I also pay for), I want to have a clear plan for what I want to try out. What I'd like to do is some realistic character portraits that are animated. I think that would be fun.
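If it helps with the planning, the budget math is trivial to sketch out up front. All the rates below are placeholders, so check the provider's actual pricing page before committing:

```python
# Placeholder rates -- substitute the real numbers from Runpod's pricing page.
gpu_per_hour = 0.70          # $/hr for a high-VRAM card (placeholder)
storage_per_gb_month = 0.10  # $/GB/month for a network volume (placeholder)

session_hours = 4
model_storage_gb = 120       # a couple of big checkpoints plus LoRAs

print(f"per session: ${gpu_per_hour * session_hours:.2f}")
print(f"storage:     ${storage_per_gb_month * model_storage_gb:.2f}/month")
```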

Every model that's been out for a while has some amount of finetunes and LoRAs available. The bigger models don't have quite as many as SDXL and its derivatives, but there's still plenty around. Z Image Turbo hasn't even been out a week and there are already dozens of LoRAs for it on Civitai.

Though it is true that with more capable base models, there is less of a need for LoRAs. My unscientific, vibes-based opinion is that you see a lot more style LoRAs now, whereas before you'd have LoRAs for poses or things like five-fingered hands, and this reflects how models have progressed from fucking up human anatomy in a wide variety of artistic styles to mostly getting hands and proportions correct in a few basic styles.
What I recall is that there used to be an absolute tonne of LoRAs for particular styles and characters: fictional characters, celebrities, fantasy or sci-fi species, aesthetics from particular movies. At least there were in the SDXL era.

Ha! I just decided to try looking at Civitai to see what the LoRA ecosystem was for newer models. Got a big message telling me it was restricted for Bongland visitors. :suffering: It's okay, OFCOM - I wasn't looking for the dodgy stuff, I just wanted to know if there were good Sci-Fi aesthetics. Yeesh!
 
After playing with it for a bit, I predict Z Image will take off like Illustrious did (and unlike Flux, which never really took off) once more of the official weights and tooling are released; nothing else has really dethroned SDXL base models quite yet. It depends on how cheap and fast LoRA training can become and how versatile the model proves to be.
 
After playing with it for a bit, I predict Z Image will take off like Illustrious did (and unlike Flux, which never really took off) once more of the official weights and tooling are released; nothing else has really dethroned SDXL base models quite yet. It depends on how cheap and fast LoRA training can become and how versatile the model proves to be.
Just had a bit of a play with it now following @Slurred's reply above, and it does seem pretty good.
 
AMD supported Amuse AI software goes open source, works with all GPU vendors
Amuse AI just went open source on GitHub, but it is not actually an AMD-made app. It is built by TensorStack AI, a small startup that works with AMD as a software partner and gets promoted on AMD’s site as a recommended front end for Ryzen AI and Radeon hardware. The GitHub repo calls this the “final version”, so this looks more like a curtain call than the start of a new development cycle.
To be honest, it looks like the project effectively ended months ago. The last update was in April, about a collab between AMD, Stability AI and Amuse AI:

Stability-AI-768x768.jpg
 
Z-Image Base is confirmed to be dropping soon


For those unfamiliar, this is a new SOTA tunable image generator with Nano Banana/Flux-like editing capabilities, but open source and without being nerfed into the floor like Western generators. Huge news, not only for NSFW prompters but for anyone who just wants to make a silly picture of random internet celebs and is tired of g00gle's random refusals to do completely innocent things, and of Black Forest's constant cringe bragging about how nerfed their models are and how difficult the censorship is to overcome.

Freeing people from having to rely on gimped Western cloud models is a rare Chinese win. The only issue for me is that at 6 billion parameters, I'm sure they have (or are capable of training) a much larger, superior model that could hopefully be released soon as well.
 
Z Image Turbo was released yesterday. It's a new Chinese model with 6 billion parameters (for comparison, Flux 2 has 32 billion). It uses a small Qwen LLM for text encoding, allowing natural-language prompts with surprisingly good understanding. Text, prompt adherence, aesthetics, hands: everything is really, really good for a model that will run easily on 16GB of VRAM. It doesn't know what a kiwi bird is, though.
HuggingFace model page
ComfyUI workflow

A base (i.e. non-turbo) and edit model are still to be released.
Try "pregnant grinch bird", thats what I always used to use to get kiwis.
 

Nano Banana Pro is fucking amazing. What the fuck kind of black magic is this, Google?

"Gordon Freeman is holding Twilight Sparkle by the midsection and using her as a living weapon. She is firing off her purple horn beam and looks very irritated. She is approximately 1.1 meters in height and looks bulky and difficult to wield. The whole thing is rendered in a realistic, painterly CG portrait style. No text."

Gemini_Generated_Image_rull2crull2crull.jpg

A tribute to Gloverfield the Gardevoir from Omeger Rubyer:

"Do Gloverfield, the chain-smoking yankee Gardevoir, having a cigarette and squatting in a school hallway like a delinquent. She is a literal Gardevoir, not a human. She is rendered in a CG painting style."

Gemini_Generated_Image_5l2vxr5l2vxr5l2v (1).jpg

"Her head tufts should be more upright and predatory looking, and her eyes should have a bloodshot, psychotic, murderous look to them."

Gemini_Generated_Image_ucpgs5ucpgs5ucpg.jpg

Same energy:

Gemini_Generated_Image_p0pasbp0pasbp0pa.jpg

Me Gusta HD:

Gemini_Generated_Image_c9yakfc9yakfc9ya.jpg
 
Nano Banana Pro is fucking amazing. What the fuck kind of black magic is this, Google?
It looks like the endgame for image generation/synthesis is in sight, particularly with the editing features that maintain coherence. I could easily see Photoshop working like that demo, except you'd import things as layers and run the AI operations on selected layers.

The miscellaneous text and logos (and tramp stamp) are pretty good in your gens, but they could probably look better with some inpainting. Google clearly cherry-picked examples where text was "in focus" and getting a lot of attention from the model. Background changes away from the head in your Gardevoir example are subtle but present; I need to check on another device with a better screen.

For the slavs of the world, what you get in a premium model should be free/local within 6-24 months. So it's worth looking at what these models are capable of. Seems like the "IP infringement" capability is working well, despite Disney suing Google a couple weeks ago.
 
It looks like the endgame for image generation/synthesis is in sight, particularly with the editing features that maintain coherence. I could easily see Photoshop working like that demo, except you'd import things as layers and run the AI operations on selected layers.

The miscellaneous text and logos (and tramp stamp) are pretty good in your gens, but they could probably look better with some inpainting. Google clearly cherry-picked examples where text was "in focus" and getting a lot of attention from the model. Background changes away from the head in your Gardevoir example are subtle but present; I need to check on another device with a better screen.

For the slavs of the world, what you get in a premium model should be free/local within 6-24 months. So it's worth looking at what these models are capable of. Seems like the "IP infringement" capability is working well, despite Disney suing Google a couple weeks ago.
You can already do this: Photoshop has Nano Banana integration, and you can drag-select and inpaint specific parts of an image with generative Nano Banana calls, confine edits to specific layers, etc.


The reason the background looks crappier and crappier with each consecutive edit is that, unfortunately, Nano Banana's edits are lossy. Run 5+ consecutive edits on the same image and the whole thing turns into muddy crap.
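You can get a feel for that kind of generational loss locally with nothing more than repeated lossy re-encodes. This PIL loop is only an analogy (the input file name is a placeholder), not a claim about Google's actual pipeline, but it shows how quality decays when every edit re-encodes the whole frame:

```python
from io import BytesIO

import numpy as np
from PIL import Image

img = Image.open("original.png").convert("RGB")  # placeholder input
ref = np.asarray(img, dtype=np.float64)

for edit in range(1, 6):
    # Each "edit" re-encodes the full frame through a lossy codec,
    # roughly what happens when a model regenerates the whole image.
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=85)
    buf.seek(0)
    img = Image.open(buf).convert("RGB")

    mse = np.mean((np.asarray(img, dtype=np.float64) - ref) ** 2)
    print(f"after edit {edit}: PSNR vs original = {10 * np.log10(255**2 / mse):.1f} dB")
```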
 