Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

The Machine God
You should've bought more...
 
Since ComfyUI technically has an official logo now, I've made a Windows icon for shortcuts, as well as favicons for comfyui-custom-scripts to make it feel a bit more like a proper piece of standalone software.
1742679185054.png
You can also kinda make it a standalone program if your web browser allows you to turn any website into a Progressive Web App. Hopefully it'll become truly standalone in the near future, without having to do this ugly dichotomy of a command line and a browser tab to use it.
 


Hopefully it'll become truly standalone in the near future, without having to do this ugly dichotomy of a command line and a browser tab to use it.
I like the distributed model. I can start it on any one of my systems and then control it from my desktop. I also wrote a little script that watches the queue on one node and ships jobs to other nodes when they are idle.
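The gist of it, if anyone wants to do the same (a minimal sketch, not my exact script; it assumes stock ComfyUI with its default /queue and /prompt HTTP endpoints, and the hostnames are made up):

import requests

# Hypothetical node addresses; ComfyUI listens on port 8188 by default.
NODES = ["http://desktop:8188", "http://node1:8188", "http://node2:8188"]

def queue_depth(node):
    # GET /queue reports the currently running and pending jobs.
    q = requests.get(f"{node}/queue", timeout=5).json()
    return len(q["queue_running"]) + len(q["queue_pending"])

def submit(node, workflow):
    # POST /prompt takes a workflow in API JSON format
    # (export with "Save (API Format)" in the ComfyUI menu).
    requests.post(f"{node}/prompt", json={"prompt": workflow}, timeout=5)

def dispatch(workflow):
    # Ship the job to whichever node currently has the shortest queue.
    target = min(NODES, key=queue_depth)
    submit(target, workflow)
    return target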
 
I like the distributed model. I can start it on any one of my systems and then control it from my desktop. I also wrote a little script that watches the queue on one node and ships jobs to other nodes when they are idle.
Turns out work is already underway on a desktop version built on its own Electron framework. I tried migrating to it, but it's a goddamn mess, so for the time being a good alternative for a clean experience for me is just running ComfyUI as a service via NSSM and using the PWA functionality of my browser. I have so much RAM and VRAM to spare that it won't interfere with my day-to-day use.
 
Any info about whether or not it's possible to run it on consumer GPUs? They recommend 80 GB GPUs, which only exist in the enterprise hardware sphere, or the home lunatic sphere where you NVLink four 3090s together.
You can run Hunyuan on as little as 8-12 GB with specific workflows. It seems like you need at least 24 GB for the img2video model at the moment, but I expect that to go down as people do their magic.

I've got the 14B Wan 2.1 model running on my 4090 with a kind of janky quantized setup similar to https://rentry.org/wan21kjguide/ - it seems to peak around 20 GB of VRAM.

I'd definitely recommend updating to torch nightly as well; it sped things up by maybe 20%. E.g.:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126 --force-reinstall

If it breaks horribly just reinstall the old version from comfy's requirements.txt.
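Quick sanity check after swapping versions, before you queue up a long generation:

import torch
print(torch.__version__)          # nightly builds show a .dev version string
print(torch.cuda.is_available())  # make sure the cu126 build still sees your GPU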

It takes about 15 minutes to do image-to-video and generate a five-second clip at 720p. 480p is more like 2-3 minutes.
 
What is being used for the Studio Ghibli-style images that I am seeing on Twitter today? Some are saying it's GPT-4o + a source image. Others are saying this is being blocked already.
 
Grok AI from Twitter, apparently. Here is the main thread: https://kiwifarms.st/threads/the-studio-ghibli-meme-gallery.215562/
Grok might be used for a couple of these, but the majority are made with ChatGPT's new 4o image generation.

Here's a comparison. Given this input image:
1665_Girl_with_a_Pearl_Earring.jpg
and the prompt "Generate a version of this in Studio Ghibli anime style", Grok generates this:
image (1).jpg

Meanwhile, with the same image and prompt, 4o generates this:
ChatGPT Image.png

Both Grok and ChatGPT use a different image generation approach from the diffusion models that have been popular for the last few years. They're autoregressive models, which was actually the approach that DALL-E 1 used. As far as I understand it, this means the model generates the image as a sequence of small patches (tokens), moving from the top left to the bottom right, rather than denoising an entire blurry picture all at once over a series of steps like diffusion models do. You can see this with ChatGPT, as the image generation takes a long time and fills in from top to bottom.
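Roughly, the two generation loops look like this (a toy sketch just to show the difference, not any real model's code; the stand-in "models" at the bottom only exist to make it runnable):

import torch

def autoregressive_generate(next_token_logits, grid=32, bos=0):
    # Sample image tokens one at a time in raster order (top-left to
    # bottom-right), each conditioned on everything generated so far.
    tokens = [bos]
    for _ in range(grid * grid):
        logits = next_token_logits(torch.tensor(tokens))
        probs = torch.softmax(logits, dim=-1)
        tokens.append(torch.multinomial(probs, 1).item())
    return tokens[1:]  # a separate decoder maps tokens back to pixels

def diffusion_generate(denoise_step, steps=50, shape=(3, 64, 64)):
    # Start from pure noise and refine the WHOLE image a little per step.
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        x = denoise_step(x, t)
    return x

# Dummy stand-ins so the sketch runs; real models replace these.
toy_lm = lambda ctx: torch.zeros(8192)   # uniform next-token distribution
toy_denoiser = lambda x, t: 0.98 * x     # pretend partial noise removal
autoregressive_generate(toy_lm)
diffusion_generate(toy_denoiser)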

Style transfer like this has long been technically possible with diffusion models, but to get halfway decent results you need to tinker with different IPAdapter models and ControlNets and get lucky with random numbers.
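For reference, the diffusion route looks roughly like this with diffusers (a sketch using the usual public SD 1.5-era repo IDs; filenames are placeholders, and the scales, steps, and Canny thresholds all need per-image tuning):

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Edge map from the source image keeps the composition locked down.
src = np.array(Image.open("pearl_earring.jpg").convert("RGB"))
edges = cv2.Canny(src, 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

# The IPAdapter injects the style reference; its scale is the main knob.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

result = pipe("anime style portrait of a girl with a pearl earring",
              image=control,                          # ControlNet conditioning
              ip_adapter_image=Image.open("style_ref.png"),
              num_inference_steps=30).images[0]
result.save("ghibli_ish.png")

And even then you're rerolling seeds until one comes out right.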

In comparison, 4o's autoregressive model just lets you upload a picture and say "make it anime."

Unfortunately, as always happens with closed source models, OpenAI is already nerfing it. I got this response when attempting to generate a Ghibli ChrisChan:
I wasn't able to generate that image because the subject appears to be a recognizable real person, and I need to follow guidelines that protect individuals' privacy and identity. If you have another idea or request—like an original character, a concept, or a stylized version of a different scene—I’d be happy to help!
 
Unfortunately, as always happens with closed source models, OpenAI is already nerfing it.
The silver lining is that Chinese firms now have a firm target to beat. 4o native image gen appears to be a multimodal autoregressive model (tokenizing the image and doing reasoning in token space), while FOSS solutions use diffusion, which was always ahead of autoregression in terms of resolution and quality.
OAI somehow overcame the shitty quality of autoregression, or maybe it's a hybrid model like Meta had proposed, where the token model makes a low-res scaffold and diffusion takes over from there; nobody can tell right now. But since open-source diffusion is no longer cutting edge, there's room to improve.

And they're also usually far larger than diffusion models, so good luck running those locally. Best-case scenario, a model is open-sourced and you get to choose from a selection of providers that run it uncensored.
 
And they're also usually far larger than diffusion models, so good luck running those locally. Best-case scenario, a model is open-sourced and you get to choose from a selection of providers that run it uncensored.
I had a good experience with fireworks.ai's playground showcase of DeepSeek-V3-0324 yesterday. At this point chyna's open model weights can be hosted by API providers cheaply enough that it's free for casual use, almost free for everything else, and still performs pretty close to cutting edge. Fingers crossed that they keep not having to care about pressures to censor.

How many tokens do you think are in a 512x512 image for performance like 4o is showing, by the way? For a ballpark estimate of how much this will cost if API services with the model weights charge by the token.
 
How many tokens do you think are in a 512x512 image for performance like 4o is showing, by the way? For a ballpark estimate of how much this will cost if API services with the model weights charge by the token.
Hard to say, because their exact method is unknown. Back when gpt-4o-vision would just make calls to DALL-E, a 512x512 tile was about 6k tokens, and that's also an autoregressive model. It could probably be tens of thousands.
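Back-of-the-envelope, since token count scales with (image size / patch size) squared; the patch sizes here are assumptions, not 4o's actual tokenizer:

def image_tokens(size=512, patch=16):
    # ViT-style patch tokenization: one token per patch x patch tile.
    return (size // patch) ** 2

for patch in (32, 16, 8):
    print(f"patch {patch}: {image_tokens(512, patch)} tokens")
# patch 32 ->  256 tokens
# patch 16 -> 1024 tokens
# patch  8 -> 4096 tokens, in the ballpark of the ~6k figure above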
 
I've been playing a lot of GURPS games with friends on Tabletop Simulator, and AI slop is perfect for making character images to use as tokens. I made my account here using one from a game we recently concluded, a campaign set during the First Crusade. I wanted to go for a Tyrion Lannister archetype, but horribly inbred. It only took a few Photoshop alterations to get what I had in mind. It must really suck for people who would normally take commissions for this kind of thing. What would this have even cost, and how long would it have taken to get back?


very-short-obese-45-year-old-midget-wearing-11th-century-french-armor-in-a-medieval-painting-s...png
inbred bastard.jpg
 