Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

The Machine God
You should've bought more...
 
Since ComfyUI technically has an official logo now, I've made a Windows icon for shortcuts, as well as favicons for comfyui-custom-scripts to make it feel a bit more like a proper piece of standalone software.
1742679185054.png
You can also kinda make it a standalone program if your web browser allows you to turn any website into a Progressive Web App. Hopefully it'll become truly standalone in the near future, without having to do this ugly dichotomy of a command line and a browser tab to use it.
 


Hopefully it'll become truly standalone in the near future, without having to do this ugly dichotomy of a command line and a browser tab to use it.
I like the distributed model. I can start it on any one of my systems and then control it from my desktop. I also wrote a little script that watches the queue on one node and ships jobs to other nodes when they are idle.
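The gist of it, if anyone wants to do the same (a minimal sketch, not my exact script; it assumes stock ComfyUI with its default /queue and /prompt HTTP endpoints, and the hostnames are made up):

import requests

# Hypothetical node addresses; ComfyUI listens on port 8188 by default.
NODES = ["http://desktop:8188", "http://node1:8188", "http://node2:8188"]

def queue_depth(node):
    # GET /queue reports the currently running and pending jobs.
    q = requests.get(f"{node}/queue", timeout=5).json()
    return len(q["queue_running"]) + len(q["queue_pending"])

def submit(node, workflow):
    # POST /prompt takes a workflow in API JSON format
    # (export with "Save (API Format)" in the ComfyUI menu).
    requests.post(f"{node}/prompt", json={"prompt": workflow}, timeout=5)

def dispatch(workflow):
    # Ship the job to whichever node currently has the shortest queue.
    target = min(NODES, key=queue_depth)
    submit(target, workflow)
    return target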
 
I like the distributed model. I can start it on any one of my systems and then control it from my desktop. I also wrote a little script that watches the queue on one node and ships jobs to other nodes when they are idle.
Turns out work is already underway on a desktop version built on its own Electron framework. I tried migrating to it, but it's a goddamn mess, so for the time being a good alternative for a clean experience for me is just running ComfyUI as a service via NSSM and using the PWA functionality of my browser. I have so much RAM and VRAM to spare that it won't interfere with my day-to-day use.
 
Any info about whether or not it's possible to run it on consumer GPUs? They recommend 80 GB GPUs, which only exist in the enterprise hardware sphere, or the home lunatic sphere where you NVLink four 3090s together.
You can run Hunyuan on as little as 8-12 GB with specific workflows. It seems like you need at least 24 GB for the img2video model at the moment, but I expect that to go down as people do their magic.

I've got the 14B Wan 2.1 model running on my 4090 with a kind of janky quantized setup similar to https://rentry.org/wan21kjguide/ - it seems to peak around 20 GB of VRAM.

I'd definitely recommend updating to torch nightly as well; it sped things up by maybe 20%. E.g.:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126 --force-reinstall

If it breaks horribly just reinstall the old version from comfy's requirements.txt.
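Quick sanity check after swapping versions, before you queue up a long generation:

import torch
print(torch.__version__)          # nightly builds show a .dev version string
print(torch.cuda.is_available())  # make sure the cu126 build still sees your GPU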

It takes about 15 minutes to do image-to-video and generate a five-second clip at 720p. 480p is more like 2-3 minutes.
 
What is being used for the Studio Ghibli-style images that I am seeing on Twitter today? Some are saying it's GPT-4o + a source image. Others are saying this is being blocked already.
 
Grok AI from Twitter, apparently. Here is the main thread: https://kiwifarms.st/threads/the-studio-ghibli-meme-gallery.215562/
Grok might be used for a couple of these, but the majority are made with ChatGPT's new 4o image generation.

Here's a comparison. Given this input image:
1665_Girl_with_a_Pearl_Earring.jpg
and the prompt "Generate a version of this in Studio Ghibli anime style", Grok generates this:
image (1).jpg

Meanwhile, with the same image and prompt, 4o generates this:
ChatGPT Image.png

Both Grok and ChatGPT use a different image generation approach from the diffusion models that have been popular for the last few years. They're autoregressive models, which was actually the approach that DALL-E 1 used. As far as I understand it, this means the model generates the image as a sequence of small patches (tokens), moving from the top left to the bottom right, rather than denoising an entire blurry picture all at once over a series of steps like diffusion models do. You can see this with ChatGPT, as the image generation takes a long time and fills in from top to bottom.
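Roughly, the two generation loops look like this (a toy sketch just to show the difference, not any real model's code; the stand-in "models" at the bottom only exist to make it runnable):

import torch

def autoregressive_generate(next_token_logits, grid=32, bos=0):
    # Sample image tokens one at a time in raster order (top-left to
    # bottom-right), each conditioned on everything generated so far.
    tokens = [bos]
    for _ in range(grid * grid):
        logits = next_token_logits(torch.tensor(tokens))
        probs = torch.softmax(logits, dim=-1)
        tokens.append(torch.multinomial(probs, 1).item())
    return tokens[1:]  # a separate decoder maps tokens back to pixels

def diffusion_generate(denoise_step, steps=50, shape=(3, 64, 64)):
    # Start from pure noise and refine the WHOLE image a little per step.
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        x = denoise_step(x, t)
    return x

# Dummy stand-ins so the sketch runs; real models replace these.
toy_lm = lambda ctx: torch.zeros(8192)   # uniform next-token distribution
toy_denoiser = lambda x, t: 0.98 * x     # pretend partial noise removal
autoregressive_generate(toy_lm)
diffusion_generate(toy_denoiser)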

Style transfer like this has long been technically possible with diffusion models, but to get halfway decent results you need to tinker with different IPAdapter models and ControlNets and get lucky with random numbers.
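For reference, the diffusion route looks roughly like this with diffusers (a sketch using the usual public SD 1.5-era repo IDs; filenames are placeholders, and the scales, steps, and Canny thresholds all need per-image tuning):

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Edge map from the source image keeps the composition locked down.
src = np.array(Image.open("pearl_earring.jpg").convert("RGB"))
edges = cv2.Canny(src, 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

# The IPAdapter injects the style reference; its scale is the main knob.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

result = pipe("anime style portrait of a girl with a pearl earring",
              image=control,                          # ControlNet conditioning
              ip_adapter_image=Image.open("style_ref.png"),
              num_inference_steps=30).images[0]
result.save("ghibli_ish.png")

And even then you're rerolling seeds until one comes out right.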

In comparison, 4o's autoregressive model just lets you upload a picture and say "make it anime."

Unfortunately, as always happens with closed source models, OpenAI is already nerfing it. I got this response when attempting to generate a Ghibli ChrisChan:
I wasn't able to generate that image because the subject appears to be a recognizable real person, and I need to follow guidelines that protect individuals' privacy and identity. If you have another idea or request—like an original character, a concept, or a stylized version of a different scene—I’d be happy to help!
 
Unfortunately, as always happens with closed source models, OpenAI is already nerfing it.
The silver lining is that Chinese firms now have a firm target to beat. 4o native image gen appears to be a multimodal autoregressive model (tokenizing the image and doing reasoning in token space), while FOSS solutions use diffusion, which was always ahead of autoregression in terms of resolution and quality.
OAI somehow overcame the shitty quality of autoregression, or maybe it's a hybrid model like Meta had proposed, where the token model makes a low-res scaffold and diffusion takes over from there; nobody can tell right now. But since open-source diffusion is no longer cutting edge, there's room to improve.

And they're also usually far larger than diffusion models, so good luck running those locally. Best-case scenario, a model is open-sourced and you get to choose from a selection of providers that run it uncensored.
 
And they're also usually far larger than diffusion models, so good luck running those locally. Best-case scenario, a model is open-sourced and you get to choose from a selection of providers that run it uncensored.
I had a good experience with fireworks.ai's playground showcase of DeepSeek-V3-0324 yesterday. At this point chyna's open model weights can be hosted by API providers cheaply enough that it's free for casual use, almost free for everything else, and still performs pretty close to cutting edge. Fingers crossed that they keep not having to care about pressures to censor.

How many tokens do you think are in a 512x512 image for performance like 4o is showing, by the way? For a ballpark estimate of how much this will cost if API services with the model weights charge by the token.
 
How many tokens do you think are in a 512x512 image for performance like 4o is showing, by the way? For a ballpark estimate of how much this will cost if API services with the model weights charge by the token.
Hard to say, because their exact method is unknown. Back when gpt-4o-vision would just make calls to DALL-E, a 512x512 tile was about 6k tokens, and that's also an autoregressive model. It could probably be tens of thousands.
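Back-of-the-envelope, since token count scales with (image size / patch size) squared; the patch sizes here are assumptions, not 4o's actual tokenizer:

def image_tokens(size=512, patch=16):
    # ViT-style patch tokenization: one token per patch x patch tile.
    return (size // patch) ** 2

for patch in (32, 16, 8):
    print(f"patch {patch}: {image_tokens(512, patch)} tokens")
# patch 32 ->  256 tokens
# patch 16 -> 1024 tokens
# patch  8 -> 4096 tokens, in the ballpark of the ~6k figure above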
 
I've been playing a lot of GURPS games with friends on Tabletop Simulator, and AI slop is perfect for making character images to use as tokens. I made my account here using one from a game we recently concluded, a campaign set during the First Crusade. I wanted to go for a Tyrion Lannister archetype, but horribly inbred. It only took a few Photoshop alterations to get what I had in mind. It must really suck for people who would normally take commissions for this kind of thing. What would this have even cost, and how long would it have taken to get back?


very-short-obese-45-year-old-midget-wearing-11th-century-french-armor-in-a-medieval-painting-s...png
inbred bastard.jpg
 