Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

That's not all that surprising. In the way these things work, "joe" and "Joe" might be conceptually different to the model, in ways where the relation between the two isn't even all that strong. The "deeper" a model is, the better it understands such relations: 4o (or, to give a more fitting example, DALL-E 3) will always understand that joe biden and Joe Biden are the same thing, but smaller, shallower models like this one can be tripped up by it relatively easily. They can feel extremely literal because of that, so I'd always pay close attention to the language I use.
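You can see where the case sensitivity comes from at the tokenizer level. Flux's main text encoder is T5, whose SentencePiece vocabulary is case-sensitive, so different casings literally produce different token IDs. A quick check (using the public google/t5-v1_1-xxl tokenizer as a stand-in for the one Flux ships):

```python
from transformers import AutoTokenizer

# T5's SentencePiece vocab is case-sensitive, so casing changes
# the token IDs the model actually sees.
tok = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")

print(tok.encode("joe biden"))  # one sequence of IDs
print(tok.encode("Joe Biden"))  # a different sequence of IDs
```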
I think this is actually a mistake they made while training it. "Convert the training dataset to lower case so the model will be case-insensitive" seems like an easy step to miss (it would have taken a few minutes to run, which is nothing). With SD I think they used human-labelled images for the dataset, which means a dataset-maker like Google was most likely turning everything into lower case (i.e. a photo depicting a bouquet of roses in a vase on a table would be tagged "photograph, rose, flowers, vase, table, red, green"), which is why SD worked best on word-salad prompts and struggled with composition. This time they seem to have just scraped a tonne of images with no fucks given about getting permission first (which I absolutely love, screw artists and copyright holders who try to restrict our creativity) and had an AI describe them, which would have produced results like "Sure, I can help with that! This is a photograph of a bouquet of roses in a vase, arranged tastefully on a table". Give it an SD-style prompt and it will struggle, but with a more descriptive prompt it has far fewer issues with composition, which is also why it's so much better at inserting text (text being just a composition of characters). SD never had issues with the actual shapes of letters; it just couldn't parse sentences properly, and with its poor understanding of composition it may never have made the connection that words are letters in sequence rather than individual glyphs.
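If lowercasing the captions really was the missed step, it's the kind of thing that's a couple of lines in preprocessing, which is exactly what makes it easy to overlook. A rough sketch of what such a pass might look like (the file layout and field names here are made up for illustration):

```python
import json

# Hypothetical caption file: one JSON record per line, e.g.
# {"image": "0001.jpg", "caption": "Sure, I can help with that! ..."}
def lowercase_captions(in_path: str, out_path: str) -> None:
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            record = json.loads(line)
            # Case-fold so "Joe Biden" and "joe biden" map to the
            # same token sequence at training time.
            record["caption"] = record["caption"].lower()
            dst.write(json.dumps(record) + "\n")
```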
 
Yeah, I agree with @Susanna about it being a mistake. Further evidence: I think they used token replacement to suppress known people. Say you have a dozen images labelled "brad pitt" in your training data; the model will obviously learn who Brad Pitt is. But having done all the training, you replace the token name "brad pitt" with "afe03da79a" or whatever. Now you keep the benefit of the trained model, but people can't just type in "brad pitt" and have it show his image. An educated guess is that they did their substitutions but missed the upper-case variants. Probably for the reasons Susanna gives: they weren't used to needing to.
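If that's what happened, the failure is trivial to reproduce: a plain string substitution over the captions only catches the exact casing you search for. A toy illustration (the replacement token is the one from above; everything else is made up):

```python
# Toy example: suppressing a name by swapping it for an opaque token.
# A naive, case-sensitive pass misses every other casing of the name.
SUBSTITUTIONS = {"brad pitt": "afe03da79a"}

def scrub(caption: str) -> str:
    for name, token in SUBSTITUTIONS.items():
        caption = caption.replace(name, token)  # case-sensitive!
    return caption

print(scrub("brad pitt at a premiere"))  # "afe03da79a at a premiere" -- suppressed
print(scrub("Brad Pitt at a premiere"))  # "Brad Pitt at a premiere"  -- slips through

# The fix: case-fold before matching, or use re.sub with re.IGNORECASE.
```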
 
who don't put effort into their orthography
I'm guilty of this sometimes, and even higher-end LLMs really suffer from it.

SD (1.5) is a tiny model with a screwy dataset and it's impressive it performs as well as it does tbh.

Well, I theorized about a model I'd barely used, so that's what I get. These explanations make sense, though. Now that there's a ControlNet floating around (from what I saw), I'll play around with it some more.
 
So for people who are running Flux locally, how long does it take to generate images on your rigs? I've got 16 GB worth of VRAM and I seem to recall generating one image in SDXL taking one minute.
 
For people looking for more optimized Flux inference and/or an escape from ComfyUI, https://github.com/lllyasviel/stable-diffusion-webui-forge has been updated to support Flux and offers "nf4" precision: smaller than FP8, faster than FP8, minimal quality fall-off, and it can apparently even perform better than FP8. https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/blob/main/flux1-dev-bnb-nf4.safetensors has the Dev weights already converted to nf4, or you can just check the nf4 precision type and have any checkpoint use nf4 on the fly. (It'll take a minute or two to convert when loading.) With nf4, I went from 3-5 s per iteration to 1.5 s per iteration.
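For anyone wondering what "nf4" actually is: it's the 4-bit NormalFloat format from the QLoRA paper. Each block of weights is scaled by its absmax and snapped to one of 16 code points placed at the quantiles of a normal distribution, which is why quality holds up so well on (roughly normally distributed) network weights. A minimal PyTorch sketch of the mechanics; real implementations like bitsandbytes pack two 4-bit indices per byte and use fused CUDA kernels, so this is just to show the idea:

```python
import torch

# The 16 NF4 code values from the QLoRA paper (quantiles of a standard
# normal distribution, rescaled to [-1, 1]).
NF4_CODE = torch.tensor([
    -1.0, -0.6961928010, -0.5250730515, -0.3949174881,
    -0.2844413817, -0.1847734302, -0.0910500363, 0.0,
    0.0795802996, 0.1609302014, 0.2461123019, 0.3379152417,
    0.4407098293, 0.5626170039, 0.7229568362, 1.0,
])

def nf4_quantize(w: torch.Tensor, blocksize: int = 64):
    """Blockwise NF4: scale each block by its absmax, then snap every
    value to the nearest of the 16 code points (a 4-bit index)."""
    flat = w.flatten().float()
    pad = (-len(flat)) % blocksize
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.view(-1, blocksize)
    absmax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    normed = blocks / absmax                       # now in [-1, 1]
    idx = (normed.unsqueeze(-1) - NF4_CODE).abs().argmin(dim=-1)
    return idx.to(torch.uint8), absmax, w.shape, pad

def nf4_dequantize(idx, absmax, shape, pad):
    blocks = NF4_CODE[idx.long()] * absmax
    flat = blocks.flatten()
    flat = flat[: len(flat) - pad] if pad else flat
    return flat.view(shape)

w = torch.randn(256, 256)
q = nf4_quantize(w)
err = (w - nf4_dequantize(*q)).abs().mean()
print(f"mean abs error: {err:.4f}")  # small relative to the weight scale
```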

https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981 has more info; be aware that Schnell checkpoints are also supported.

I can comfortably gen on extremely low-end hardware with https://huggingface.co/drbaph/FLUX.1-schnell-dev-merged-fp8-4step (it's a block merge that takes the low-step convergence from Schnell while retaining the text ability and general quality of Dev). ComfyUI was much slower given my limited hardware.
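For the curious: a block merge like that is conceptually just a per-tensor weighted average of two checkpoints, with the blend ratio varied depending on which block a tensor belongs to. A stripped-down sketch with safetensors (the per-block alpha schedule here is invented; the linked merge's actual recipe isn't in the post):

```python
import torch
from safetensors.torch import load_file, save_file

def block_merge(path_a: str, path_b: str, out_path: str,
                default_alpha: float = 0.5) -> None:
    """Merge two checkpoints: out = alpha * A + (1 - alpha) * B per tensor.
    alpha can be chosen per block, e.g. favouring Schnell's blocks for fast
    convergence and Dev's for text/quality."""
    a, b = load_file(path_a), load_file(path_b)
    merged = {}
    for key, tensor_a in a.items():
        if key not in b:
            merged[key] = tensor_a  # keep tensors unique to A as-is
            continue
        alpha = default_alpha
        # Illustrative per-block override; real merges tune these by eye.
        if "single_blocks" in key:
            alpha = 0.7
        out = alpha * tensor_a.float() + (1 - alpha) * b[key].float()
        merged[key] = out.to(tensor_a.dtype)
    save_file(merged, out_path)
```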
 
@verymuchawful Interesting info; I may give it a go. But using what you wrote just as a jumping-off point and not as any kind of argument: I kind of feel that the days of running everything locally on a consumer GPU might be on their way out. Not completely, never completely. But for serious use I think the future is renting compute in the cloud. One of the things that has made Flux so good is that the creators decided not to overly constrain themselves to current consumer GPUs and said: "Yeah, let's use 24 GB of VRAM".

Maybe I'm wrong - haven't even tried the Schnell model yet, I've been doing everything on Runpod. But I see posts on Reddit where people are happy about getting it to run in 8GB of VRAM or something and my instinct is to think there's no way that can be comparable.

Unless gaming drastically increases the amount of VRAM games need - and I can't see that happening, as VRAM is already outpacing the GPU's ability to use it - I feel like AI piggy-backing on consumer GPUs is going to come to an end. Even in terms of cost: I'm renting an Nvidia A40 with 48 GB for approx $0.47 per hour. A 4090 with half that VRAM costs me about $2,300, which is equivalent to nearly 5,000 hours of full usage, i.e. over 200 days. And that's before counting any electricity costs I'd have locally; as a mild counterbalance, I haven't included cloud storage, though that's cheap.
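The break-even arithmetic, for anyone who wants to plug in their own numbers:

```python
gpu_price = 2300.00   # 4090, approx USD
cloud_rate = 0.47     # A40 48 GB rental, USD per hour

hours = gpu_price / cloud_rate
print(f"{hours:.0f} hours = {hours / 24:.0f} days of 24/7 usage")
# -> 4894 hours = 204 days, before local electricity costs
```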
 
But I see posts on Reddit where people are happy about getting it to run in 8GB of VRAM or something and my instinct is to think there's no way that can be comparable.
To give an idea performance-wise: FP8 on a 10 GB 3080 is about 3 s per iteration, i.e. 10-15 seconds for a 4-step generation on the merge linked. On NF4 the speed on that same GPU is 1.5 s per iteration, making 20-step gens on Dev take only 40-50 seconds. (Friend's system that has a lot of GPU downtime when they're busy with other stuff.) On my super-low-end system, which only has 16 GB RAM and 4 GB VRAM and is AMD, I get 55-85 s per iteration, making 4-step gens on Schnell/Schnell merges take about 5 minutes. And that's on FP8, because that card doesn't support NF4. (For Flux anyway; I can run SDXL checkpoints in NF4 on the AMD card for some reason.) Not being able to fit the entire model into VRAM really isn't that detrimental. If, of course, you want to train the model or even train a LoRA, you're gonna need 24 GB minimum, unless they can cut LoRA training down to NF4 precision as well.
[attached image: 4-step sample from the Dev-Schnell block merge]
Above is 4 steps on the Dev-Schnell block merge. Schnell on its own would struggle with the text way more.
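The reason low-VRAM setups stay usable is offloading: the runtime keeps weights in system RAM and moves each sub-model to the GPU only while it runs. In diffusers terms it's roughly this (assumes a recent diffusers build with Flux support; ComfyUI and Forge do something equivalent internally):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
# Keep weights in system RAM; shuttle each component (text encoders,
# transformer, VAE) to the GPU only for its forward pass.
pipe.enable_model_cpu_offload()
# For even tighter VRAM budgets: pipe.enable_sequential_cpu_offload()

image = pipe(
    "a neon sign that says HELLO",
    num_inference_steps=4,   # Schnell converges in ~4 steps
    guidance_scale=0.0,      # Schnell is distilled; no CFG
).images[0]
image.save("out.png")
```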
 
To give an idea performance-wise: FP8 on a 10 GB 3080 is about 3 s per iteration [...] Not being able to fit the entire model into VRAM really isn't that detrimental.
That's pretty impressive. Well, I welcome happy surprises. I'll likely give it a go locally a little later. Cheers!
 
The Chinese are soldering 48 and 32 GB of VRAM onto their 4080D and 4090 Super. "Don't underestimate the Chinese", exhibit #415144.

You can do this at home if you're adventurous and skilled; there are also cheap $200 Chinese BGA rework stations now. I'm surprised the Nvidia firmware doesn't brick the card. It would seem like a very Nvidia thing to do.

But for serious use I think the future is renting compute in the cloud
I agree with this, though. People have a mental block about paying for such things online, but it's actually not that expensive. Some of the LLM rigs people build are so expensive to put together (= cost of the parts) and run (= cost of electricity) that you'd take forever to break even versus just renting some server time, and most of the rigs I've seen perform strictly worse than those cloud servers. Of course it's better not to be dependent on some cloud, but currently I feel it's just not practical. That might change if we get dedicated AI hardware.
 
@verymuchawful Well, you were right about what was possible. I was surprised at how well I was able to run Flux locally.

Flux Dev, fp16, 1024x1024, 20 steps -> 62 seconds, 2.85 s/it
Flux Dev, fp8, 1024x1024, 20 steps -> 64 seconds, 2.88 s/it

(No, I have no idea why fp8 took longer than fp16. It's not due to model loading; this was consistent across runs.)

Flux Dev, fp8, 512x512, 6 steps -> 12 seconds, 1.08 s/it

I tried out the slightly cut-down checkpoint Comfy recommended and it made no difference to times as far as I could tell, nor to how maxed out my VRAM was (I have 20 GB). I also tried Schnell and it gave me better output; I think something was giving out with Dev on my hardware, as I would sometimes get blurred images. (No, it wasn't anything NSFW.) And very, very weirdly, it would seemingly hold onto elements from a previous run. Example: I ask for a drawing of a person with various details. I then change it to "photo of" and add "detailed, realistic", and it still gives me drawings. Swap to a different model and back, and now it gives me realistic photos. I have no explanation for that at all. It shouldn't be possible, but that appeared to be the case.

The bulk of the time for a generation was loading the model, which it seemed to need to do every time; I guess VRAM was so tight that it freed the model the moment a run was over. I didn't try any of the ones you pointed at yet. And to be clear, my view on how things are going long-term is unchanged. Still, I was surprised I could run this (more or less) on my hardware.
 
Can someone who has played around with Flux tell me if it can copy artists' styles, and does it know specific people?
Could it create a drawing of George Floyd punching Elizabeth Olsen in the stomach in the style of Todd McFarlane?
 
Can someone who has played around with Flux tell me if it can copy artists' styles, and does it know specific people?
Could it create a drawing of George Floyd punching Elizabeth Olsen in the stomach in the style of Todd McFarlane?
I just typed it in and it gave me a white man punching a blond woman in the style of an American comic book. So, no.
 