Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

I'm using the web version of Flux and I'm having trouble generating exactly what I want. Any pointers?

Prompt:

Night time moody lighting, a woman in all black, eyes blindfolded with a black scarf, using both hands holding an ancient broadsword pointing up, with both hands on the hilt holding it up, surrounded by darkness, there is no forest, there is no beach, she is skipping atop water in a pond with waves dispersing at each graceful ballerina step from left to right, the water is clear and beautiful, atmospheric dark shot, camera is looking down at her.

Result:

View attachment 6264342


I can't get her to hold the sword exactly as described and the background to be nothingness. Any advice for lighting would be helpful. Thanks.
So it's not convenient for me to fire up runpod right now but I can offer some suggestions.

The first is one you probably don't want to hear, but if that's the level of control you require, you need to take the next step in your learning and try out either image-to-image or ControlNet. You'll find image-to-image the simplest: instead of starting from a blank latent image you start from an actual image, then experiment with the steps and denoising to find a good balance between blitzing everything and changing nothing (there's a small code sketch of that idea below the pose picture). I know ComfyUI looks more complex, but I actually find it simpler because you can see and understand the steps better. There's an example workflow in their examples folder which is actually just a PNG you drag onto the UI, and it will replicate the image-to-image workflow. Image-to-image lets you tweak or modify an image, so if you had, say, drawn what you wanted, you could use AI to make it more impressive. There's also in-painting, which can change small parts of the pic, such as adding a blindfold. That's something of an art, but it means you could create your image in stages, adding elements as you go. It's tricky, though, and honestly what you probably want for something like this is ControlNet (which isn't available for Flux yet, but they say it's imminent). ControlNet will let you fine-control poses (amongst other things):
1722684668033.png
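To make the image-to-image idea concrete, here's a rough sketch of the same thing in script form using the diffusers library rather than ComfyUI or the Flux web UI. The model name, filenames and strength value are placeholders, not a recipe:

```python
# Illustrative only: image-to-image with the diffusers library rather than the
# web Flux UI or ComfyUI. Model name, filenames and strength are placeholders.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from your own rough drawing or photo instead of a blank latent.
init_image = Image.open("rough_sketch.png").convert("RGB").resize((512, 512))

# strength is the denoising balance: ~0.2 barely changes the input,
# ~0.9 repaints almost everything. Sweep it to find the middle ground.
result = pipe(
    prompt="woman in a black dress holding a sword above her head with both hands, walking on water, night",
    image=init_image,
    strength=0.55,
    num_inference_steps=30,
).images[0]
result.save("img2img_result.png")
```

Same idea as the ComfyUI image-to-image example workflow, just in script form.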

But enough of the accurate and tough answer. There are also things you'll want to hear: changes you can make to the prompt itself.

Night time moody lighting, a woman in all black, eyes blindfolded with a black scarf, using both hands holding an ancient broadsword pointing up, with both hands on the hilt holding it up, surrounded by darkness, there is no forest, there is no beach, she is skipping atop water in a pond with waves dispersing at each graceful ballerina step from left to right, the water is clear and beautiful, atmospheric dark shot, camera is looking down at her.

Instead of "a woman in all black" be specific: "woman in a black dress" If you need to add things like "black ballerina shoes" then add them and re-run as you go. Try to keep prompts simple at first and maybe add detail later. So instead of "using both hands holding an ancient broadsword pointing up, with both hands on the hilt holding it up" with its multiple holdings and "using", just try "holding a sword above her head with both hands"

Part of your problem is slightly conflicting requirements. Example: blank background, standing on a clear and beautiful pond, camera looking down at her. Let's put those together. If the camera is looking down at her then the background must show the water, but you've also told it to have a blank background. So I'm inferring that you want something more like an illustration: a drawing of all this on an otherwise white canvas. I don't know if I'm right about what you want, but if you did want a photograph then "blank background" is going to be a little odd for something that includes nature elements. So if you want more of a drawing, let's add "drawing of..." at the start. We should also consider what you mean by "night time". Ordinarily that could mean all sorts of things: if the background is an alley it might mean street lighting; if it's rural it might mean stars and moon in the sky. But with "blank background" it can mean none of that, so you're setting the model against itself to a degree. Same with clear water. Water isn't clear at night: it's either a black void to look upon or a mess of reflections, and it's not something you look down through. So what exactly are you expecting? If it goes with "clear and beautiful" water, which it probably will, it'll give you something that looks like daytime, at least in the water. To help you, I'd suggest you think about what you want specifically and then request that directly so there's less room for interpretation: "dark green water", "reflections of stars", whatever.

I'm fairly certain this image isn't what you want, but it might give you a starting point you can begin adding things to. Note that I went with the idea that you wanted a drawing, because of the blank-background requirement and because you didn't specify anything like "highly detailed, highly realistic, photograph" in your prompt.

So this prompt got me the image below: "drawing of a woman in black dress wearing black blindfold holding a sword above her head with both hands. She is walking on water, blank background, viewed from above, sword pointing upwards, night, atmospheric"
walking.png

For comparison I did the same prompt as a photo so you could see what I mean about "moody" and "nighttime".
walking2.png
(Prompt was: "photograph of a woman in black dress wearing black blindfold holding a sword above her head with both hands. She is walking on water, blank background, viewed from above, sword pointing upwards, night, atmospheric")

As you can see, it's not really sure how to make it nighttime or atmospheric with nothing to work with. You'll have to explain how you want it to be night or moody if you also want a blank background; if you want a dark background, tell it that.

Unfortunately you're probably just using a web UI, in which case it may not support re-using the same seed. Depending on your programming knowledge, I'd look at using the API programmatically; failing that, run it locally (unless I've misunderstood and you already are). Either way you can then use a fixed seed and see the impact of your prompt changes alone, without other random changes.
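To illustrate what I mean by a fixed seed, here's a rough local sketch with the diffusers library. Again purely an illustration; whether your web UI exposes a seed at all is the real question:

```python
# Illustrative only: fixing the seed so that only the prompt edits change the result.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

SEED = 1234  # keep this constant between runs

prompts = [
    "drawing of a woman in a black dress holding a sword above her head with both hands, walking on water, blank background, night",
    "drawing of a woman in a black dress holding a sword above her head with both hands, walking on water, blank background, night, dark green water, reflections of stars",
]

for i, prompt in enumerate(prompts):
    # Re-seeding each time means both images start from the same noise,
    # so any difference you see comes from the prompt change alone.
    generator = torch.Generator(device="cuda").manual_seed(SEED)
    image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
    image.save(f"fixed_seed_{i}.png")
```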

So I hope that was of some help. The real answer is ControlNet, but there are still changes you can make to your prompt to increase your chances.

Moderately long text, anyways.
Good Lord - the AI is an admirer of Robbie Burns!
 
Oh man, I really want to try running FLUX locally. Unfortunately, my options are currently to either A) Wait for A1111 support or B) Install ComfyUI.
 
Oh man, I really want to try running FLUX locally. Unfortunately, my options are currently to either A) Wait for A1111 support or B) Install ComfyUI.
Just use Comfy, it's really not very difficult and you can just stick to a premade workflow. I switched to Comfy before A1111 had SDXL support and I've been using it ever since.
 
Yeah, Comfy looked more complex at first and I only tried it because it was giving better performance at the time, but having gotten used to it I find it simpler than A1111. Hard to imagine, I'm sure, but with Comfy's nodes I can understand how the whole process works and easily tweak things as I go. And being able to create a workflow just by dragging a PNG onto the UI and dropping it there is a massive time saver, so building workflows isn't as complex as it looks.

Kiwifarms doesn't strip the image metadata (like Reddit does) so all those Flux images I posted in my big post the other day, you should be able to just copy and paste them into the Comfy UI and have everything I did right down to the same seed, just suddenly appear.
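If you want to check an image before dragging it in, the workflow lives in the PNG's text chunks and is easy to inspect. A quick sketch; the filename is just one of the examples above and the chunk names are the ones ComfyUI normally uses:

```python
# Quick sanity check that a ComfyUI PNG still carries its embedded workflow.
# ComfyUI writes it into the PNG text chunks, normally under "workflow" and "prompt".
import json
from PIL import Image

img = Image.open("walking.png")
meta = img.info  # PNG tEXt chunks end up in this dict

for key in ("workflow", "prompt"):
    if key in meta:
        data = json.loads(meta[key])
        print(f"{key}: present with {len(data)} entries")  # drag-and-drop should work
    else:
        print(f"{key}: missing (stripped, e.g. by a Reddit re-upload)")
```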

What sort of hardware do you have? My pics were from Flux-dev with the fp16 clipper, and that's going to be demanding. I've actually been using Runpod for this so far, though I'll be testing locally next.
 
Just use Comfy, it's really not very difficult and you can just stick to a premade workflow. I switched to Comfy before A1111 had SDXL support and I've been using it ever since.
Yeah, Comfy looked more complex at first and I only tried it because it was giving better performance at the time, but having gotten used to it I find it simpler than A1111. Hard to imagine, I'm sure, but with Comfy's nodes I can understand how the whole process works and easily tweak things as I go. And being able to create a workflow just by dragging a PNG onto the UI and dropping it there is a massive time saver, so building workflows isn't as complex as it looks.

Kiwifarms doesn't strip the image metadata (like Reddit does) so all those Flux images I posted in my big post the other day, you should be able to just copy and paste them into the Comfy UI and have everything I did right down to the same seed, just suddenly appear.

What sort of hardware do you have? My pics were from Flux-dev with the fp16 clipper, and that's going to be demanding. I've actually been using Runpod for this so far, though I'll be testing locally next.
I've gotten so used to A1111 that the thought of having to relearn stuff was a bit off-putting, but I guess it's worth a shot at some point. As for the hardware, my current main PC should probably be okay with 24GB VRAM and 64GB RAM.
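For reference, if you ever want to script it rather than go through ComfyUI, running Flux-dev through the diffusers library looks roughly like this. Assumptions: a recent diffusers release that ships FluxPipeline, the official black-forest-labs/FLUX.1-dev weights, and CPU offload to help it fit on a 24GB card; treat the settings as a starting point rather than gospel:

```python
# Rough sketch of running Flux-dev via the diffusers library instead of ComfyUI.
# Requires a recent diffusers that includes FluxPipeline, plus accelerate for offload.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps a 24GB card by keeping idle parts off the GPU

image = pipe(
    "photograph of a woman in a black dress holding a sword above her head, walking on water, night, atmospheric",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_dev_local_test.png")
```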
 
Oh... ohhhhhhhh!!!!

holy SHIT.png

For 12.5 seconds on a first overly-worded autistic test prompt, this Flux thingie is not too shabby.

That's way faster than cramming SD1.5 through a barely sufficient GPU locally, with better definition and more flexible sizes too. If I used a proper prompt, hmm...

anime girl looking determined riding a motorbike over a collapsing bridge while a train explodes in the background, she has golden-brown eyes and long brown ponytail hair with a pair of feathers atop her head, she dresses in a brown cloak secured with a ribbon, she wears gloves that are fingerless on her index and pinky fingers, she has a belt wrapped around her waist with a lantern, pouch, and small dagger, a pleated skirt, leggings with her right leg reaching above her thighs and the left leg below her thighs, and wears brown boots
 
Took a break and just started messing around with Stable again. Tried out ComfyUI; after you get past the conspiracy-board-like set of strings, it's so much faster than Automatic, like crazy faster.

Trying to get ComfyUI-Manager installed, but can't seem to figure it out when running in Docker. Anyone?
 

I found a beginner video on getting started training Stable Diffusion 1.5 with ComfyUI in 5 minutes, in case it helps anyone.

And I could use some help after training a model. Let's say I want to train a model to generate a character: how do you get the character to do different poses? Also, after I train the model there's bad anatomy and glitchy eyes even though I put them in the negative prompts. Anyone got advice?
 
Mildly interesting: unlike SD, the Flux model is case-sensitive, at least in some cases. A user on Reddit generated these prompts:
"a photograph of joe biden riding a giraffe in the mountains"
"a photograph of Joe Biden riding a giraffe in the mountains"

It recognised who Joe Biden was only when his name was capitalised. Photos below. Personally I approve of AI models expecting proper grammar from their users, but I'm posting both because it's curiously different from other models and as a heads-up to people using Flux.

Uncapitalised:
1723044213104.png
Capitalised:
1723044232153.png
 
Mildly interesting: unlike SD, the Flux model is case-sensitive, at least in some cases. A user on Reddit generated these prompts:
"a photograph of joe biden riding a giraffe in the mountains"
"a photograph of Joe Biden riding a giraffe in the mountains"

It recognised who Joe Biden was only when his name was capitalised. Photos below. Personally I approve of AI models expecting proper grammar from their users, but I'm posting both because it's curiously different from other models and as a heads-up to people using Flux.

Uncapitalised:
View attachment 6282639
Capitalised:
View attachment 6282640
That first one is a spitting image of the "Mayhem" auto insurance commercial guy. I can only imagine what chaos he's going to cause with that giraffe.
 
That first one is a spitting image of the "Mayhem" auto insurance commercial guy. I can only imagine what chaos he's going to cause with that giraffe.
He's escaping his nurses and riding back to DC to tell everyone he never wrote that letter and he's still the candidate. Just out of view are a bunch of Dem staffers on rhinos armed with sedatives.
 
I want to get into ComfyUI (since Forge died; best A1111-based option, I'll miss you) but I have no idea where to start.
 
My guess is that Flux was trained with AI-captioned images, because it works better with full sentences and paragraphs. I've found you can make really interesting things easily by having GPT describe an image you like and then changing some of it to your liking. For example, I got this batch by having it describe a picture of a puppeteer, then moving around some details. Having GPT write the prompt usually comes up with something that Flux can work with.

the_truth.png clinton_news_net.png puppets.png
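For anyone who wants to try the same trick, here's roughly what it looks like with the OpenAI Python SDK. The model name, filename and prompt wording are placeholders, not the exact thing I used:

```python
# Sketch of the "have GPT describe a reference image, then edit the description" trick.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("reference_image.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this image as one detailed paragraph suitable as an image-generation prompt."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)

# Tweak the details you want changed, then paste the result into Flux.
print(resp.choices[0].message.content)
```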

It knows so many logos, too; they clearly used copyrighted material and just didn't give a shit. This model has a lot of meme potential. Using it reminds me a lot of the wild stuff people would make with Dall-e mini.

I want to get into ComfyUI (since Forge died; best A1111-based option, I'll miss you) but I have no idea where to start.
Download ComfyUI, portable or installed. Install ComfyUI-Manager to make it easier to download nodes and models. Try the examples on this page (hires fix is probably the most basic one there). If a workflow crashes, just open the Manager and pick "install missing custom nodes" or "install missing models."
 
I want to get into ComfyUI (since Forge died; best A1111-based option, I'll miss you) but I have no idea where to start.
@Kaiba Cuqk63 has pretty much covered it, but if you've been able to install and run A1111 then you shouldn't have too much trouble with ComfyUI. Clone the repo, follow any instructions specific to your OS (and to an AMD GPU if you have one), install any Python requirements you're missing with pip, run it from the command line and visit the URL. There are folders to save the various models, LoRAs and so on in, and if it wasn't clear, you can build a workflow in ComfyUI just by dragging and dropping the output image from one of the examples in the link Kaiba pointed you at (or any output, usually). All PNG images made by ComfyUI have their workflow metadata embedded, so you can grab an image someone has made and have the workflow, the seed and the settings all just appear in the UI ready to run. Well, assuming you have any custom nodes it requires installed, which, as Kaiba says, you can do automatically with the Manager tool that appears in the UI itself.

Except for images from Reddit, because Reddit stupidly strips the needed metadata out of any posted images. Because Reddit is stupid.
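On a related note, once ComfyUI is running you don't have to drive it from the browser at all; it exposes a small HTTP API. A rough sketch, assuming the default local server on port 8188 and a workflow exported with "Save (API Format)" (enable the dev mode options in settings to see that button); the node id below is purely an example:

```python
# Queue a saved ComfyUI workflow from a script via the local HTTP API.
import json
import urllib.request

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Optionally edit a node before queueing, e.g. the text of a CLIPTextEncode node.
node_id = "6"  # example id; use whatever ids your exported workflow actually has
if node_id in workflow and "text" in workflow[node_id].get("inputs", {}):
    workflow[node_id]["inputs"]["text"] = "drawing of a woman walking on water, night, atmospheric"

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns a prompt_id once queued
```

Handy if you want to batch a bunch of prompt variations without clicking through the UI each time.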

Separately, neither of the below is mine but I found both interesting. The first shows just how much more advanced text insertion in Flux is. The second used Flux for the base images, then ran them through LunarAI's end-frame functionality to make an animated image. Shockingly advanced.

dog_dead.png


Also, someone has successfully trained a LoRA with Flux so the floodgates are probably now open. Finetunes when?
 
the Flux model is case sensitive
That's not super surprising. In the way these things work, "joe" and "Joe" might be conceptually different to the model, in ways where the relation between the two isn't even all that strong. The "deeper" a model is, the more it will understand such relations, e.g. 4o (or to give a more fitting example, Dall-E 3) will always understand that joe biden and Joe Biden are the same thing, but smaller, shallower models like this can be tripped up by it relatively easily. They can feel extremely literal because of that, and I would always pay close attention to the language I use.
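For the curious, here's a quick way to see the mechanism. Flux pairs CLIP with a T5 text encoder, and T5's SentencePiece tokenizer is case-sensitive, so "joe" and "Joe" literally become different tokens. The "t5-base" tokenizer below is just a stand-in to illustrate that behaviour:

```python
# Show that a case-sensitive tokenizer turns "joe biden" and "Joe Biden"
# into different token sequences, which the model can treat as different concepts.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-base")

print(tok.tokenize("a photograph of joe biden riding a giraffe"))
print(tok.tokenize("a photograph of Joe Biden riding a giraffe"))
```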
 
That's not super surprising. In the way these things work, "joe" and "Joe" might be conceptually different to the model, in ways where the relation between the two isn't even all that strong. The "deeper" a model is, the more it will understand such relations, e.g. 4o (or to give a more fitting example, Dall-E 3) will always understand that joe biden and Joe Biden are the same thing, but smaller, shallower models like this can be tripped up by it relatively easily. They can feel extremely literal because of that, and I would always pay close attention to the language I use.
Eh, I mean none of the Stable Diffusion models were, so I do think it's a little surprising.
 