So what I noticed with Flux is that simple language can actually be somewhat detrimental in an indirect way: if you repeat yourself a lot (which increases the length of the instruction), this leads to confusion, which expresses itself much like it does in "normal" LLMs (ignoring parts of the instruction) and also in distorted, "confused" images, like repetition causes in SD. I would not discourage repetition altogether, though; sometimes it does help to drive a point home. Generally, though, it seems better to restructure the prompt.
An interesting thing to attempt is letting an AI rewrite the prompt. I largely discounted automatic prompting with SD because, in my experimenting, it simply did not lead to good results unless you fed the AI all the right keywords, and at that point you might as well write the prompt yourself. It seems to work well with Flux, though. If you consider that the images were probably captioned by an AI in a conversational way, as theorized earlier in this thread, it makes sense that another AI would find the "right language" (perhaps GPTisms?) to get exactly what you asked for.
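If you want to automate that rewrite step, it can be as simple as wrapping your draft in a meta-prompt before sending it to whatever chat model you use. A minimal sketch; the wording of the instruction and the `make_rewrite_request` helper are my own invention, not anything Flux-specific, and the actual LLM call is left to whichever API you prefer:

```python
def make_rewrite_request(draft_prompt: str) -> str:
    """Wrap a draft image prompt in a meta-prompt asking an LLM to
    rewrite it as a flowing natural-language scene description
    (no keyword lists, no repetition)."""
    return (
        "Rewrite the following image-generation prompt as a concise, "
        "natural-language description of the scene. Remove repeated "
        "phrases and keyword lists, but keep every concrete visual "
        "detail:\n\n" + draft_prompt
    )

# Example: turn an SD-style keyword soup into something to hand to a chat model.
draft = "harpy, harpy woman, bird wings, bird wings, talons, fantasy art"
print(make_rewrite_request(draft))
```

The returned string is then sent as a single user message to any chat model; its reply becomes the new Flux prompt.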
Prompting Flux is very different from prompting SD and all the other models that came out (and perhaps MJ and Dall-E; I never used those). For optimal results, instead of looking for the right keywords (which often simply do not exist), it makes much more sense to just describe what you want. I know this has already been said on this very page; I repeat it just for the sake of completeness.
For example:

(video game adverts soon be like)
This is a comic-style artwork of a generic fantasy harpy. When using the word "harpy", "D&D harpy", etc., Flux usually generated a harpy eagle or some confused mashup of a bird and a woman, a woman holding a bird, etc., and sometimes some anime abomination, which seems to be Flux's fallback (SD's fallback was 00s-style 3D renders). D&D-style humanoid monsters are sort of my personal benchmark because most models really struggle with them, especially if they are half-something. So I just described the character, down to having a four-fingered, claw-like hand; the word "harpy", or even "bird", was not used once. After writing my verbose prompt, I had it summarized by an LLM, and that seemed to bring the perplexity down: I felt there were a lot fewer "failed" generations than with my handwritten prompt. Might be placebo, might be my ESL - I did not test it nearly long enough - but interesting if true.


(broken-off arm prompt! worked only like 1-2 times out of 10, though)

I think this is what the kids call having an aesthetic. (I love that this thing can actually do dark scenes.) In Stable Diffusion models, some things are *always* lit from the same direction, which makes some scenes look unnatural. Once you see it, you cannot unsee it.
This concludes my post. Thanks for reading my blog!