Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

I think I caught Bing's covert diversity tags
[Attachments: 1f.jpg, 2f.jpg]
 

Yes, because you know AI was used.

Again, this is literally the point I'm making! Hypothetically, if those artists used AI and nobody found out, it wouldn't have mattered to you. The intent was still there, but because people are currently devaluing the tool and process used it magically becomes worthless. You don't care, you care that other people disapprove of the methodology.
??? I mean if you are all about results and not process then sure. If someone put sawdust in my burger and I didn't notice then it's all good. Yum yum. If someone stole my money and I didn't notice then it's all good. Cha ching. I will consume AI art and be happy. Yay. It's okay, I know I won't be able to convince you and vice versa.
 
Again, this is literally the point I'm making! Hypothetically, if those artists used AI and nobody found out, it wouldn't have mattered to you. The intent was still there, but because people are currently devaluing the tool and process used it magically becomes worthless. You don't care, you care that other people disapprove of the methodology.
Or more like because AI-generated images are more representative of statistical patterns than of any form of artistic direction. It's noise that represents an image; it can't by any definition be art.
 
I saw some workflow somewhere where somebody made pixel art animations with SD, and it looked pretty cool.

The biggest problem with AI generation (and this will also be a problem with Sora, but don't tell the creative types, them freaking out is p. funny) is that you can't control *well* what you get. I mean, you write a prompt, and you kinda get what you prompted for, but, well, not really! Then you run the same prompt three times and it hits the subject matter all three times, but the results still vary wildly. For one-offs that's fine, but it's hard to build any consistent workflow with it, e.g. if you try to do art for your indie game. If you try to make a short movie with Sora, it'll look like this:


GPT-4 suffers from the same problem. A good real-world use for LLMs would be interfacing them with conventional programming via APIs. While many problems people are somehow trying to solve with LLMs now (the most bizarre one that comes up again and again is math) are much more easily solved with conventional programming, there are interesting edge cases where an LLM, with its understanding of natural language and reasoning, is something you can't really replicate with programming that easily or that well. They consistently try to improve GPT-4 so it doesn't give you garbage output when you ask for formatted and structured output, but it still fails often enough. The smaller models are even more hopeless at this.

Open Source LLM land solved this problem a while ago with GBNF grammars. It's basically a kind of scripting language you can apply to force the LLM's output into a specific format. For example, it would be possible to write a grammar with this that only lets the LLM output valid C code. With that applied, the LLM literally can't do anything else. (It's not flawless: if you give it work that can't be solved within the format at hand, it'll sometimes try to bypass it in hilarious ways while still technically adhering to the "script".)
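
Roughly, with llama-cpp-python it looks like this. A minimal sketch, assuming the grammar API in recent llama-cpp-python builds; the model path and the toy yes/no grammar are just placeholders:

```python
# Minimal sketch of grammar-constrained sampling with llama-cpp-python.
# Model path and the toy grammar are placeholders, not recommendations.
from llama_cpp import Llama, LlamaGrammar

# A tiny GBNF grammar that only permits a yes/no answer.
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

llm = Llama(model_path="./models/some-model.gguf")  # placeholder path

out = llm(
    "Is the sky blue? Answer yes or no.",
    grammar=grammar,   # sampling can only emit tokens the grammar accepts
    max_tokens=4,
)
print(out["choices"][0]["text"])  # will be "yes" or "no", nothing else
```

The same idea scales up to full JSON schemas or, as above, an entire C grammar.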

Same with SD. SD and even SC might be worse than DALL-E 3, but you have a lot more control over their outputs. Control the likes of OpenAI are never planning to give you, because it would be "unsafe". So while all the tech they put out is really interesting, I really start to wonder what it could possibly be good for in its current shape, especially since the safety problem is not gonna go away.
 
Open Source LLM land solved this problem a while ago with GBNF grammars. It's basically a kind of scripting language you can apply to force the LLM's output into a specific format. For example, it would be possible to write a grammar with this that only lets the LLM output valid C code. With that applied, the LLM literally can't do anything else. (It's not flawless: if you give it work that can't be solved within the format at hand, it'll sometimes try to bypass it in hilarious ways while still technically adhering to the "script".)
What LLMs like GPT excel at is taking unstructured data and converting it into structured data, which is why they are so capable at programming tasks: taking a natural-language prompt that outlines the tables in a database and converting it into a SQL query, or taking a SQL query and converting it into JSON or some other format. This is the real value of LLMs, both application-wise and commercially.
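
As a toy sketch of that conversion (the schema, question, and model name here are made up, and the call follows the v1-style openai Python client, so adjust for your installed version):

```python
# Sketch: using an LLM to turn a natural-language request into SQL.
# Schema, question, and model name are invented for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

schema = "orders(id, customer_id, total, created_at); customers(id, name)"
question = "Total revenue per customer name, highest first."

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Translate the user's request into a single SQL query "
                    f"for this schema: {schema}. Output only SQL."},
        {"role": "user", "content": question},
    ],
)
print(resp.choices[0].message.content)
```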

But text as a medium only gets you so far when it comes to tasks that involve image or video generation, which is why I don't think text-to-image or text-to-video is as useful as people believe. Something like ControlNet or img2img will be the actual value of generative models for creative workflows.
 
Something like ControlNet or img2img will be the actual value of generative models for creative workflows.
It already is. All my best prompts have been img2img, using pretty basic sketches from my tablet. Spending two minutes drawing a quick sketch of what I want first means I can go from an hour of tinkering with a prompt before getting a good result to ten minutes. And if a prompt is close but not quite there, I can just pull the image into PDN to add whatever feature or expression I want and then have SD make the edit fit in seamlessly.
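
For anyone who hasn't tried it, the whole loop is only a few lines with the diffusers library. A sketch, with the checkpoint, prompt, and strength as example values only:

```python
# Sketch-to-image with Stable Diffusion img2img via diffusers.
# Checkpoint, prompt, and strength are example values, not recommendations.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

sketch = Image.open("quick_tablet_sketch.png").convert("RGB")

result = pipe(
    prompt="detailed fantasy landscape, matte painting",
    image=sketch,
    strength=0.6,        # lower = stays closer to the input sketch
    guidance_scale=7.5,  # how strongly to follow the prompt
).images[0]
result.save("out.png")
```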
 
The biggest problem with AI generation (and this will also be a problem with Sora, but don't tell the creative types, them freaking out is p. funny) is that you can't control *well* what you get. I mean, you write a prompt, and you kinda get what you prompted for, but, well, not really! Then you run the same prompt three times and it hits the subject matter all three times, but the results still vary wildly.

It really can feel like a slot machine at times. Just one more pull and I will get my perfect waifu artistic masterpiece. Give it a couple of years before we start hearing about AI-prompting addiction the same way we hear about gambling addiction.
 
ComfyUI implemented native support for Stable Cascade and I gave it a go with my 1060.
[Attachments: ComfyUI_00002_.png, ComfyUI_00004_.png]
The good news is that it works. The bad news: barely.

It takes minutes to put out a single image, and the 6 GB of VRAM is just not enough, so it has to fall back on regular RAM, causing havoc because I only have 16 GB of that, half already in use by other software.

This obviously puts me off experimenting, since waiting a minute to make a single image with SD was already bad enough, let alone waiting another minute to see a single tweak take effect. That GPU upgrade is inevitable if you want to keep engaging with this tech.

It's very impressive. It's amazing that I can run something myself that is reaching DALL-E 3 levels of detail and understanding, and it's exciting that this is a StabilityAI project, so people will train more purpose-specific models, LoRAs and more, completely unrestricted by the prompt filters, prompt poisoning and other bullshit the big players do with their offerings.
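
If anyone wants to try it outside ComfyUI on a similarly small card, diffusers exposes CPU-offload switches that trade speed for VRAM. A rough sketch following the diffusers Stable Cascade docs (pipeline and model names per those docs; double-check against your installed version):

```python
# Rough sketch: Stable Cascade via diffusers with CPU offload for low VRAM.
# Pipeline/model names follow the diffusers docs; verify with your version.
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

prompt = "a photo of a red fox in the snow"  # example prompt

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
)
prior.enable_model_cpu_offload()  # keeps weights in RAM, moves to VRAM per step

prior_out = prior(prompt=prompt, height=1024, width=1024, num_inference_steps=20)

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
)
decoder.enable_model_cpu_offload()

image = decoder(
    image_embeddings=prior_out.image_embeddings.to(torch.float16),
    prompt=prompt,
    num_inference_steps=10,
).images[0]
image.save("cascade_test.png")
```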
 
Please don't tell me those are all AI generated.

The biggest problem with AI generation (and this will also be a problem with Sora, but don't tell the creative types, them freaking out is p. funny) is that you can't control *well* what you get.
A specific subset of that: you can't get consistent characters. I've followed some YouTube vids on how to achieve it, but they were very limited. If you generate an image of someone, it's very hard to consistently get that same someone in other poses or scenes, which brings you back to needing an actual artist.
 
Funny you say that when Stable Cascade has features that let you inpaint/outpaint consistent features, or even train models to remember them for later use in prompts. Were those mentioned on YouTube yet?


I'm going to stick to my guns a little on this one. I have seen both of these approaches, and I think what I said was that they are very limited. Your first example is especially what I was talking about: they're all similar portrait shots, forward-facing, in similar positions. It's essentially inpainting, which is how the majority of the YouTube videos and how-tos for consistent characters work. It doesn't let you say "this character side-on" or "this character in this style". Some of it gets a little close to being able to say "this face but angry", with mediocre results. The second one, about training your own LoRA, is more what I'm talking about. It's not there yet, but it's proceeding in the right direction for what I mean. It's quite a bit of work, though, unless you want to use a particular celebrity's face, where there are LoRAs already out there.
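
To be fair, once a character LoRA is trained, wiring it in is at least trivial. A sketch with diffusers, where the base checkpoint and LoRA file are hypothetical placeholders:

```python
# Loading a trained character LoRA in diffusers.
# The base checkpoint and LoRA file are hypothetical placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load_lora_weights accepts a local file or a Hub repo containing the LoRA
pipe.load_lora_weights("./loras/my_character.safetensors")

img = pipe(
    "my_character standing side-on, full body, angry expression",
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength
).images[0]
img.save("character_side.png")
```

Getting the training data together is still the real work, of course.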
 