Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump


Chinese researchers aim to replicate OpenAI's Sora text-to-video capabilities... and they're calling it Open-Sora. There are some examples on the GitHub page and they aren't good. I wish them luck.
now have it generate Tiananmen Square
oh wait, it won't be able to
 
What's the current go-to setup for generating images on AMD? I haven't touched this stuff since mid 2023 and it seems the meta has changed quite a bit since then.
 
What's the current go-to setup for generating images on AMD? I haven't touched this stuff since mid 2023 and it seems the meta has changed quite a bit since then.
ROCm on Linux if your GPU supports it and you're okay with potential headaches. Otherwise you can use the DirectML fork of WebUI or SD.Next with --use-directml on Windows. ComfyUI on Windows can also use DirectML, but the VRAM utilization is quite bad.
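For reference, the launch flags mentioned above look roughly like this (a sketch; exact script names and flags can vary by version, so check each project's README before copying):

```shell
# SD.Next (or the DirectML fork of WebUI) on Windows:
.\webui.bat --use-directml

# ComfyUI with DirectML on Windows (needs: pip install torch-directml first):
python main.py --directml
```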
 
ebefinna.jpg
We discover a form of covert racism in LLMs that is triggered by dialect features alone, with massive harms for affected groups. For example, GPT-4 is more likely to suggest that defendants be sentenced to death when they speak African American English.
We find that the covert, raciolinguistic stereotypes about speakers of African American English embodied by LLMs are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement.
Second, when LLMs are asked to pass judgment on defendants who committed murder, they choose the death penalty more often when the defendants speak African American English rather than Standardized American English, again without being overtly told that they are African American.
"You seen my copula?"
"Yo what nigga?!"
"I had it on top of my inflection."
"Yo what nigga?!"
 
Some more Nintendo stuff of mine:

Donkey Kong as a Bananaking.
donkeykong1.jpgdonkeykong2.jpgdonkeykong3.jpgdonkeykong4.jpg

failed attempt
bananaking.jpg

Donkey Kong as a human.
donkeykong5.jpg.png

Xenoblade's Pyra and Mythra cooking while Rex watches tv + Shulk and Melia at a bar while Fiora glares at them unhappily. There aren't enough XB source images to really do them right. Did me a real disservice when trying to make XB2 meme images.
xenoblade1.jpgxenoblade2.jpgxenoblade3.jpgXenoblade4.jpg

Mario as a politician at a rally, a sign behind him says "A World Without Turtles".+ Mario and Luigi get married. + The Mario Bros. melt. (possible new way to get bing to make cum? must look into it further...) + Mario and Luigi get married pregnantly.
mario11.jpgMARIO12.jpgMario13.jpgmario14.jpg
 
Gigakong 'I can't hear you, I have a banana in my ear' Business Jerk is a revelation -- these are my favorites
I have no idea why they made him a macho businessman aside from him having a tie. My prompt was literally "Donkey Kong as a human." and that's what it gave me. The bananaking stuff was great too: gold nearly every generation, whereas for the Xenoblade stuff, for example, I had to constantly regenerate or specify hair colors to get something that even resembled the characters.
 
I've been using bing/copilot to generate specific design elements to construct custom icons for a now-officially-unsupported New Tab Tools extension in Brave browser. It's a customizable dashboard which I refuse to give up, and the icons are designed as Boy/Girl Scout badge-style embroidered patches. I haven't needed to go outside of bing to do these as bing is really good at this. FWIW, the round background is a legacy from android icons; that's why it is noticeably lower-resolution.

patch.kiwi.goth.0.pngpatch.letterboxd.square.pngpatch.telegram.png
 
Is it just me, or has Bing's AI taken to using a single image as the base and modifying it? For example, I asked for "Jinx working in a field" and it gave me four nearly identical images of an Aunt Jemima; "women at the beach" gave me four images of two women at the beach, but they were wearing the exact same two swimsuits in all four images.
 
Is it just me, or has Bing's AI taken to using a single image as the base and modifying it? For example, I asked for "Jinx working in a field" and it gave me four nearly identical images of an Aunt Jemima; "women at the beach" gave me four images of two women at the beach, but they were wearing the exact same two swimsuits in all four images.
Could be they start with the same random seed, or it could be like an overtrained LoRA. What happens if you ask the same thing five times? I haven't used Bing AI, because lmao diversity, but I'm still curious.
 
Could be they start with the same random seed, or it could be like an overtrained LoRA. What happens if you ask the same thing five times? I haven't used Bing AI, because lmao diversity, but I'm still curious.
It gives a different set of similar images each time, so it's starting over. Infuriating that it's so much more time-consuming to get wild and wacky one-offs now.
 
It gives a different set of similar images each time, so it's starting over. Infuriating that it's so much more time-consuming to get wild and wacky one-offs now.
I guess they want to get some metrics on the subtle finetuning. But it's a really dumb approach for usability, because you want a wide spread of 4 choices that you can then ask for subtle alterations on.

How fast does it process a query, and can you make simultaneous queries?
 
I guess they want to get some metrics on the subtle finetuning. But it's a really dumb approach for usability, because you want a wide spread of 4 choices that you can then ask for subtle alterations on.

How fast does it process a query, and can you make simultaneous queries?
You can't ask for alterations. One and done, at least for the free version. The process used to take around forty-five seconds, but it's running like shit right now.
 
ROCm on Linux if your GPU supports it and you're okay with potential headaches. Otherwise you can use the DirectML fork of WebUI or SD.Next with --use-directml on Windows. ComfyUI on Windows can also use DirectML, but the VRAM utilization is quite bad.
FWIW, I settled on ComfyUI using DirectML but then I have oodles of VRAM to play with.

@Spruce Supremacist I find this the most performant way of generating images with AMD on Windows, though you'll find the ComfyUI interface more complex than the standard Automatic1111 UI. If you go with Automatic1111, you'll want the AMD-supporting fork referenced here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs . This works okay, but it's significantly slower than Nvidia. For non-SDXL models, it's usable. For SDXL models, you're looking at 7+ minutes to make a 1024x1024 image. So... learn to upscale.
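The wiki route boils down to cloning the DirectML fork and adding the flag to your launch args; roughly like this (repo name and flag are per the wiki at time of writing, so double-check against the current instructions there):

```shell
git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml
cd stable-diffusion-webui-directml
# add --use-directml to COMMANDLINE_ARGS inside webui-user.bat, then:
.\webui-user.bat
```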

On ComfyUI with DirectML, it takes me a couple of minutes to make a 1024x1024 SDXL image, and that's with a 7900 XT! Really, we're all just waiting for ROCm on Windows to be brought up to parity with Linux. If you're serious about it, you'll probably want to dual-boot Linux. There's also Stable Cascade; that's significantly faster on my machine, like twice as quick or more. But it's also experimental and has limited user content right now. I wouldn't start with that.

If you're really new to this and just wanting to play around, you might consider just creating an account with Runpod. You can be up and running pretty quickly and just $10 can buy you 20+ hours of high-performance GPU, billed only when using it. It's honestly more cost effective imo to just rent from them and get the most out of your time.
 
Jesus. On Linux this would be more like 20 to 40 seconds. It takes me two minutes with my RX 6700 XT.

That doesn't sound right. Are you sure you're actually using DirectML? That sounds like CPU inference speed. My 4GB RX 570 can spit out an SDXL 1024x1024 image faster than that on low VRAM in WebUI and SD.Next. https://github.com/comfyanonymous/ComfyUI#directml-amd-cards-on-windows
Give me a few minutes. I'll fire it up and do a test image.
 