Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

Looks like a disaster so far, and you could easily imagine the same issues extending to the 8 billion parameter model if the problem is that they attempted to make SD3 "safe". The coomers will probably end up being the ones to salvage it with retraining.
Ridiculed Stable Diffusion 3 release excels at AI-generated body horror (archive)
The problem is that fixing it will take a lot of work and a lot of GPU time. With previous releases people were willing to do it because of the open weights and permissive licensing. That went away with the SD3 release: they now charge $20 a month for a license that covers up to 6,000 images (basically nothing), and say "contact us" for enterprise plans with higher image caps. The popular checkpoints like Juggernaut and Pony are sponsored by generation services; those services can't use SD3 without paying "contact us" rates (aka $$$), so there's no point in them sponsoring improvements.

People are now talking about finetuning Pixart or Lumina instead, because those at least have permissive licensing and aren't lobotomized by NSFW filters. E.g. https://old.reddit.com/r/StableDiff...t_a_friendly_reminder_that_pixart_and_lumina/
 
I'm going to check out Lumina and Pixart. Pixart is apparently more popular due to its significantly lower hardware requirements, but it looks like I have enough hardware to try Lumina, and frankly, hardware is becoming more available all the time, so I think quality will be the deciding factor.

I was under the impression that the SD models were "Open Source" in the sense that the code and training mechanism were open but the training data was not. If that's the case, it makes me wonder what would be involved in a community-based effort to obtain and label enough data to do their own SD3. I could be wrong about this.
 
That's essentially what Pony-XL is, only with several million images, as opposed to several billion for SDXL. My question is how exactly Stability AI managed to tag these billions of images for training. Were they having jeets do it? Was it automated? If so, the accuracy of SD relies on the tagger more than anything.
 
Apparently someone has made a LoRA for SDXL so you can get the same level of accuracy as SD 3.0.

 
My question is how exactly Stability AI managed to tag these billions of images for training. Were they having jeets do it? Was it automated? If so, the accuracy of SD relies on the tagger more than anything.
Automated tagging, based on scraping the words on the website the image appears on.

The idea is that bad tags will be buried once the training set gets large enough. People might fuck up tagging stuff or include irrelevant SEO spam on their page but they're not all doing it the exact same way, so that just becomes noise instead of a pattern capable of misleading the trainer.
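The noise-burial argument can be sketched with a toy simulation (the numbers here are made up for illustration): every scraped caption for a cat image contains the true tag plus a couple of random SEO-spam words drawn from a large pool. At scale, the true tag co-occurs with the content every time, while no individual spam word shows up often enough to form a pattern.

```python
import random
from collections import Counter

random.seed(0)

# Toy illustration (made-up numbers): every scraped caption for a cat image
# contains the true tag plus two random SEO-spam words from a large pool.
spam_pool = [f"spam{i}" for i in range(1000)]

n_captions = 100_000
counts = Counter()
for _ in range(n_captions):
    counts.update(["cat"] + random.sample(spam_pool, 2))

# The true tag appears in every caption; no single spam word comes close.
print(counts["cat"] / n_captions)                      # 1.0
print(max(counts[w] for w in spam_pool) / n_captions)  # a fraction of a percent
```

The spam words each land on roughly 0.2% of captions, so they register as noise rather than a signal the trainer could latch onto.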
 
That's essentially what Pony-XL is, only with several million images, as opposed to several billion for SDXL. My question is how exactly Stability AI managed to tag these billions of images for training. Were they having jeets do it? Was it automated? If so, the accuracy of SD relies on the tagger more than anything.
Per the paper, they used a 50/50 mix of original captions and synthetic captions generated with CogVLM. The success rate is still under 50% overall though, which doesn't seem great. Position in particular is only 18%. Something like Pony trained on crowdsourced, manually tagged datasets like Danbooru would presumably be much better.
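The 50/50 caption mix is simple to sketch: for each training sample, the caption is drawn from either the original scraped text or the VLM-generated synthetic one. A minimal sketch (the function name and captions are illustrative, not Stability's actual code):

```python
import random

random.seed(42)

def pick_caption(original: str, synthetic: str, p_synthetic: float = 0.5) -> str:
    """Per training sample, use the synthetic (VLM-written) caption with
    probability p_synthetic, otherwise fall back to the scraped original."""
    return synthetic if random.random() < p_synthetic else original

# Toy example: one image with a useless scraped caption and a CogVLM-style one.
orig = "IMG_1234.jpg best deals online"
synth = "A tabby cat sitting on a windowsill in afternoon light"

draws = [pick_caption(orig, synth) for _ in range(10_000)]
print(draws.count(synth) / len(draws))  # close to 0.5
```

The idea is that the model still sees enough raw web captions to handle messy prompts, while the synthetic half teaches it denser, more literal descriptions.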

 
Not sure if it was mentioned here, but a ComfyUI extension's GitHub repo was used maliciously to install keyloggers or info scrapers on people's PCs, having been breached by some 'hacking group' with a fetishistic name. By their claims, they breached the repo before anything was even published, meaning anyone who used 'Comfyui_LLMVISION' potentially installed a trojan.

You can view the info over on the linked post, but here's one of their messages.
[attached screenshot of one of the group's messages]
It seems incredibly likely this was some fetish fag getting mad at other fetishists for generating porn rather than paying porn artists for it.
 
We were talking about it a couple pages back: https://kiwifarms.st/threads/stable-diffusion-novelai-machine-learning-art.130730/post-18563021

It seems like trolling to me. They wanted to hack people, this was an easy way to do it, and once they are caught might as well troll the journalists.
 
What program do you use to have an AI duplicate a voice? What else is necessary?
 
I tried a few SD3 Medium things today. I had some success, and as always, it's cats.
[Attached: three generations, prompt "A vibrant psychedelic cat blissfully..." (SD3 Medium incl. T5-XXL)]

I found 9:21 can sometimes generate, but I get a doubled and partially distorted image.
[Attached: one such 9:21 generation]
I'm sure I could get better results with a good prompt; either way, it's still neat.
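For extreme ratios like 9:21 it can help to stay near the model's native pixel budget (roughly one megapixel for SD3 Medium) with dimensions rounded to a multiple of 64, which many SD pipelines expect (some accept smaller multiples). A quick helper for that, my own sketch rather than anything from an SD tool:

```python
import math

def dims_for_ratio(w_ratio: int, h_ratio: int,
                   target_px: int = 1024 * 1024, multiple: int = 64):
    """Pick a width/height near target_px total pixels that approximates
    the w:h aspect ratio, with both sides rounded to the given multiple."""
    scale = math.sqrt(target_px / (w_ratio * h_ratio))
    w = round(w_ratio * scale / multiple) * multiple
    h = round(h_ratio * scale / multiple) * multiple
    return w, h

print(dims_for_ratio(9, 21))   # (640, 1536)
print(dims_for_ratio(16, 9))   # (1344, 768)
```

Extreme ratios still drift from the training distribution, which is likely why the doubling shows up regardless of resolution, but keeping the pixel count near the native budget usually reduces it.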

I still think Dall-E is better for ease of use.
[Attached: biker.jpg]

Edit (to avoid a double post):

I was trying to get Dall-E to make blood, but it becomes a "dangerous prompt" and gets blocked, so to work around it I tried:

A wizard cat invoking unspeakable horrors from the Necronomicon, with a cover that appears like decayed artificial material, not human skin. The scene includes an explosion of a liquid similar to blood in consistency and texture but not blood, oozing from the cover and bursting into the air from the pages. The image is imbued with realism, an eerie and haunting backdrop, glowing particle effects, ray tracing reflections, and rendered in 4k resolution. The cinematography adds a unique touch, making it feel real, with the presence of a Dark Lord intensifying the ominous atmosphere.

Adjusting the above prompt I got a few results, but after repeating it with a few more adjustments it somehow became blocked 🤔
[Attached: three results]

These are my favourite ones

[Attached: two favourite results]
 
What program do you use to have an AI duplicate a voice? What else is necessary?
You can use this to do voice cloning from a short snippet of audio, but the results are very hit or miss and the quality is kind of low. What you could do is feed the results into RVC (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md) and run one of the easily trained, better-sounding RVC voice models on top of the generated voice. You don't need to use alltalk; you could use any TTS as a base. I think this is how all those AI meme videos are made, or rather, the ones that don't use ElevenLabs.

I think ElevenLabs will probably still have better quality out of the box, and is easier to use, but you have to pay them and give them your personal information. Local is always better.
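The two-stage flow described above (a rough TTS clone, then RVC on top) is just function composition. Everything below is hypothetical scaffolding: the function names and return values stand in for whatever TTS and RVC tooling you actually wire up, not a real API for alltalk or RVC.

```python
# Hypothetical sketch of the two-stage pipeline; the functions are
# placeholders for real TTS/RVC tool invocations.

def base_tts(text: str, reference_clip: str) -> str:
    """Stage 1: a TTS with short-clip cloning produces rough audio in the
    target voice. Stand-in: returns the path the real tool would write."""
    return "rough_tts_output.wav"

def rvc_convert(wav_path: str, rvc_model: str) -> str:
    """Stage 2: run the rough audio through a trained RVC voice model to
    clean up timbre and pitch. Stand-in: returns the converted file's path."""
    return f"rvc_{rvc_model}_{wav_path}"

def clone_voice(text: str, reference_clip: str, rvc_model: str) -> str:
    rough = base_tts(text, reference_clip)
    return rvc_convert(rough, rvc_model)

print(clone_voice("hello there", "speaker_sample.wav", "my_voice"))
# rvc_my_voice_rough_tts_output.wav
```

The point of the split is that the base TTS only has to get prosody roughly right; the RVC model, which is cheap to train on a few minutes of clean audio, handles making it actually sound like the target speaker.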
 
I have problems installing local shit because I'm an AMD schmuck. But thanks. Would ElevenLabs care if, for example, just off the top of my head, I voice cloned Jim Jones?
 
You can run these off the CPU; obviously they will be slower, but these TTS things are already pretty fast. It's much easier to use Linux for machine learning stuff if you have AMD, because of ROCm (which may have a Windows version now, though I doubt it's as easy to install).

ElevenLabs lets you do any sort of cloning, but I think they track what you input as prompts to stop people from making voices say racist things, and they require your personal information, so I want nothing to do with them.
 
Yeah, eventually I want to build an Intel rig before being forced to dump Windows for Linux. It has to specifically be Linux and not Ubuntu, as I understand it.
 
Why do you think it doesn't have to be Ubuntu? All my ML stuff has run on Ubuntu.
I'm very new to this, so my apologies for sounding like a Boomer. I'll get on that then. The documentation for this shit is seemingly all over the place, and I've been doing intermittent research.
 
Oh yeah, it's borderline impossible to actually learn about any of this stuff when you start, because of how quickly things are moving and how much people expect you to already know. I've ironically found the Bing GPT to be very helpful in this regard.
 