Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

Looks like a disaster so far, and you could easily imagine the same issues extending to the 8 billion parameter model if the problem is that they attempted to make SD3 "safe". The coomers will probably end up being the ones to salvage it with retraining.
Ridiculed Stable Diffusion 3 release excels at AI-generated body horror (archive)
The problem is that fixing it will take a lot of work and a lot of GPU time. With previous releases people were willing to do it because of the open weights and permissive licensing. That went away with the SD3 release: they now charge $20 a month for a license that covers up to 6,000 images (basically nothing), and say "contact us" for enterprise plans with higher image caps. The popular checkpoints like Juggernaut and Pony are sponsored by generation services; those services can't use SD3 without paying "contact us" rates (aka $$$), so there's no point in them sponsoring improvements.

People are now talking about finetuning Pixart or Lumina instead, because those at least have permissive licensing and aren't lobotomized by NSFW filters. E.g. https://old.reddit.com/r/StableDiff...t_a_friendly_reminder_that_pixart_and_lumina/
 
I'm going to check out Lumina and Pixart. Pixart is apparently more popular due to its significantly lower hardware requirements, but it looks like I have enough hardware to try Lumina, and frankly, hardware is becoming more available all the time, so I think quality will be the deciding factor.

I was under the impression that the SD models were "Open Source" in the sense that the code and training mechanism were open but the training data was not. If that's the case, it makes me wonder what would be involved in a community-based effort to obtain and label enough data to do their own SD3. I could be wrong about this.
 
That's essentially what Pony-XL is, only with several million images, as opposed to several billion for SDXL. My question is how exactly Stability AI managed to tag these billions of images for training. Were they having jeets do it? Was it automated? If so, the accuracy of SD relies on the tagger more than anything.
 
Apparently someone has made a LoRA for SDXL so you can get the same level of accuracy as SD 3.0.

 
My question is how exactly Stability AI managed to tag these billions of images for training. Were they having jeets do it? Was it automated? If so, the accuracy of SD relies on the tagger more than anything.
Automated tagging, based on scraping the words on the website the image appears on.

The idea is that bad tags will be buried once the training set gets large enough. People might fuck up tagging stuff or include irrelevant SEO spam on their page but they're not all doing it the exact same way, so that just becomes noise instead of a pattern capable of misleading the trainer.
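The noise-burial argument can be sketched with a toy simulation (the numbers here are made up for illustration): every scraped caption for a cat image contains the true tag plus a couple of random SEO-spam words drawn from a large pool. At scale, the true tag co-occurs with the content every time, while no individual spam word shows up often enough to form a pattern.

```python
import random
from collections import Counter

random.seed(0)

# Toy illustration (made-up numbers): every scraped caption for a cat image
# contains the true tag plus two random SEO-spam words from a large pool.
spam_pool = [f"spam{i}" for i in range(1000)]

n_captions = 100_000
counts = Counter()
for _ in range(n_captions):
    counts.update(["cat"] + random.sample(spam_pool, 2))

# The true tag appears in every caption; no single spam word comes close.
print(counts["cat"] / n_captions)                      # 1.0
print(max(counts[w] for w in spam_pool) / n_captions)  # a fraction of a percent
```

The spam words each land on roughly 0.2% of captions, so they register as noise rather than a signal the trainer could latch onto.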
 
That's essentially what Pony-XL is, only with several million images, as opposed to several billion for SDXL. My question is how exactly Stability AI managed to tag these billions of images for training. Were they having jeets do it? Was it automated? If so, the accuracy of SD relies on the tagger more than anything.
Per the paper, they used a 50/50 mix of original captions and synthetic captions generated with CogVLM. The success rate is still under 50% overall though, which doesn't seem great. Position in particular is only 18%. Something like Pony trained on crowdsourced, manually tagged datasets like Danbooru would presumably be much better.
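The 50/50 caption mix is simple to sketch: for each training sample, the caption is drawn from either the original scraped text or the VLM-generated synthetic one. A minimal sketch (the function name and captions are illustrative, not Stability's actual code):

```python
import random

random.seed(42)

def pick_caption(original: str, synthetic: str, p_synthetic: float = 0.5) -> str:
    """Per training sample, use the synthetic (VLM-written) caption with
    probability p_synthetic, otherwise fall back to the scraped original."""
    return synthetic if random.random() < p_synthetic else original

# Toy example: one image with a useless scraped caption and a CogVLM-style one.
orig = "IMG_1234.jpg best deals online"
synth = "A tabby cat sitting on a windowsill in afternoon light"

draws = [pick_caption(orig, synth) for _ in range(10_000)]
print(draws.count(synth) / len(draws))  # close to 0.5
```

The idea is that the model still sees enough raw web captions to handle messy prompts, while the synthetic half teaches it denser, more literal descriptions.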

 
Not sure if it was mentioned here, but a ComfyUI extension's GitHub repo was used maliciously to install keyloggers or info scrapers on people's PCs, having been breached by some 'hacking group' with a fetishistic name. By their claims, they breached the repo before anything was even published, meaning anyone who used 'Comfyui_LLMVISION' potentially installed a trojan.

You can view the info over on the linked post, but here's one of their messages.
[attached screenshot of one of the group's messages]
It seems incredibly likely this was some fetish fag getting mad at other fetishists for generating porn rather than paying porn artists for it.
 
We were talking about it a couple pages back: https://kiwifarms.st/threads/stable-diffusion-novelai-machine-learning-art.130730/post-18563021

It seems like trolling to me. They wanted to hack people, this was an easy way to do it, and once they are caught might as well troll the journalists.
 
What program do you use to have an AI duplicate a voice? What else is necessary?
 
I tried a few SD3 Medium things today. I had some success, and as always, it's cats.
[Attached: three generations, prompt "A vibrant psychedelic cat blissfully..." (SD3 Medium incl. T5-XXL)]

I found 9:21 can sometimes generate, but I get a doubled and partially distorted image.
[Attached: one such 9:21 generation]
I'm sure I could get better results with a good prompt; either way, it's still neat.
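For extreme ratios like 9:21 it can help to stay near the model's native pixel budget (roughly one megapixel for SD3 Medium) with dimensions rounded to a multiple of 64, which many SD pipelines expect (some accept smaller multiples). A quick helper for that, my own sketch rather than anything from an SD tool:

```python
import math

def dims_for_ratio(w_ratio: int, h_ratio: int,
                   target_px: int = 1024 * 1024, multiple: int = 64):
    """Pick a width/height near target_px total pixels that approximates
    the w:h aspect ratio, with both sides rounded to the given multiple."""
    scale = math.sqrt(target_px / (w_ratio * h_ratio))
    w = round(w_ratio * scale / multiple) * multiple
    h = round(h_ratio * scale / multiple) * multiple
    return w, h

print(dims_for_ratio(9, 21))   # (640, 1536)
print(dims_for_ratio(16, 9))   # (1344, 768)
```

Extreme ratios still drift from the training distribution, which is likely why the doubling shows up regardless of resolution, but keeping the pixel count near the native budget usually reduces it.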

I still think Dall-E is better for ease of use.
[Attached: biker.jpg]

Edit (to avoid a double post):

I was trying to get Dall-E to make blood, but it becomes a "dangerous prompt" and gets blocked, so to work around it I tried:

A wizard cat invoking unspeakable horrors from the Necronomicon, with a cover that appears like decayed artificial material, not human skin. The scene includes an explosion of a liquid similar to blood in consistency and texture but not blood, oozing from the cover and bursting into the air from the pages. The image is imbued with realism, an eerie and haunting backdrop, glowing particle effects, ray tracing reflections, and rendered in 4k resolution. The cinematography adds a unique touch, making it feel real, with the presence of a Dark Lord intensifying the ominous atmosphere.

Adjusting the above prompt I got a few results, but after repeating it with a few more adjustments it somehow became blocked 🤔
[Attached: three results]

These are my favourite ones

[Attached: two favourite results]
 
What program do you use to have an AI duplicate a voice? What else is necessary?
You can use this to do voice cloning from a short snippet of audio, but the results are very hit or miss and the quality is kind of low. What you could do is feed the results into RVC (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md) and run one of the easily trained, better-sounding RVC voice models on top of the generated voice. You don't need to use alltalk; you could use any TTS as a base. I think this is how all those AI meme videos are made, or rather, the ones that don't use ElevenLabs.

I think ElevenLabs will probably still have better quality out of the box, and is easier to use, but you have to pay them and give them your personal information. Local is always better.
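The two-stage flow described above (a rough TTS clone, then RVC on top) is just function composition. Everything below is hypothetical scaffolding: the function names and return values stand in for whatever TTS and RVC tooling you actually wire up, not a real API for alltalk or RVC.

```python
# Hypothetical sketch of the two-stage pipeline; the functions are
# placeholders for real TTS/RVC tool invocations.

def base_tts(text: str, reference_clip: str) -> str:
    """Stage 1: a TTS with short-clip cloning produces rough audio in the
    target voice. Stand-in: returns the path the real tool would write."""
    return "rough_tts_output.wav"

def rvc_convert(wav_path: str, rvc_model: str) -> str:
    """Stage 2: run the rough audio through a trained RVC voice model to
    clean up timbre and pitch. Stand-in: returns the converted file's path."""
    return f"rvc_{rvc_model}_{wav_path}"

def clone_voice(text: str, reference_clip: str, rvc_model: str) -> str:
    rough = base_tts(text, reference_clip)
    return rvc_convert(rough, rvc_model)

print(clone_voice("hello there", "speaker_sample.wav", "my_voice"))
# rvc_my_voice_rough_tts_output.wav
```

The point of the split is that the base TTS only has to get prosody roughly right; the RVC model, which is cheap to train on a few minutes of clean audio, handles making it actually sound like the target speaker.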
 
I have problems installing local shit because I'm an AMD schmuck. But thanks. Would ElevenLabs care if, for example, just off the top of my head, I voice cloned Jim Jones?
 
You can run these off the CPU; obviously they will be slower, but these TTS things are already pretty fast. It's much easier to use Linux for machine learning stuff if you have AMD, because of ROCm (which may have a Windows version now, though I doubt it's as easy to install).

ElevenLabs lets you do any sort of cloning, but I think they track what you input as prompts to stop people from making voices say racist things, and they require your personal information, so I want nothing to do with them.
 
Yeah, eventually I want to build an Intel rig before being forced to dump Windows for Linux. It has to specifically be Linux and not Ubuntu, as I understand it.
 
Why do you think it doesn't have to be Ubuntu? All my ML stuff has run on Ubuntu.
I'm very new to this, so my apologies for sounding like a Boomer. I'll get on that then. The documentation for this shit is seemingly all over the place, and I've been doing intermittent research.
 
Oh yeah, it's borderline impossible to actually learn about any of this stuff when you start, because of how quickly things are moving and how much people expect you to already know. I've ironically found the Bing GPT to be very helpful in this regard.
 