You'd have the researchers who are creating the model dedicate more and more time to manually pruning the dataset (or rather, getting grad students/cheap Mechanical Turk workers to do it), assuming they haven't already developed a way to automatically determine whether the art is human-made before it's scraped. Any new dataset would likely just build on the old one, too (barring some shift in IP law, following all of these lawsuits, that forces most of it to be removed). So it's less that you're photocopying a photocopy ad infinitum, and more that, if you wanted to create a larger dataset of images, you'd need to supply additional human-made art (which likely won't be going away any time soon, no matter who might claim otherwise). It'll just take more labor.
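To make that filtering step concrete, here's a minimal sketch of the "keep only human-made images" pass, assuming a hypothetical detector passed in as a callable (the detector itself is the hard research problem and isn't specified here; the directory names are made up too):

```python
# Sketch only: copies images a supplied detector accepts into a curated folder.
# The detector is a hypothetical callable; no real classifier or dataset is implied.
from pathlib import Path
from typing import Callable
import shutil

def filter_scraped_images(
    scraped_dir: Path,
    keep_dir: Path,
    looks_human_made: Callable[[Path], bool],
) -> int:
    """Copy images the detector flags as human-made into keep_dir; return the count kept."""
    keep_dir.mkdir(parents=True, exist_ok=True)
    kept = 0
    for image_path in sorted(scraped_dir.glob("*.png")):
        if looks_human_made(image_path):
            shutil.copy(image_path, keep_dir / image_path.name)
            kept += 1
    return kept

if __name__ == "__main__":
    # Trivially permissive stand-in detector, only to show the call shape.
    kept = filter_scraped_images(Path("scraped"), Path("curated"), lambda p: True)
    print(f"kept {kept} images")
```

The point isn't the plumbing; it's that the `looks_human_made` part is where all the extra labor (or research) actually goes.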
Of course, it's also a question of how big a dataset really needs to be. If you're shooting for some absurd trillion-parameter model, you'd need plenty of data. But at the same time, advances in LLMs are being made in efficiency beyond just the standard method of "make it bigger." If you can get a model that produces similar quality while needing only 1/10th of the parameters, chances are you won't need to expand too far beyond the open datasets you're already using, supplemented slowly over time.
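Rough numbers, just to show why parameter count drives data needs: using the "~20 training tokens per parameter" rule of thumb from the LLM scaling literature (Hoffmann et al., the Chinchilla work), which is an assumption borrowed from the text-model world rather than anything image-specific:

```python
# Back-of-the-envelope only; the 20 tokens/parameter ratio is a rule of thumb,
# not a hard law, and the parameter counts are illustrative.
TOKENS_PER_PARAM = 20

for params in (1_000_000_000_000, 100_000_000_000):  # 1T vs. a 10x-smaller 100B model
    tokens = params * TOKENS_PER_PARAM
    print(f"{params / 1e9:,.0f}B params -> ~{tokens / 1e12:,.0f}T training tokens")
```

That works out to roughly 20T tokens for the trillion-parameter model versus roughly 2T for the 10x-smaller one, which is the whole argument: shrink the model and the data requirement shrinks with it.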