You'd have the researchers who are creating the model dedicate more and more time to manually pruning the dataset (or rather, getting grad students/cheap Mechanical Turk workers to do it), assuming they haven't already developed a way to automatically determine whether the art is human-made before it's scraped. Any new dataset would likely just build on the old one, too (barring some shift in IP law, following all of these lawsuits, that forces most of it to be removed). So it's less that you're photocopying a photocopy ad infinitum, and more that, if you wanted to create a larger dataset of images, you'd need to supply additional human-made art (which likely won't be going away any time soon, no matter who might claim otherwise). It'll just take more labor.
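To make that filtering step concrete, here's a minimal sketch of the "keep only human-made images" pass, assuming a hypothetical detector passed in as a callable (the detector itself is the hard research problem and isn't specified here; the directory names are made up too):

```python
# Sketch only: copies images a supplied detector accepts into a curated folder.
# The detector is a hypothetical callable; no real classifier or dataset is implied.
from pathlib import Path
from typing import Callable
import shutil

def filter_scraped_images(
    scraped_dir: Path,
    keep_dir: Path,
    looks_human_made: Callable[[Path], bool],
) -> int:
    """Copy images the detector flags as human-made into keep_dir; return the count kept."""
    keep_dir.mkdir(parents=True, exist_ok=True)
    kept = 0
    for image_path in sorted(scraped_dir.glob("*.png")):
        if looks_human_made(image_path):
            shutil.copy(image_path, keep_dir / image_path.name)
            kept += 1
    return kept

if __name__ == "__main__":
    # Trivially permissive stand-in detector, only to show the call shape.
    kept = filter_scraped_images(Path("scraped"), Path("curated"), lambda p: True)
    print(f"kept {kept} images")
```

The point isn't the plumbing; it's that the `looks_human_made` part is where all the extra labor (or research) actually goes.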
Of course, it's also a question of how big a dataset really needs to be. If you're shooting for some absurd trillion-parameter model, you'd need plenty of data. But at the same time, advances in LLMs are being made in efficiency beyond just the standard method of "make it bigger." If you can get a model that produces similar quality while needing only 1/10th of the parameters, chances are you won't need to expand too far beyond the open datasets you're already using, supplemented slowly over time.
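Rough numbers, just to show why parameter count drives data needs: using the "~20 training tokens per parameter" rule of thumb from the LLM scaling literature (Hoffmann et al., the Chinchilla work), which is an assumption borrowed from the text-model world rather than anything image-specific:

```python
# Back-of-the-envelope only; the 20 tokens/parameter ratio is a rule of thumb,
# not a hard law, and the parameter counts are illustrative.
TOKENS_PER_PARAM = 20

for params in (1_000_000_000_000, 100_000_000_000):  # 1T vs. a 10x-smaller 100B model
    tokens = params * TOKENS_PER_PARAM
    print(f"{params / 1e9:,.0f}B params -> ~{tokens / 1e12:,.0f}T training tokens")
```

That works out to roughly 20T tokens for the trillion-parameter model versus roughly 2T for the 10x-smaller one, which is the whole argument: shrink the model and the data requirement shrinks with it.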