Lockheed Martin Engineer
kiwifarms.net
So under this logic, AI art is real art at the point someone convinces a zeitgeist-"legitimate" (e.g. not-Elon-Musk) billionaire to buy it for an exorbitant fee. Intredasting, very intredasting, Guardian.

TL;DR: presenting other people's derivative slop as his own, to scam gullible buyers or facilitate money laundering, is his medium, not the canvas or clay.
OpenAI: 'Impossible to train today’s leading AI models without using copyrighted materials'
As IEEE study shows super lab's neural nets can emit 'plagiaristic output'
OpenAI has said it would be "impossible" to build top-tier neural networks that meet today's needs without using people's copyrighted work. The Microsoft-backed lab, which believes it is lawfully harvesting said content for training its models, said using out-of-copyright public domain material would result in sub-par AI software.
This assertion comes at a time when the machine-learning world is sprinting head first at the brick wall that is copyright law. Just this week an IEEE report concluded Midjourney and OpenAI's DALL-E 3, two of the major AI services to turn text prompts into images, can recreate copyrighted scenes from films and video games based on their training data.
The study, co-authored by Gary Marcus, an AI expert and critic, and Reid Southen, a digital illustrator, documents multiple instances of "plagiaristic outputs" in which Midjourney and DALL-E 3 render substantially similar versions of scenes from films, pictures of famous actors, and video game content.
Marcus and Southen say it's almost certain that Midjourney and OpenAI trained their respective AI image-generation models on copyrighted material.
Whether that's legal, and whether AI vendors or their customers risk being held liable, remain contentious questions. However, the report's findings may bolster those suing Midjourney and DALL-E maker OpenAI for copyright infringement.
"Both OpenAI and Midjourney are fully capable of producing materials that appear to infringe on copyright and trademarks," they wrote. "These systems do not inform users when they do so. They do not provide any information about the provenance of the images they produce. Users may not know, when they produce an image, whether they are infringing."
Neither biz has fully disclosed the training data used to make their AI models.
It's not just digital artists challenging AI companies. The New York Times recently sued OpenAI because its ChatGPT text model will spit out near-verbatim copies of the newspaper's paywalled articles. Book authors have filed similar claims, as have software developers.
Prior research has indicated that OpenAI's ChatGPT can be coaxed to reproduce training text. And those suing Microsoft and GitHub contend the Copilot coding assistant model will reproduce code more or less verbatim.
Southen observed that Midjourney is charging customers who are creating infringing content and profiting via subscription revenue. "MJ [Midjourney] users don't have to sell the images for copyright infringement to have potentially occurred, MJ already profits from its creation," he opined, echoing an argument made in the IEEE report.
OpenAI also charges a subscription fee and thus profits in the same way. Neither OpenAI nor Midjourney responded to requests for comment.
However, OpenAI on Monday published a blog post addressing the New York Times lawsuit, which the AI seller said lacked merit. Astonishingly, the lab said that if its neural networks generated infringing content, it was a "bug."
In total, the upstart today argued that: It actively collaborates with news organizations; training on copyrighted data qualifies for the fair use defense under copyright law; "'regurgitation' is a rare bug that we are working to drive to zero"; and the New York Times has cherry-picked examples of text reproduction that don't represent typical behavior.
The law will decide
Tyler Ochoa, a professor in the law department at Santa Clara University in California, told The Register that while the IEEE report's findings are likely to help litigants with copyright claims, they shouldn't – because the authors of the article have, in his view, misrepresented what's happening.
"They write: 'Can image-generating models be induced to produce plagiaristic outputs based on copyright materials? ... [W]e found that the answer is clearly yes, even without directly soliciting plagiaristic outputs.'"
Ochoa questioned that conclusion, arguing the prompts the report's authors "entered demonstrate that they are, indeed, directly soliciting plagiaristic outputs. Every single prompt mentions the title of a specific movie, specifies the aspect ratio, and in all but one case, the words 'movie' and 'screenshot' or 'screencap.' (The one exception describes the image that they wanted to replicate.)"
The law prof said the issue for copyright law is determining who is responsible for these plagiaristic outputs: The creators of the AI model or the people who asked the AI model to reproduce a popular scene.
"The generative AI model is capable of producing original output, and it is also capable of reproducing scenes that resemble scenes from copyrighted inputs when prompted," explained Ochoa. "This should be analyzed as a case of contributory infringement: The person who prompted the model is the primary infringer, and the creators of the model are liable only if they were made aware of the primary infringement and they did not take reasonable steps to stop it."
Ochoa said generative AI models are more likely to reproduce specific images when there are multiple instances of those images in their training data set.
"In this case, it is highly unlikely that the training data included entire movies; it is far more likely that the training data included still images from the movies that were distributed as publicity stills for the movie," he said. "Those images were reproduced multiple times in the training data because media outlets were encouraged to distribute those images for publicity purposes and did so.
"It would be fundamentally unfair for a copyright owner to encourage wide dissemination of still images for publicity purposes, and then complain that those images are being imitated by an AI because the training data included multiple copies of those same images."
Ochoa said there are steps to limit such behavior from AI models. "The question is whether they should have to do so, when the person who entered the prompt clearly wanted to get the AI to reproduce a recognizable image, and the movie studios that produced the original still images clearly wanted those still images to be widely distributed," he said.
"A better question would be: How often does this happen when the prompt does not mention a specific movie or describe a specific character or scene? I think an unbiased researcher would likely find that the answer is rarely (perhaps almost never)."
Nonetheless, copyrighted content appears to be essential fuel for making these models function well.
OpenAI defends itself to Lords
In response to an inquiry into the risks and opportunities of AI models by the UK's House of Lords Communications and Digital Committee, OpenAI presented a submission [PDF] warning that its models won't work without being trained on copyrighted content.
"Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials," the super lab said.
"Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens."
The AI biz said it believes that it complies with copyright law and that training on copyrighted material is lawful, though it allows "that there is still work to be done to support and empower creators."
That sentiment, which sounds like a diplomatic recognition of ethical concerns about compensation for the arguable fair use of copyrighted work, should be considered in conjunction with the IEEE report's claim that, "we have discovered evidence that a senior software engineer at Midjourney took part in a conversation in February 2022 about how to evade copyright law by 'laundering' data 'through a fine tuned codex.'"
Marcus, co-author of the IEEE report, expressed skepticism of OpenAI's effort to obtain a regulatory green light in the UK for its current business practices.
"Rough Translation: We won’t get fabulously rich if you don’t let us steal, so please don’t make stealing a crime!" he wrote in a social media post. "Don’t make us pay licensing fees, either! Sure Netflix might pay billions a year in licensing fees, but we shouldn’t have to! More money for us, moar!"
OpenAI has offered to indemnify enterprise ChatGPT and API customers against copyright claims, though not if the customer or the customer's end users "knew or should have known the Output was infringing or likely to infringe" or if the customer bypassed safety features, among other limitations. Thus, asking DALL-E 3 to recreate a famous film scene – which users ought to know is probably covered by copyright – would not qualify for indemnification.
Midjourney has taken the opposite approach, promising to hunt down and sue customers involved in infringement to recover legal costs arising from related claims.
"If you knowingly infringe someone else’s intellectual property, and that costs us money, we’re going to come find you and collect that money from You," Midjourney's Terms of Service state. "We might also do other stuff, like try to get a court to make you pay our legal fees. Don’t do it." ®
Well, that sucks I guess. What sucks even more for the copyright holders is that using their material to train a model is as much of a theft as a human doing the same is.

This article got a lot of reach.
https://www.theregister.com/2024/01/08/midjourney_openai_copyright/
“As a closer look at the technology of generative AI models reveals, the training of such models is not a case of text and data mining. It is a case of copyright infringement – no exception applies under German and European copyright law,” says Prof. Dornis. Prof. Stober explains that “parts of the training data can be memorized in whole or in part by current generative models - LLMs and (latent) diffusion models - and can therefore be generated again with suitable prompts by end users and thus reproduced.”
“The study not only proves that the training of generative AI models is not covered by text and data mining, but also provides further important indications and suggestions for a better balance between the protection of human creativity and the promotion of AI innovation.”

“This study is explosive because it proves that we are dealing with large-scale theft of intellectual property. The ball is now in the politicians' court to draw the necessary conclusions and finally put an end to this theft at the expense of journalists and other authors,” commented Hanna Möllers, legal advisor to the DJV and representative of the European Federation of Journalists (EFJ).
The composer and spokesperson for the Copyright Initiative, Matthias Hornschuh, comments: “It is a groundbreaking result if we now have proof that the reproduction of works by an AI model constitutes a copyright-relevant reproduction and, in addition, that making them available on the European Union market may infringe the right of making available to the public.”
“There would be a new, profitable licensing market on the horizon, but no remuneration is flowing, while generative AI is preparing to replace those whose content it lives from in its own market. This jeopardizes professional knowledge work and cannot be in the interests of society, culture or the economy. All the better that the authors of our tandem study provide the technological and copyright basis for finally turning the legal consideration of generative artificial intelligence from its head to its feet.”
There was a study that showed that AI training is infringement.
"It is a case of copyright infringement – no exception applies under German and European copyright law"

oh, phew, for a second i thought they might be referencing laws or countries that were actually important
Ah sweet, another skitzo thread
Prof. Stober explains that “parts of the training data can be memorized in whole or in part by current generative models - LLMs and (latent) diffusion models - and can therefore be generated again with suitable prompts by end users and thus reproduced.”

It's just obvious bullshit if you know anything about AI. Please show me an actual example of an AI actually reproducing any of the training material. That just doesn't fucking happen. Even in the loras I've personally trained, no result looks like any of the sample images, despite loras specifically being designed to make things that closely resemble the training images. People keep spouting this retarded idea but none of them actually have any proof.

Most of the time they tell the AI to make something that looks like the Bloodborne poster and then act surprised that it can make something that roughly looks like the Bloodborne poster. Then you look closer and it looks nothing alike. Most people point to this study, which anyone with eyes can tell is obvious shit. The study claims to show how AI can 'reproduce' training images, yet even in the study itself they couldn't get an actual replica. The 'copies' are very clearly not the same, they look similar, but you can't be surprised that when you ask for an image of a shoe it looks like a shoe.
Please show me an actual example of an AI actually reproducing any of the training material.
In case it's not clear what's happening here: @github's Copilot "autocompletes" the fast inverse square root implementation from Quake III — which is GPL2+ code. It then autocompletes a BSD2 license comment (with the wrong copyright holder). This is fine.
Even those images are not the same though, it did not spit out a training image. You could reasonably recreate especially the Joker one with makeup and a bit of time. They look similar, but if you ask something to create an image of the Joker poster you can't be surprised when it makes something that resembles it. If you tell the AI to take a popular character and make her sit in a forest and play a guitar, then obviously it will make something similar. It didn't make something identical though; the backgrounds are completely different. It got close, yeah, but it didn't recreate the original training image. I'm not saying that AI can't create things that look like certain things if you try hard enough, but it will never spit out an exact copy of an image it was trained on. The AI isn't spitting out training images, it's creating an image that is similar to your prompt; you just used a prompt that described a poster or cutscene from a game.
Imagine how ridiculous and petty it sounds when a human artist starts complaining about someone else stealing their style.

You know, that actually missed me, but yeah, people draw in each other's styles all the time, so that part of it isn't even arguably theft. Like, you can't be sued for drawing in chibi or animating in rubberhose style. Holy fuck, their complaints are so petty. Anything to make it a moral issue instead of just showing some humility, admitting "learn to code" was a shitty response to bill being fired, and admitting they needed to show more basic fucking humanity to people.
Has the seething accomplished anything?

Not really, because these people are histrionic personalities. There are quite a few good arguments they could make to turn this into more of a discourse about possible limitations on AI, but they absolutely refuse to know their enemy. They keep making the same nonsensical arguments and just can't stop shitting and pissing themselves while stomping their feet on the ground, demanding that all progress stop (lol, as if that's ever gonna happen), that everything be banned, and that all harddrives that ever contained an image model be burned OR ELSE. Who can take that seriously? The side of the artists in this particular thing is just so completely impossible to sympathize with for the average person because of their behavior. That these people always acted and continue to act like assholes to everyone who's not them doesn't really help their case either, as people not involved are hard-pressed to even care.
So he is seething because some random Twitter artist is using AI for a more efficient workflow? That is crazy. Nitter thread (Archive)
I never thought I'd see the day when Art needs to have rules. Cultural marxism is actually the worst thing to happen to youths.
Understandable (warning: potential sperging)
It's actually something that's been happening for a while:

"OH MY FUCK!!! why artists on the internet acting more pathetic each YEAR its just to show many people here do not deserve to be in the industry and they start making strawman video like THIS OR SOMETHING IN LINE!! its EXHAUSTING!!!"
Property is theft and copyright is rent-seeking behavior, which basically makes these artists fascist landlords over art.

A lot of artists buy into the "exclusivity" of art and the corresponding idea that "they know the true value," not the people they need to pay for it.
> Using AI for yourself with no one ever knowing about it still somehow encourages the use of it

more tumblrina ''artist'' whining about how ai art is EVUL
It's funny they write "a/i" like it's Voldemort's name. Also, never mind that they aren't against ''borrowing without permission'' in general, they're just against AI... bunch of hypocrites.