New Meta Emails Reveal That the Company Downloaded 81.7 TB of Copyrighted Books via BitTorrent to Train Its AI Models - Tech giants have been continuously extracting data from the Internet to develop their models.

wtfNeedSignUp · Feb 8, 2025

I really fucking hope modern AI isn't being trained with smut for women.

Jimjamflimflam · Feb 8, 2025

Just committing a little corporate crime, better document it all in subpoena-able email.

Breadbassket · Feb 21, 2025

Here is an update to this story.

Meta defends using pirated material, claims it's legal if you don't seed content
Meta claimed in a court filing this week that, despite torrenting an 82 TB dataset of pirated, copyrighted material from shadow libraries to train its LLaMA AI models, employees "took precautions not to 'seed' any downloaded files".

In torrenting terminology, seeding refers to sharing a file with other users during (or, commonly, after) downloading it. Since torrenting is a peer-to-peer system, every user downloading a file can also upload parts of it to other users.

Meta's lawyers claim that there are "no facts to show that Meta seeded Plaintiffs' books". This means that the company's defense is pinning its hopes on the fact that there is currently no proof that Meta shared the material during the torrenting process.

Though Meta claims that there is no evidence of seeding, Michael Clark, an executive at Meta in charge of project management, testified that the configuration settings they were using were modified "so that the smallest amount of seeding possible could occur".

Following this statement, when Clark was asked why Meta chose to minimize seeding, attorney-client privilege was invoked so that he could not answer.

Interestingly, the statement issued by Clark shows that Meta sought methods to minimize seeding, but the company has yet to offer any indication that it entirely prevented the seeding of copyrighted material.

Additionally, an internal message from Frank Zhang, a Meta researcher, could point toward alleged concealment of potential seeding from Meta's servers, to avoid "risk of tracing back the seeder/downloader" to Facebook servers.

Meta's defense seems to hinge on the lack of evidence that it shared the large amount of data it allegedly downloaded to train its AI models. Should Meta win on this defense and prove that downloading copyrighted content isn't illegal, but distribution is, it could shake up future cases of piracy and unauthorized distribution of copyrighted content.

The defense's reliance on torrenting terminology could also be a way for Meta to try to trip up the courts. Focusing on seeding could further muddy the claim that Meta allegedly knew that it was violating laws by torrenting copyrighted material.

Meta has yet to respond to claims about whether it knew that it was sharing data during the download process.

Authors allege Meta was "knowing participant" in "illegal peer-to-peer piracy network"
Authors of the copyrighted material alleged to have been obtained by Meta without prior licensing agreements have pointed [PDF] to "Meta's decision to bypass lawful acquisition methods and become a knowing participant in an illegal peer-to-peer piracy network".

With the court battle expected to continue, no final decision in the case has been made. Even after a final decision, Meta is expected to appeal if it loses, meaning that a final judgement could be a long while away.

But similar cases do exist. OpenAI was sued by novelists in 2023, with the New York Times also suing OpenAI and Microsoft over "millions" of copied news articles. As the long list of LLM-related litigation continues, this is likely not going to be the last we hear of Meta's specific case.

Article Link

big pauper · Feb 21, 2025

wtfNeedSignUp said:
I really fucking hope modern AI isn't being trained with smut for women.

llama definitely is and that's meta's large language model

Blewberry Nausea · Feb 21, 2025

Their defense is it’s legal if you just download copyrighted material? I can’t see that ever backfiring! I’m sure the entertainment industry loves that angle.

Bland Crumbs · Feb 21, 2025

This was the first thing that came to mind for some reason:

Breadbassket · Feb 21, 2025

Meta claimed in a court filing this week that, despite torrenting an 82 TB dataset of pirated, copyrighted material from shadow libraries to train its LLaMA AI models, employees "took precautions not to 'seed' any downloaded files".

Though Meta claims that there is no evidence of seeding, Michael Clark, an executive at Meta in charge of project management, testified that the configuration settings they were using were modified "so that the smallest amount of seeding possible could occur".

Archives of both legal documents are attached to this post.

Irregardless · Feb 21, 2025

Worst part about the whole thing, niggas didn't even have the decency to seed after...

A Cat in a Minefield · Feb 21, 2025

Irregardless said:
Worst part about the whole thing, niggas didn't even have the decency to seed after...

Yeah, those niggers should have at least done that. You always think they couldn't get worse...

lurkr · Feb 21, 2025

I know KF has a hateboner for copyright laws, but I sincerely hope that Zucc gets fined for every single book here.

Tasty Tatty said:
So, the AI only repeats what it read from books without adding any personal input.

It's like any college student, but faster.

Because no citations are even needed.

Irregardless said:
Worst part about the whole thing, niggas didn't even have the decency to seed after...

Having some META servers seeding would at least be some form of a "moral" compensation for how this company has socially engineered people to be more retarded. But they didn't even do that, which really tells you how much of a kike Zucc really is.

NoReturn · Feb 21, 2025

Irregardless said:
Worst part about the whole thing, niggas didn't even have the decency to seed after...

TheCuntler said:
Yeah, those niggers should have at least done that. You always think they couldn't get worse...

lurkr said:
Having some META servers seeding would at least be some form of a "moral" compensation for how this company has socially engineered people to be more retarded. But they didn't even do that, which really tells you how much of a kike Zucc really is.

It's beyond parody. Absolutely incredible.
:story:

General Emílio Médici · Feb 21, 2025

I hope Meta gets fucking bombarded with a legal nuke

Irregardless · Feb 21, 2025

lurkr said:
Having some META servers seeding would at least be some form of a "moral" compensation for how this company has socially engineered people to be more retarded. But they didn't even do that, which really tells you how much of a kike Zucc really is.

NoReturn said:
It's beyond parody. Absolutely incredible.

So there is a bit more irony to it than just "Meta Jew'd out and didn't seed from their servers" as the actual torrenting itself was done manually via individual engineers torrenting directly to their local machines, where they specifically used VPN's because of how insane it would be if there were Meta owned corporate IP's being shown doing this. So basically, the decision not to seed after torrenting was made individually by each engineer, even though they were hiding their corporate IP's. So it wasn't just "nameless middle manager ran everything through a big server" it was a bunch of assholes who were totally aware of how much of a piece of shit they were being lmfao.

sassblassted · Feb 21, 2025

Not only did they torrent terabytes of data with probably the best bandwidth in the entire world, the fuckers actually went out of their way to not seed.

LifeAlert · Feb 21, 2025

Rules for thee, but not for me

Male Idiot · Feb 21, 2025

The only crime was not seeding. Imagine Tay 2.0 trained on autistic manifestos, Fifty Shades of Grey and Japanese light novels plus self-help guidebooks.

Hilarious.

Jarl Varg · Feb 21, 2025

LifeAlert said:
Rules for thee, but not for me

Beat me to it by 11 minutes. :lol:

Vecr · Feb 21, 2025

Male Idiot said:
Imagine Tay 2.0 trained on autistic manifestos, Fifty Shades of Grey and Japanese light novels plus self-help guidebooks.

I think that's just Eliezer Yudkowsky.

AnOminous · Feb 21, 2025

Xarpho's Return said:
Now watch as the book publishers do nothing.

They're not going to do nothing. This could actually be the biggest copyright infringement suit of all time, on thousands, maybe tens of thousands, maybe more books. They were obtained illicitly for the purpose of doing something to earn a profit in the billions.

Let's say it's 10,000 infringements but they're only asking for statutory damages. At the $150,000-per-work maximum for willful infringement, that's $1,500,000,000. You'd probably have a whole cartel of publishers diving in to share the booty.

But they could probably also or instead go for disgorgement of all improper profits. That could be in the billions.

They'd be insane to leave that money on the floor.

You can make fair use of material you gained legally, such as by purchasing it, or using public domain material. But when you do it by pirating it, and you knew you were pirating it, because Cuckerberg just moronically gets told hey dude, that's illegal, and he says fuck it, do it anyway, you move it into the highest tier of infringement, and perhaps even the criminal realm.

Breadbassket said:
Here is an update to this story.

It's a dumb argument. While illicit access to the material in question doesn't completely eliminate a fair use defense, since a lot of news stories are based on leaks, it vitiates it, because you've made an infringing copy of every single work in huge volume.

You're commercially exploiting someone else's work you didn't even access legitimately, because you pirated it and you knew you were pirating it, and deliberately concealed the activity by not seeding (fucking leeching scum), and that's way more obviously the motive than that they actually didn't want to infringe. What they didn't want is to get CAUGHT.

Blewberry Nausea · Feb 21, 2025

AnOminous said:
deliberately concealed the activity by not seeding

But even their own dude admitted they DID seed, they just throttled the uploads as much as possible. So their argument is defeated by their own employee!
