New Meta Emails Reveal That the Company Downloaded 81.7 TB of Copyrighted Books via BitTorrent to Train Its AI Models - Tech giants have been continuously extracting data from the Internet to develop their models.


The ongoing Kadrey v. Meta Platforms, Inc. lawsuit accuses the tech giant of using copyrighted materials to train its artificial intelligence models. A few months ago, it was revealed that Meta CEO Mark Zuckerberg authorized the use of pirated books. New evidence recently emerged to support these claims.

Unsealed emails. Appendix A of the case includes several emails from Meta employees that reveal a significant number of downloads of copyrighted books. One employee, Melanie Kambadur, expressed her refusal to participate in this kind of data collection in October 2022.

“Torrenting from a corporate laptop doesn’t feel right,” Nikolay Bashlykov, a Meta engineer responsible for this data collection, said in an April 2023 message. He added that the company needed to be cautious about the IP address from which they downloaded the materials.

Meta knew the risks. In September 2023, Bashlykov cautioned that torrenting could lead to “seeding,” which “could be legally not OK.” These internal discussions suggest that Meta recognized this type of activity as unlawful, according to authors who have sued the company.
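
For readers wondering why "torrenting" automatically drags "seeding" into the picture: BitTorrent clients normally upload the pieces of a file they have already received to other peers while their own download is still in progress, so a downloader is usually also a distributor. The toy script below (plain Python, not a real BitTorrent client; all names and numbers are made up for illustration) sketches that dynamic.

```python
# Toy model of BitTorrent-style piece exchange -- NOT the real protocol.
# The point: a peer that is still downloading already uploads ("seeds")
# the pieces it holds to other peers.

import random

NUM_PIECES = 8  # pretend the shared file is split into 8 pieces


class Peer:
    def __init__(self, name, pieces=None):
        self.name = name
        self.pieces = set(pieces if pieces is not None else [])  # piece indices held
        self.uploaded = 0                                        # pieces served to others

    def exchange(self, other):
        """One round of exchange: each side fetches a piece the other has."""
        for receiver, sender in ((self, other), (other, self)):
            wanted = sender.pieces - receiver.pieces
            if wanted:
                piece = random.choice(sorted(wanted))
                receiver.pieces.add(piece)
                sender.uploaded += 1  # sender redistributes even if incomplete


seeder = Peer("seeder", pieces=range(NUM_PIECES))  # has the full file
alice = Peer("alice")                              # still downloading
bob = Peer("bob")                                  # also still downloading

for _ in range(4):    # Alice grabs a few pieces from the seeder...
    alice.exchange(seeder)
for _ in range(10):   # ...then trades with Bob before she has finished.
    alice.exchange(bob)

print(f"alice: {len(alice.pieces)}/{NUM_PIECES} pieces, uploaded {alice.uploaded}")
print(f"bob:   {len(bob.pieces)}/{NUM_PIECES} pieces, uploaded {bob.uploaded}")
# Alice has uploaded pieces to Bob before completing her own download --
# the "seeding" that Bashlykov flagged as "could be legally not OK."
```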

Covering its tracks. In an internal message, Meta researcher Frank Zhang said that the company took measures to avoid using its own servers when downloading the data set. This was to prevent anyone from tracing the seeding or the downloads back to Meta.

81.7 TB of data. According to Ars Technica, new evidence indicates that Meta downloaded at least 81.7 TB of data from several shadow libraries that offered copyrighted books via torrents. A recent filing in the case revealed that at least 35.7 TB were downloaded from sites such as Z-Library and LibGen (which was shut down in the summer).
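
To get a sense of the scale, here's a quick back-of-the-envelope script. The average book size and connection speed are illustrative assumptions, not figures from the filings:

```python
# Rough arithmetic on the reported figures.
# Assumptions (not from the court documents): ~1 MB per e-book, a sustained 1 Gbps link.

TOTAL_TB = 81.7        # total reportedly downloaded via torrents
ZLIB_LIBGEN_TB = 35.7  # portion attributed to Z-Library / LibGen

AVG_BOOK_MB = 1        # assumed average e-book size
LINK_GBPS = 1          # assumed sustained download speed

approx_books = (TOTAL_TB * 1_000_000) / AVG_BOOK_MB  # TB -> MB, decimal units
seconds = (TOTAL_TB * 1e12 * 8) / (LINK_GBPS * 1e9)  # total bits / link speed
days = seconds / 86_400

print(f"~{approx_books:,.0f} books at {AVG_BOOK_MB} MB each")
print(f"Z-Library/LibGen share: {ZLIB_LIBGEN_TB / TOTAL_TB:.0%} of the total")
print(f"~{days:.1f} days of continuous downloading at {LINK_GBPS} Gbps")
```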

Meta seeks to dismiss the allegations. The company has filed a motion to dismiss these claims, arguing there is no evidence that any of its employees downloaded books via torrents or that Meta distributed them. Xataka has contacted the company for comment on the case and will update this post if it receives a response.

Plundering the Internet. This issue highlights the questionable practices that AI companies employ to train their models. It happened with Google, which updated its privacy policy in 2023 to say that it’ll “use publicly available information to help train Google’s AI models.” It’s also evident with OpenAI, which used millions of texts, many of them copyrighted, to train ChatGPT. Perplexity recently came under scrutiny for bypassing the “rules of the Internet” to feed its AI model.

Internet theft is being normalized. What’s remarkable is that as companies increasingly skirt the rules and violate copyright, this behavior is starting to be seen as normal. There seems to be little time for outrage, and people often treat it as an accepted practice so they can continue with their business.

Is this really “fair use”? Many companies rely on the concept of “fair use,” which allows for limited use of protected material without requiring permission. While copyright infringement lawsuits are emerging in the world of generative AI, they often seem to take a backseat as these large companies continue to thrive.

Article Link

Archive
 
The referenced legal documents are archived at the bottom of this post.

Attachments

0.png
1.png

Download all that data to still get a model that is lobotomized to all hell and pretty much can't tell a good story to save its life. I love generative AI software (as toys), but they are shitty toys that you get bored with after a week the way they are and have been for the past two years. If the courts declare this method of scraping illegal, what happens then?
 
If the courts declare this method of scraping illegal, what happens then?
They retrain on old-ass books and whatever else they can scrounge up that's in the public domain, or they license content (undesirable unless they negotiate favorable terms). Meta surely uses other content too, like whatever brainfarts get posted on Facebook.
 
Download all that data to still get a model that is lobotomized to all hell and pretty much can't tell a good story to save its life. I love generative AI software (as toys), but they are shitty toys that you get bored with after a week the way they are and have been for the past two years. If the courts declare this method of scraping illegal, what happens then?
AI will be a fad, like 3D and VR. It gets popular for a few years when the technology improves, then nobody gives a shit for a few years because it's not practical. Rinse and repeat.
 
Back