Reddit will block the Internet Archive
The company says that AI companies have scraped data from the Wayback Machine, so it’s going to limit what the Wayback Machine can access.


Reddit says that it has caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles. Instead, it will only be able to index the Reddit.com homepage, which effectively means the Internet Archive will only be able to record which news headlines and posts were most popular on a given day.
“Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” spokesperson Tim Rathschmidt tells The Verge.

The Internet Archive’s mission is to keep a digital archive of websites on the internet and “other cultural artifacts,” and the Wayback Machine is a tool you can use to look at pages as they appeared on certain dates, but Reddit believes not all of its content should be archived that way. “Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we’re limiting some of their access to Reddit data to protect redditors,” Rathschmidt says.
The limits will start “ramping up” today, and Reddit says it reached out to the Internet Archive “in advance” to “inform them of the limits before they go into effect,” according to Rathschmidt. He says Reddit has also “raised concerns” about the ability of people to scrape content from the Internet Archive in the past.
Reddit has a recent history of cutting off access to scraper tools as AI companies have begun to use (and abuse) them en masse, but it’s willing to provide that data if companies pay. Early last year, Reddit struck a deal with Google covering both Google Search and AI training data, and a few months later, it started blocking major search engines from crawling its data unless they pay. It also said its infamous 2023 API changes, which forced some third-party apps to shut down and led to protests, were made because those APIs were being abused to train AI models.

Reddit also struck an AI deal with OpenAI, but it sued Anthropic in June, claiming Anthropic was still scraping from Reddit even after Anthropic said it wasn’t scraping anymore.
“We have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter,” Mark Graham, director of the Wayback Machine, says in a statement to The Verge.

 
I guess Reddit's staff will make the surprised Pikachu face once they discover that people still manage to archive their stuff on Archive.today and GhostArchive.
 
That's why they want exclusive ownership of it. They're well aware of how everyone started putting "reddit" on the end of searches a few years ago because of how useless Google had gotten. They're sitting on a goldmine of training data.
I put -reddit on all my google searches because the inevitable reddit shit that comes up is:
1. always fucking wrong but in a snide dipshit way with no sources
2. always deleted comments or dead threads
3. you can't access the site with a heckin VPNerino anyway
4. it's reddit
 
Okay, have a great time
 
Why won't the Internet just fucking die already?
This whole "slow death of a near-corpse rotting from the inside out" shtick is so played out.

Can they at least give us a new narrative? Or is it 6 more years of this shit before the dead Internet psyop stops being a meme and actually happens?
 