Schizo take: I legit suspect AA is a corpo operation now. It was created in 2022, post initial AI hype and when most shit had already been scraped. It has a really good sorting system (ISBN, book type, etc.). It's just too convenient a source for ebooks, many of them already OCR-scanned or easy to feed into a scanner, and then you just start training the model. All this without dealing with a bureaucratic copyright license with every single major publisher. 30 companies just happened to use it? No, I think they created it. The LLM section on AA is essentially a service offer.
If we see next-gen audio models within the next few months, it just has to be some type of corpo op. Such an odd thing to start archiving all of a sudden.
Ars Technica:
Anna’s Archive loses .org domain, says suspension likely unrelated to Spotify piracy (archive)
Your take is shared, in part, by some of the Ars commenters in the story above. Some of the anti-AI spergs are taking notice:



Piracy IS preservation, faggot.
The 30 companies are supposedly mostly Chinese. I believe the AA people have correctly identified that they can get some significant benefits for their mission by partnering with the people who care the least about intellectual property. And some of the fruits of that partnership are seen in the recent massive expansion of Chinese content. Data/metadata is just as valuable to AA as funding:
https://annas-archive.se/llm (archive) (mega)
This is enterprise-level access that we can provide for donations in the range of tens of thousands USD. We’re also willing to trade this for high-quality collections that we don’t have yet.
We can refund you if you’re able to provide us with enrichment of our data, such as:
- OCR
- Removing overlap (deduplication)
- Text and metadata extraction
Rather than AA being a "corpo op", I think they are sincere about their mission, and their goals happen to align well with AI/LLM companies hoovering up as much data as possible:
https://annas-archive.se/faq (archive) (mega)
Anna’s Archive is a non-profit project with two goals:
- Preservation: Backing up all knowledge and culture of humanity.
- Access: Making this knowledge and culture available to anyone in the world.
They are openly stating as much on the blog:
Anna's Archive Blog (January 31, 2025):
Copyright reform is necessary for national security (archive) (ghost) (mega)
My team and I are ideologues. We believe that preserving and hosting these files is morally right. Libraries around the world are seeing funding cuts, and we can’t trust humanity’s heritage to corporations either.
Then came AI. Virtually all major companies building LLMs contacted us to train on our data. Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality. This is notable given China’s role as a signatory to nearly all major international copyright treaties.
Zuckbook has been caught with their hands in the cookie jar, but is just a leech, offering no benefits:
TorrentFreak:
‘Meta Torrented over 81 TB of Data Through Anna’s Archive, Despite Few Seeders’ (archive) (mega)
It should go without saying that whatever the true purpose of AA is, I support Total Copyright Death, so it doesn't matter to me much. Chinese LLM companies have also been providing some good AI models that can be run locally. The rising tide lifts all boats.
Here's a Hacker News discussion for the TorrentFreak .org suspension article:
https://news.ycombinator.com/item?id=46497164 (archive) (mega)
Noticed this tard guide in search results:
https://annasarchive.info/ (archive) (mega)