archive.is is being seized by the FBI.

So is it gone for good, or will someone else take its place and continue the effort? I know BreachForums has a history of doing this until the owner got omega-railed. Idk, I don't really keep up with online stuff as much as I used to lol
 
I noticed it being down all day.

This is a tragedy for online archivism. Not just the alt web anymore, as half the regular web also runs off archive.today

So much lolcow content will be lost.
so many paywalls that i will have to bypass some other way.

Truly a dark time is upon us.
 
Tucows is used by https://njal.la, which is the main service used by pirate sites like https://1337x.to and https://annas-archive.org. Njalla acts as a proxy between registrars (e.g. Tucows) and registrants (e.g. you or archive.is' owner), adding a degree of privacy and anonymity, since domain registrations normally require you to dox yourself to your registrar. However, there are some problems with njalla, as detailed in Josh's tier list.

I mention this because I really wonder if the glowies are going to start going after piracy sites in a similar fashion. You can even see 1337x link to njalla to promote them:
[Attached image: 1762461902394.png]

archive.is has an onion domain that may remain up if the clearnet domains don't: http://archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion/ (pulled from their Onion-Location header)
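
For anyone who wants to pull that header themselves, here is a minimal sketch of reading an Onion-Location value out of a set of response headers. The function name and the example headers are made up for illustration (they are not captured from archive.is), and it only checks that the advertised URL points at a .onion host.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def onion_location(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Onion-Location value if it points at a .onion host, else None."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == "onion-location":
            url = value.strip()
            host = urlsplit(url).hostname or ""
            if host.endswith(".onion"):
                return url
    return None


# Hypothetical headers, for illustration only.
example = {
    "Content-Type": "text/html",
    "Onion-Location": "http://exampleonionaddress.onion/",
}
print(onion_location(example))
```

In practice you would feed it the headers from a real response (for example the `headers` attribute of whatever HTTP client you use), assuming the site still sends the header.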
 
Of course they're going after the one other archival site we have alongside kneecapping the one they support directly (Internet Archive), even if that's pozzed as well. Sasuga, Patel.
 
To the chagrin of the media industry, the service is also often used to bypass paywalls. This is also possible because the service does not adhere to common rules and laws and offers no opt-out option.
probably what it is. suing on the behalf of the jew york times

Also, idk why people keep saying it's down; it works fine for me.
[Attached image: Screenshot 2025-11-06 155117.png]
 
I'll give the jeet the benefit of the doubt one last time and assume he wants access to customer data for a good reason. Probably wise to start re-archiving your favorite cow's stuff through Ghost Archive tho
 
Is there a reason why they are doing this?
The article only says that it might have something to do with copyright issues or shady funding or them possibly being Russian.
[Attached image: 1762461794500.png]
The "in a post" link goes here:

archive.today: On the trail of the mysterious guerrilla archivist of the Internet

Do you like reading articles in publications like Bloomberg, the Wall Street Journal or the Economist, but can’t afford to pay what can be hundreds of dollars a year in subscriptions? If so, odds are you’ve already stumbled on archive.today, which provides easy access to these and much more: just paste in the article link, and you’ll get back a snapshot of the page, full content included.


For a long time, I assumed that this was some kind of third-party skin on top of the venerable Internet Archive, whose Wayback Machine provides a very similar service at the very similar address of archive.org. However, the Wayback Machine is slow, clunky, frequently errors out, and most importantly, it’s very easy for websites to opt out, retroactively erasing all their content forever. In contrast, archive.today has no opt-outs or erase buttons: like it or not, they store everything and it’s not going anywhere, with some limited exceptions for law enforcement, child porn, etc.

The Internet Archive is a legitimate 501(c)(3) non-profit with a budget of $37 million and 169 full-time employees in 2019. archive.today, by contrast, is an opaque mystery. So who runs this and where did they come from?

The origins and owners of archive.today

The first historical record we have of the site dates from May 16, 2012, when a “Denis Petrov” from Prague, Czech Republic registered the domain archive.is, the original name of the site. archive.today followed in 2014, and the site has since registered countless variations: archive.li, archive.ec, archive.vn, archive.ph, archive.fo, etc. Denis Petrov is a common Russian name, with pages and pages of matches on LinkedIn, but it may well be an alias: informer.com notes that the same contact information was used to register a series of very sketchy domains, ranging from “carding forum” verified.lu to piracy sites btdlg.com and moviesave.us (all long since gone), many seeded with German keywords (spiel, gewinnt, online).

Domains aside, “Denis Petrov” has little presence on the web, and three seemingly connected domains proved dead ends. The obvious denispetrov.com was an entertaining rabbit hole, with the author an accomplished programmer with an interest in Web automation, but it’s clearly the work of a New Yorker blogging at the tail end of a 25-year career, and the blog dries up entirely in 2011, so it matches neither the place nor the time. denis.biz (2001) and petrov.net (1998!) contain nothing. The one intriguing bit of evidence we have is this series of screenshots (archive) where Brave’s tech support addresses webmaster@archive.is as “Denis”, but odds are that’s just from the same DNS record.

We can glean a few more clues from archive.today’s web presence. The FAQ, unchanged since 2013 (!), states that they are located in Europe and asks for PayPal donations in euros. Looking through the voluminous Tumblr blog, featuring tons of questions but very terse answers, the author’s English is excellent but not quite native, with occasional Noun Capitalization also hinting at a German background. Yet they answer questions in Russian, and the site uses a Russian analytics engine.


The most interesting detective work to date comes from Stack Exchange, where Ciro Santilli managed to link the profile picture of an account archive.today once used to archive LinkedIn content to a “Masha Rabinovich” in Berlin. Even more intriguingly, in a 2012 F-Secure forum post, a “masharabinovich” complains about “my website http://archive.is/” being blacklisted. They pop up on Wikipedia as well getting told off for adding too many links to archive.is, including a mention that they’re using the Czech ISP fiber.cz, and their early edit history includes many updates to the pages “Russian passport” and “Belarusian passport”. “Masha” (Маша) is a common Russian diminutive of Maria, although it can also be a Hebrew form of Moses (מַשה), and Rabinovich is an Ashkenazi Jewish surname.

Early Github captures on archive.today are linked to a now completely disappeared account called “volth” (copy archived by archive.today itself), who was a fluent speaker of Russian, contributed extensively to NixOS (which archive.today uses) and has a profile picture not dissimilar to Masha’s. The linked volth.com domain is now only an empty husk, but it dates back to 2004, with early versions first doing some kind of sketchy search engine network marketing thing (2005), promising “Total Success in Internet” (2008) and eventually being put up for sale (2010), making it likely that its original owners the Espinosas are unrelated to whoever owns the domain today.

While we may not have a face and a name, at this point we have a pretty good idea of how the site is run: it’s a one-person labor of love, operated by a Russian of considerable talent and access to Europe. Let’s move on to the nitty gritty.

Infrastructure

There are two components to any archival site: the scraper that copies the pages, and the storage system where the pages are kept and retrieved on demand. Helpfully, the FAQ shares some details of what the storage side at least used to look like:

The archive runs Apache Hadoop and Apache Accumulo. All data is stored on HDFS, textual content is duplicated 3 times among servers in 2 datacenters and images are duplicated 2 times. Both datacenters are in Europe, with OVH hosting at least one of them.


In 2012, the site already had 10 TB of archives and cost ~300 euros/mo to run, escalating to 2000 euros by 2014 and $4000 by 2016. As of 2021, they have archived on the order of 500 million pages, and with the average size of a webpage clocking in at well over 2 MB these days, that’s a cool 1,000 TB to deal with. (For comparison, the Internet Archive is around 40,000 TB.)
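
The back-of-the-envelope figure above can be reproduced in a few lines. This is only a sketch of the arithmetic: it takes the article's round numbers at face value (500 million pages, exactly 2 MB each, decimal terabytes) and ignores replication and compression, which the article does not quantify for the current setup.

```python
PAGES = 500_000_000            # "on the order of 500 million pages"
AVG_PAGE_MB = 2                # "well over 2 MB", taken as exactly 2 here
MB_PER_TB = 1_000_000          # decimal units: 1 TB = 1,000,000 MB
INTERNET_ARCHIVE_TB = 40_000   # comparison figure quoted in the article


def archive_size_tb(pages: int, avg_page_mb: float) -> float:
    """Estimated archive size in TB for a page count and average page size."""
    return pages * avg_page_mb / MB_PER_TB


estimate_tb = archive_size_tb(PAGES, AVG_PAGE_MB)
share = estimate_tb / INTERNET_ARCHIVE_TB

print(f"{estimate_tb:,.0f} TB")                # 1,000 TB
print(f"{share:.1%} of the Internet Archive")  # 2.5% of the Internet Archive
```

Since the real average is "well over" 2 MB, the true total would be higher than this estimate.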

The less discussed but more controversial half of the site is scraping, the process of vacuuming up live webpages. Since 2021, this uses a modified version of the Chrome browser, and the blog readily admits that the availability of computing power to run these automated browsers is now the main bottleneck to expanding the site. To avoid detection, archive.today runs via a botnet that cycles through countless IP addresses, making it quite difficult for grumpy webmasters to stop their sites getting scraped. Access to paywalled sites is through logins secured via unclear means, which need to be replenished constantly: here’s the creator asking for Instagram credentials.

Finally, the serving of the website is also subject to a perpetual game of cat and mouse: “I can only predict that there will be approximately one trouble with domains per year and each fifth trouble will result in domain loss.” As of today, archive.today still works, but users are redirected to archive.md.

Funding

The other major source of permanent uncertainty is the site’s funding model. We’ve established that its costs are considerable, but according to the creator, as of 2021 ads and donations covered less than 20% of expenses, with donations on the order of 6000 euros. PayPal donations, previously accepted, were switched off around 2022 since the creator could no longer top up the account, implying they’re in Russia, and they complain about the difficulty of doing cross-border payments “across the Iron Curtain”. Donations these days are via Liberapay, an obscure French non-profit organization, and YC-backed startup BuyMeACoffee. Surprisingly, the creator has a healthy skepticism of crypto, so this remains unsupported.

The other source of income is ads. The FAQ, far out of date, has a “promise it will have no ads at least till the end of 2014”, but there have long been Yahoo network ads injected on top of pages when you use mobile (but, oddly, not on desktop). Revenue is even more of a question mark, but apparently on good days they “almost cover expenses” (a remark that doesn’t quite square with the other comment about ads and donations together covering less than 20%), while on bad days they’re getting kicked out from serving ads because an archive of the Internet will inevitably archive advertiser-unfriendly NSFW content too.

Archive.today, not tomorrow?

So there we have it: the site is a one-man battle against entropy, constantly battling domain registrars, anti-scraping systems, copyright enforcement, easily spooked advertisers, and global financial system payment rails designed to obstruct Russian citizens. By staying anonymous and keeping a low profile, they’ve (likely?) managed to avoid the kind of legal tussles that have embroiled Alexandra Elbakyan of Sci-Hub fame, but they’ve still funded it to the tune of tens of thousands of euros during that time. They clearly have a second source of considerable income that’s likely somewhat sketchy as well, so if that ever goes away, archive.today is likely to go away with it.

The creator is fully aware that the site is a mere “weak tool” that is “doomed to die”, but the bus factor of one combined with its semi-legal nature means there can be no real continuity: there will never be a legally incorporated Archive.Today Foundation to carry on their work. It’s a testament to their persistence that they’ve managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well deserved cup of coffee.

All images in this post feature the Bibliotheca Alexandrina at Alexandria, Egypt.
At this point in time I firmly believe that it's over copyright and this is the only way copyright holders can get their coveted IP fully in their care so they can charge for it.
 
It won't come as a surprise: with companies pushing out low-quality TV shows and movies, of course they would move their focus to even heavier DRM and stricter DMCA enforcement. Why improve when you can just squeeze the status quo? That, and news companies still need a stranglehold on legacy media articles to control what is said and what's published.
 
I mention this because I really wonder if the glowies are going to start going after piracy sites in a similar fashion. You can even see 1337x link to njalla to promote them:
There might be some meat to this theory. The feds in Argentina have launched several raids against piracy sites, some of them among the biggest in Latin America.

Maybe it's some international agreement to try and shut down piracy websites?
 
I want to remind the doomers that Ghost Archive exists; however, who knows how long it'll last if they're actually trying to take down Archive Today.

We need an actually good way to download websites. Chromium browsers have this already, but are pretty flawed compared to the main 3 archive sites.
 
Pretty sure that Ghostarchive is down too, don't know about other archival platforms.

This whole thing glows hard.
Been wondering why it hasn't worked for me lately. Entire thing reeks of revision; expect some new truth being pushed contrary to something you know was documented somewhere. Kashapp Paypal's handlers want something gone.
 
Well, time to get paranoid and waste my entire night jackhammering old 8chan domains to archive the archived pages before a ton of old history gets wiped because a bunch of shitty dying newspaper empires want their ad revenue.
 
Maybe I'm oversimplifying things here but... It seems in the past few years glowies have been ramping up their DNS shut-it-downs so...why even use DNS? Just get SSL on your IP and share the IP to interested communities. Yeah it will be a little harder for normies to deal with but who cares at this point?
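
One catch with the bare-IP idea is that the certificate has to list the IP itself, since clients match it against the certificate's subjectAltName entries rather than a hostname. Below is a rough sketch of that check. It assumes the dict shape returned by Python's `ssl.SSLSocket.getpeercert()`; the function name and the sample certificate are hypothetical, and the addresses come from a documentation-only range.

```python
import ipaddress
from typing import Any, Mapping


def cert_covers_ip(cert: Mapping[str, Any], ip: str) -> bool:
    """True if the certificate's subjectAltName lists `ip` as an IP entry."""
    wanted = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue  # DNS names and other entry types do not cover an IP
        try:
            if ipaddress.ip_address(value) == wanted:
                return True
        except ValueError:
            continue  # skip malformed entries instead of failing
    return False


# Hypothetical certificate data in getpeercert() form.
sample_cert = {
    "subjectAltName": (
        ("DNS", "example.org"),
        ("IP Address", "203.0.113.10"),
    ),
}
print(cert_covers_ip(sample_cert, "203.0.113.10"))
print(cert_covers_ip(sample_cert, "203.0.113.11"))
```

This only covers the matching step. Whether a browser trusts the certificate in the first place still depends on who issued it.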
 