Culture archive.today: On the trail of the mysterious guerrilla archivist of the Internet - It's not doxxing when we do it, it's just an expose

Article Link (Archive)

Do you like reading articles in publications like Bloomberg, the Wall Street Journal or the Economist, but can’t afford to pay what can be hundreds of dollars a year in subscriptions? If so, odds are you’ve already stumbled on archive.today, which provides easy access to these and much more: just paste in the article link, and you’ll get back a snapshot of the page, full content included.


For a long time, I assumed that this was some kind of third-party skin on top of the venerable Internet Archive, whose Wayback Machine provides a very similar service at the very similar address of archive.org. However, the Wayback Machine is slow, clunky, frequently errors out, and most importantly, it’s very easy for websites to opt out, retroactively erasing all their content forever. In contrast, archive.today has no opt-outs or erase buttons: like it or not, they store everything and it’s not going anywhere, with some limited exceptions for law enforcement, child porn, etc.
The Internet Archive is a legitimate 501(c)(3) non-profit with a budget of $37 million and 169 full-time employees in 2019. archive.today, by contrast, is an opaque mystery. So who runs this and where did they come from?

The origins and owners of archive.today

The first historical record we have of the site dates from May 16, 2012, when a “Denis Petrov” from Prague, Czech Republic registered the domain archive.is, the original name of the site. archive.today followed in 2014, and the site has since registered countless variations: archive.li, archive.ec, archive.vn, archive.ph, archive.fo, etc. Denis Petrov is a common Russian name, with pages and pages of matches on LinkedIn, but it may well be an alias: informer.com notes that the same contact information was used to register a series of very sketchy domains, ranging from “carding forum” verified.lu to piracy sites btdlg.com and moviesave.us (all long since gone), many seeded with German keywords (spiel, gewinnt, online).
Domains aside, “Denis Petrov” has little presence on the web, and three seemingly connected domains proved dead ends. The obvious denispetrov.com was an entertaining rabbit hole, with the author an accomplished programmer with an interest in Web automation, but it’s clearly the work of a New Yorker, they’re blogging at the tail end of a 25-year career and the blog dries up entirely in 2011, so it doesn’t match the place or time. denis.biz (2001) and petrov.net (1998!) contain nothing. The one intriguing bit of evidence we have is this series of screenshots (archive) where Brave’s tech support addresses webmaster@archive.is as “Denis”, but odds are that’s just from the same DNS record.
We can glean a few more clues from archive.today’s web presence. The FAQ, unchanged since 2013 (!), states that they are located in Europe and asks for PayPal donations in euros. The voluminous Tumblr blog features tons of questions but very terse answers; the author’s English is excellent but not quite native, with occasional Noun Capitalization also hinting at a German background. Yet they answer questions in Russian, and the site uses a Russian analytics engine.


The most interesting detective work to date comes from Stack Exchange, where Ciro Santilli managed to link the profile picture of an account archive.today once used to archive LinkedIn content to a “Masha Rabinovich” in Berlin. Even more intriguingly, in a 2012 F-Secure forum post, a “masharabinovich” complains about “my website http://archive.is/” being blacklisted. They pop up on Wikipedia as well getting told off for adding too many links to archive.is, including a mention that they’re using the Czech ISP fiber.cz, and their early edit history includes many updates to the pages “Russian passport” and “Belarusian passport”. “Masha” (Маша) is a common Russian diminutive of Maria, although it can also be a Hebrew form of Moses (מַשה), and Rabinovich is an Ashkenazi Jewish surname.
Early Github captures on archive.today are linked to a now completely disappeared account called “volth” (copy archived by archive.today itself), who was a fluent speaker of Russian, contributed extensively to NixOS (which archive.today uses) and has a profile picture not dissimilar to Masha’s. The linked volth.com domain is now only an empty husk, but it dates back to 2004, with early versions first doing some kind of sketchy search engine network marketing thing (2005), promising “Total Success in Internet” (2008) and eventually being put up for sale (2010), making it likely that its original owners the Espinosas are unrelated to whoever owns the domain today.
While we may not have a face and a name, at this point we have a pretty good idea of how the site is run: it’s a one-person labor of love, operated by a Russian of considerable talent and access to Europe. Let’s move on to the nitty gritty.

Infrastructure

There are two components to any archival site: the scraper that copies the pages, and the storage system where the pages are kept and retrieved on demand. Helpfully, the FAQ shares some details of what the storage side at least used to look like:
The archive runs Apache Hadoop and Apache Accumulo. All data is stored on HDFS, textual content is duplicated 3 times among servers in 2 datacenters and images are duplicated 2 times. Both datacenters are in Europe, with OVH hosting at least one of them.

In 2012, the site already had 10 TB of archives and cost ~300 euros/mo to run, escalating to 2000 euros by 2014 and $4000 by 2016. As of 2021, they have archived on the order of 500 million pages, and with the average size of a webpage clocking in at well over 2 MB these days, that’s a cool 1,000 TB to deal with. (For comparison, the Internet Archive is around 40,000 TB.)
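The storage estimate above is simple arithmetic, and a minimal sketch makes it easy to check. The page count and average page size are the article's round figures. The function name and the idea of applying the FAQ's replication factors (3 copies for text, 2 for images) to the whole estimate are assumptions made here for illustration, not something the site has published.

```python
# Back-of-the-envelope storage estimate using the article's round figures.
# Decimal units are assumed: 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def estimate_storage_tb(pages: int, avg_page_mb: float, replication: int = 1) -> float:
    """Return the estimated storage in TB for a given number of pages.

    `replication` is the number of stored copies of each page (hypothetical
    parameter; the FAQ mentions 3 copies for text and 2 for images).
    """
    return pages * avg_page_mb * replication / MB_PER_TB


pages = 500_000_000   # "on the order of 500 million pages"
avg_page_mb = 2.0     # "well over 2 MB" per page, rounded down

raw_tb = estimate_storage_tb(pages, avg_page_mb)
low_tb = estimate_storage_tb(pages, avg_page_mb, replication=2)
high_tb = estimate_storage_tb(pages, avg_page_mb, replication=3)

print(f"Raw archive size: {raw_tb:,.0f} TB")
print(f"With 2x to 3x replication: {low_tb:,.0f} to {high_tb:,.0f} TB")
```

The raw figure matches the article's "cool 1,000 TB". The replicated range is only a rough bound, since the real mix of text and images and the current storage layout are unknown.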
The less discussed but more controversial half of the site is scraping, the process of vacuuming up live webpages. Since 2021, this uses a modified version of the Chrome browser, and the blog readily admits that the availability of computing power to run these automated browsers is now the main bottleneck to expanding the site. To avoid detection, archive.today runs via a botnet that cycles through countless IP addresses, making it quite difficult for grumpy webmasters to stop their sites getting scraped. Access to paywalled sites is through logins secured via unclear means, which need to be replenished constantly: here’s the creator asking for Instagram credentials.
Finally, the serving of the website is also subject to a perpetual game of cat and mouse: “I can only predict that there will be approximately one trouble with domains per year and each fifth trouble will result in domain loss.” As of today, archive.today still works, but users are redirected to archive.md.

Funding

The other major source of permanent uncertainty is the site’s funding model. We’ve established that its costs are considerable, but according to the creator, as of 2021 ads and donations covered less than 20% of expenses, with donations on the order of 6000 euros. PayPal donations, previously accepted, were switched off around 2022 since the creator could no longer top up the account, implying they’re in Russia, and they complain about the difficulty of doing cross-border payments “across the Iron Curtain”. Donations these days are via Liberapay, an obscure French non-profit organization, and YC-backed startup BuyMeACoffee. Surprisingly, the creator has a healthy skepticism of crypto, so this remains unsupported.
The other source of income is ads. The FAQ, far out of date, has a “promise it will have no ads at least till the end of 2014”, but there have long been Yahoo network ads injected on top of pages when you use mobile (but, oddly, not on desktop). Revenue is even more of a question mark, but apparently on good days they “almost cover expenses” (a remark that doesn’t quite square with the other comment about ads and donations together covering less than 20%), while on bad days they’re getting kicked out from serving ads because an archive of the Internet will inevitably archive advertiser-unfriendly NSFW content too.

Archive.today, not tomorrow?

So there we have it: the site is a one-man battle against entropy, constantly battling domain registrars, anti-scraping systems, copyright enforcement, easily spooked advertisers, and global financial system payment rails designed to obstruct Russian citizens. By staying anonymous and keeping a low profile, they’ve (likely?) managed to avoid the kind of legal tussles that have embroiled Alexandra Elbakyan of Sci-Hub fame, but they’ve still funded it to the tune of tens of thousands of euros during that time. They clearly have a second source of considerable income that’s likely somewhat sketchy as well, so if that ever goes away, archive.today is likely to go away with it.
The creator is fully aware that the site is a mere “weak tool” that is “doomed to die”, but the bus factor of one combined with its semi-legal nature means there can be no real continuity: there will never be a legally incorporated Archive.Today Foundation to carry on their work. It’s a testament to their persistence that they’ve managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well-deserved cup of coffee.

TL;DR - Some google employee decided to do babby's first OSINT journalism and in the process tried to dox the site operator. Archive.today responded by no longer allowing people to access the site besides through modern web browsers in an attempt to deter bots and potential server overload.
Links2 - doesn't work
Python urllib with/without headers - doesn't work
Outdated webview browsers on early versions of android - doesn't work
 
Well, that escalated quickly

Oh the guy attacking the site that helps Kiwifarms archive actions of pedos and the like is a pedo?

5f9.jpg
 
More from the author: https://driftingclouds.net/2008/05/06/beer-bacon-and-bargirls-a-multimodal-escape-to-bahrain/

(archive)

I'm going to dig through here to see if I can find something - if someone would like to help, that's great.

I found it through here: https://wikitravel.org/en/Special:MobileDiff/1127259

Edit: he notices children and poop a lot. He also wants to eat dog dicks. He was called an otaku by Japanese innkeepers. He threw a shoe out of a window at a Japanese communist.
LMAO, yes, nothing out of the normal - just an infant bleeding to death.
1691534663283.png
(link) (archive)

More gems:
1691534844839.png
(from the same link as above)

1691535245917.png
(link)(archive) - sperging about his lack of ethics

1691535323519.png
same as above - he's now talking about little kids pooping

1691535529582.png
(link)(archive) - he's talking about the plight of kids while exploiting them by using their image across the internet

1691535974614.png
1691536040612.png
1691536168729.png
(link)(archive) - this is fucking gross - he's eating chicken ovaries and wants to eat dog dicks for Christ's sake

1691536347692.png
(link)(archive) - ... that's oddly specific

1691536569407.png
(https://driftingclouds.net/2007/04/)(archive) - he fantasizes about Catholic schoolgirl uniforms on (presumably) Latina girls

1691536828551.png
same as above - why is he fixated on children with poop on them?

1691537170378.png
(link)(archive)

1691537283061.png
same as above - yeah, he's a sex tourist

1691537536831.png
same as above - LMAO - he got outed by Japanese innkeepers as a creep

1691537677887.png
same as above - he throws shoes at people

1691537754571.png
same as above - he wants to keep people in boxes - what a shock
 
Would it be possible for you to archive any posts this guy had made on this forum?


I've archived this particular page. (https://archive.md/WVr69)

I'm pretty sure he's talking about sex paraphernalia in this one.

1691538183913.png
1691538455884.png

He's deleted a few posts on this forum. Gee, I wonder why?
1691538570866.png
 
It doesn't like my VPN and is also blocking TOR. Maybe someone else can get it.
Thanks for trying - I'm using an Australian IP address via my VPN and it's working.

I'm going to have to go through it piece by piece.


The most interesting part is that someone archived all this shit 9 years ago on archive.today.
 
Yeah, I'm completely sure this guy didn't take advantage of this girl.
1691540258655.png

(link) - this has already been linked - from the FlyerTalk forum megathread of his travel reports

He mentions a friend named N with whom he travels and watches as she applies nailpolish to her toes (his observations, not mine). He makes mention in that FlyerTalk thread about her being a student - likely a college student.

I've gotten through the FlyerTalk forum megathread of his travel reports - I don't think he has any other posts, but will double-check.
 
Great the Finnish pedo version of Roosh.
Not to split hairs, but I think he's originally Canadian and then went and stayed at a bunch of countries, including Australia and Finland.

He may have a wife and children:
1691541696118.png
https://driftingclouds.net/2020/07/ (already archived)

I would read this as other people's kids are on summer break, but he keeps mentioning "we" throughout the report.
 
Not to split hairs, but I think he's originally Canadian and then went and stayed at a bunch of countries, including Australia and Finland.

He may have a wife and children:
View attachment 5260037
https://driftingclouds.net/2020/07/ (already archived)

I would read this as other people's kids are on summer break, but he keeps mentioning "we" throughout the report.
That reads to me as his kids. They went to Singapore because they hadn't been out for months because lockdowns, his kids were now on summer holiday and he found a good deal.
 
Screenshot_20230808_192252.jpeg
"Because every article has to talk about how to find prostitutes."

Screenshot_20230808_192404.jpeg
"I'm not the criminal, that slut whore is! Arrest HER!"

Screenshot_20230808_193253.jpeg
"If prostitution is legal, then everything is consensual. Those women talking about being kidnapped and held against their will in a rape den were just joking!"

Screenshot_20230808_193705.jpeg
Jesus Christ, and I'm sure the "girlfriend" who dragged him into a tranny bar was (a) not his girlfriend, (b) not a girl, (c) was just looking to get money from farang who wants to be love long time.

Screenshot_20230808_194405.jpeg
He's based against pajeets. I wonder how his Indian co-workers and managers reading this right now feel about this. If only India had more of those delectable ladyboys, I'm sure he'd see past all the open sewers and public urination.
 
He's active (right now!) on Wikivoyage: https://en.wikivoyage.org/wiki/User:Jpatokal

He was probably kicked off Wikitravel for wanting to talk about sex tourism. Of course he didn't learn his lesson:

Screenshot_20230808_195323.jpeg


Screenshot_20230808_195929.jpeg

But now we know about a prior accusation of doxxing at Wikitravel....

Screenshot_20230808_201230.jpeg

[Jpatokal] has recently been making anonymous IP edits to Wikitravel user talk pages (not my own). He has inserted links to the personal Facebook, twitter, and Linkedin social media accounts of at least one female Wikitraveler who he managed to connect to her Wikitravel username. This behavior is completely unscrupulous, and he's been blocked on Wikitravel.
He used the same IP address (14.201.15.71) he's used for years on Wikitravel to edit his own talk page, and also left evidence of his presence on that female Wikitraveler's Linkedin page.
This is not the first interwiki vandalism he has partaken of, but it is the most predatory to date. Be aware, he may deface Wikivoyage as well, and should probably be tracked on the languages he generally prowls: English, Japanese, Chinese, and Finnish. If he also has administrator privileges at Wikivoyage, you will want to address that. I can provide electronic evidence to any disciplinary investigators as required. Thank you, --[[User:IBobi|IBobi]] ([[User talk:IBobi|talk]]) 20:03, 30 November 2012 (UTC)
More on this here:
 
Anybody else notice Archive.is / Archive.Today quit working like a day or so after this article came out?

It's working for me now. Have you changed DNS hosts lately? If you're using Cloudflare DNS then the site won't load for some reason. You'll have to find a different one. Or just switch to Google's when you need it. (Google's is 8.8.8.8 and 8.8.4.4)
 
It's working for me now. Have you changed DNS hosts lately? If you're using Cloudflare DNS then the site won't load for some reason. You'll have to find a different one. Or just switch to Google's when you need it. (Google's is 8.8.8.8 and 8.8.4.4)

I've tried that and all the sites that show if a website is down show it as down as well. Notice that nobody here uses it anymore... sure you're not just loading a cached version of the site?
 
I've tried that and all the sites that show if a website is down show it as down as well. Notice that nobody here uses it anymore... sure you're not just loading a cached version of the site?
Workin' fine here. Not cached, guaranteed. Yes, it is occasionally flaky and sometimes it breaks, but again, this isn't surprising to anyone who's used the site for more than a month.
 