archive.today (aka archive.is) is a web archiving website that saves snapshots on demand. It is one of the few archivers that can capture a wide range of popular JavaScript-heavy websites (such as Google Maps, X and more). Once a page is archived, Internet users cannot delete it themselves.

It was created on May 16, 2012, as a service similar to Megalodon (a Japanese archiving service created in 2005). The owner explains why he created it here (archive): it started as a service separate from the Internet Archive (archive.org), with a goal he sums up as "we won't delete what they delete, and vice versa, even when politics isn't involved", making it an apolitical archive service. In his words:
We created a service similar to Megalodon, which was already quite popular in Japan. First, we had to choose a domain zone. Not the USA or the EU (the current horrors hadn't happened yet, but SOPA and PIPA were already being planned), and not the Caribbean, where a registrar's server could crash and take months to recover. Libya (.ly) was fashionable at the time, but Gaddafi had just been killed. So Iceland seemed interesting: there were bearded sysadmins in parliament, and they created Mailpile. Then we looked at which single-word domains were available.
When we started, archive.org didn't have a "Save Now" function, so our features didn't overlap at all. Even our names are different, just homonyms: archive.org is a noun, while we are a verb: "archive.is/today" was intended as an imperative, like "Save Now!"
Then two things happened. First, archive.org introduced its "Save Now" feature. Second, when we finally started communicating, around 2020, Mark mentioned that they come from a background of left-wing activism (this isn't a secret; their biographies are public; I just hadn't looked into them until it was brought to my attention).
By that time, Gamergate and various other scandals had already occurred. With few small exceptions, the right tended to preserve pages, while the left wanted to delete them. That was my aha moment: no collaborations were possible here. And so we became a kind of dialectical pair: we won't delete what they delete, and vice versa, even when politics isn't involved.
This is what's driving us in this direction, toward the role of a smaller archive.org. Whether that's good or bad, I don't know yet.

Archive.today captures pages relatively easily because it does not obey robots.txt: it acts "as a direct agent of the human user", as stated in its FAQ. It launches real browsers (not even headless ones) and tries to load lazy images, unroll folded content, log into accounts when prompted with a login form, remove “subscribe our maillist” popups, etc. (source / archive). After 2019, the site switched to Chromium/80 with a few small patches; before that, it used PhantomJS (source / archive). This setup also lets archive.today archive Tor webpages as well as pages at specific IP addresses.
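For contrast, here is the check a robots.txt-respecting crawler would run before fetching a page, and which archive.today, by its own description, skips. This is a minimal sketch using Python's standard `urllib.robotparser`; the rules, the `ExampleBot` user agent, the example.com URLs and the helper name `polite_crawler_may_fetch` are made-up examples, not anything archive.today uses.

```python
# Sketch: the robots.txt check a "polite" crawler performs before fetching.
# archive.today says it skips this step because each capture is requested
# by a human. The rules, user agent and URLs below are hypothetical examples.
from urllib.robotparser import RobotFileParser

EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def polite_crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)

print(polite_crawler_may_fetch(EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.com/private/page"))  # False
print(polite_crawler_may_fetch(EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))   # True
```

A crawler that obeys these rules would stop at the `False` result; an archiver acting as the user's agent simply loads the page anyway.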
The website is closed source because of its many hardcoded, installation-specific parts, so it is unlikely ever to be open-sourced. As the owner stated (link / archive):
Unlikely. It has too many hardcoded things specific to my installation. From the type of hardware (like “that server is too old that it requires kernel-4.4 with a specific patch”) to using a quite exotic operating system.
There is plenty of open-source software in this area: https://github.com/iipc/awesome-web-archiving
To avoid detection and blocks, archive.today runs through a botnet that cycles through countless IP addresses (using VPSes/shithosts), which makes it quite difficult for webmasters to stop their sites from being archived. For regular websites, they sometimes ask for suggestions on how to get around such blocks. They don't use residential proxies, since "that pays off for ad-traders, but not for a screenshot service".

(link / archive)
archive.today currently holds over a petabyte (1,000 TB) of data across more than 500 million archived webpages. This thread serves as a general discussion thread for anything related to the site.
In 2012, the site already had 10 TB of archives and cost ~300 euros/mo to run, escalating to 2000 euros by 2014 and $4000 by 2016. As of 2021, they have archived on the order of 500 million pages, and with the average size of a webpage clocking in at well over 2 MB these days, that’s a cool 1,000 TB to deal with. (For comparison, the Internet Archive is around 40,000 TB.)
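The 1,000 TB figure follows from simple multiplication. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a flat 2 MB per page; since the quote says "well over 2 MB", the result is a lower bound. The helper name `estimated_storage_tb` is hypothetical.

```python
# Back-of-the-envelope storage estimate for the numbers quoted above.
# Assumptions (not from archive.today): decimal units, flat 2 MB per page.
PAGES = 500_000_000          # "on the order of 500 million pages"
AVG_PAGE_BYTES = 2 * 10**6   # "well over 2 MB", so this gives a lower bound
BYTES_PER_TB = 10**12        # decimal terabyte
INTERNET_ARCHIVE_TB = 40_000 # "around 40,000 TB"

def estimated_storage_tb(pages: int, avg_page_bytes: int) -> float:
    """Return the total archive size in decimal terabytes."""
    return pages * avg_page_bytes / BYTES_PER_TB

total_tb = estimated_storage_tb(PAGES, AVG_PAGE_BYTES)
print(f"{total_tb:,.0f} TB")  # 1,000 TB, i.e. about 1 PB
print(f"{total_tb / INTERNET_ARCHIVE_TB:.1%} of the Internet Archive")  # 2.5%
```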
archive.today domains:
- archive.today
- archive.is
- archive.fo
- archive.li
- archive.vn
- archive.ph
- archive.md
- archive.ec: lost after a service interruption of 9+ days; the domain was resold and is no longer controlled by archive.today.
On why there are so many domains, the owner explains:
It is intentional.
No single domain is reliable and I have no means to enforce control on each domain.
* archive.today - threatened with confiscation http://blog.archive.today/post/116913927371/the-domain-registrar-gransy-s-r-o-aka, also a troll attack caused a service interruption https://blog.archive.today/post/138982909006/domain-problems-again
* archive.is - threatened with confiscation https://twitter.com/archiveis/status/1081276424781287427, asked not to use “archive.IS” for branding (that’s why you see “archive.TODAY” in the top-left corner, although many people remember it as “archive.IS” and refer to it that way)
* archive.fo - threatened with confiscation https://twitter.com/archiveis/status/1188222460598116353
* archive.li - attacked by trolls impersonating police, causing a few days of service interruption https://twitter.com/archiveis/status/956025540028268547
* archive.ec - attacked by trolls causing service interruption and finally lost https://twitter.com/archiveis/status/1093608363647291393
* archive.vn - ok so far
* archive.ph - ok so far
* archive.md - ok so far
* a nice domain unrelated to archive - one day whois started showing someone else’s information and the registrar did not respond; the domain was lost
archive.today socials:
- Archive.today Tumblr (includes a major list of technical questions, downtime notices, general help, etc.)
- Blog
- Liberapay
- Email (webmaster@archive.today)