Archival Tools - How to archive anything.

The archive.today domain hasn't been working for me, maybe for a day.

Other domains work. But if you're on archive.ph for example, and middle click archive.today at the top of the page to open up another tab quickly, it gives an SSL error, instead of instantly redirecting like it used to.

There have been likely unrelated issues getting the site to work at all, but those are intermittent and it's working fine right now.

Edit: It's working now.
 
Pardon my question if this has already been covered: how difficult would it be to get a page properly archived as a simple .html file? Could it be possible to cram an entire website into a single .html file? A lot of these online archives appear to be politically motivated, and the ones that aren't tend to go down quite frequently...
 
archive.ph and archive.today have not been working for me recently. Are the sites down for anyone else?

Also, is there an archiving tool for Facebook pages/accounts?
Archive.ph and its spinoffs have also been giving me weird drama all morning, so you're not alone. It keeps redirecting me to a particular URL - http://archive.is/fz139/ - and this screencap comes up.
1774274658054.png
No clue what it means, but it's been happening intermittently.

Also, I would love an archival tool for Facebook (and for Instagram, too) as they always come back with errors whenever I attempt to archive them, and GhostArchive hardly behaves even on a good day.
 
Quick Google translation:
Screenshot_2026-03-23-11-25-19-460_com.android.chrome.png
Dear Subscriber!
Access to the Internet resource blocked by decision of government authorities
You can view the reason for blocking in the unified registry.
It links to an 'RKN blocklist', which times out on my browsers and archival websites. 'RKN' means Roskomnadzor, a Russian government censorship apparatus that has quarreled with Kiwi Farms a couple of times.

Edit: no mention yet [11:48 GMT-5] of this incident on the blog of the owner of Archive Today.
 
So far it doesn't bring me to an RKN webpage; it just times out and doesn't load. I heard the guy who runs archive.ph had some drama not too long ago.
 
Pardon my question if this has already been covered: how difficult would it be to get a page properly archived as a simple .html file? Could it be possible to cram an entire website into a single .html file? A lot of these online archives appear to be politically motivated, and the ones that aren't tend to go down quite frequently...
To archive multiple web pages into a single file, you need to use specialized web archiving tools that save content in formats like WARC or the newer WACZ. I recommend using the ArchiveWeb.page browser extension or its Electron app for simple, manual archiving. These tools are also useful for capturing content that might otherwise be difficult to preserve, such as pages behind paywalls or content from private social media accounts (as long as you have access to view them).

After you’ve created your archive, you can use ReplayWeb.page to open and browse the archived content locally, making it easy to revisit pages as they originally appeared.
 
Also, I would love an archival tool for Facebook (and for Instagram, too) as they always come back with errors whenever I attempt to archive them, and GhostArchive hardly behaves even on a good day.
There used to be a reliable way to archive Instagram: imginn.com, a third-party front end that can show profiles, individual posts and stories with no login requirement. Unfortunately, the strict Cloudflare Turnstile captcha rule they recently implemented breaks all automated archival services. The service also has a 50/50 chance of not finding an existing profile at all. This is probably because Meta is tightening its anti-scraping measures.

As for Facebook, there are no such third-party front ends. If the profile is public, you can attempt to archive it normally. The only services I had success with were mastodon.jp and archive.is, and both managed to archive a single post. It was also a verified page with 36k likes, so I assume the blocks were more lenient because of it.

To fix these issues, as @NGMI says two posts above, ArchiveWeb.page and its desktop application would be optimal for Facebook and Instagram archival: it lets you not only choose the IP to archive from, giving your client a more "human" look, but also get past the hard-coded login barriers by actually logging in. There are some quirks: it's not an automatic process, you may run into de-anonymization issues from using it, and the result is less convenient to view and share.

The simplest setup is with a Chromium browser and the browser extension, but I prefer using the desktop application for better isolation. You can read more on how to archive a page in the documentation.

The main advantage of AWP is that you can scrape as many pages and menus as you want, as long as you interact with them. If you'd like to scrape the "About" section of a Facebook profile, you can, as long as you click the "About" button on the profile. This goes for most things: if it was visible on the browser screen, it has been scraped.

The de-anonymization issues depend on how much OPSEC you are willing to apply to this program: since you are essentially running a local scraper, it downloads most if not all of the scripts on the webpage. Since Meta loves its JS telemetry and trackers, it would be wise to run these archive processes in isolation, otherwise some metadata about the logged-in user account or region may leak through these files. If you run this in an isolated VM with a VPN and a one-off burner Facebook or Instagram account registered to a throwaway email, you should be fine.

It is also harder to share on the forum: I have used this before to archive @Cats's original Squarespace website and I had to compress the file into a .zip, otherwise the forum wouldn't let me upload it because .wacz isn't in the file format whitelist of the forum software. So the reader has to download the .zip, extract it and open the .wacz file in ReplayWeb.page to view it.

Sites that use AWP as a framework for their archives are GhostArchive and Webarchive.

If you're a power user and you care more about archiving the actual images than the profile details, you can use gallery-dl with exported account cookies.
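
For the gallery-dl route, a minimal sketch of what the invocation might look like. The wrapper function name, the cookies.txt path and the example_user profile URL are placeholders I made up; `--cookies` is gallery-dl's option for loading a cookies.txt file exported from a logged-in browser. The function only prints the command so you can check it before running anything.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the gallery-dl command instead of running it.
#   $1  path to a cookies.txt file exported from a logged-in browser session
#   $2  profile or post URL to download from
gallery_dl_cmd() {
    printf 'gallery-dl --cookies %s %s\n' "$1" "$2"
}

# Placeholder values; swap in your own cookie file and target URL.
gallery_dl_cmd cookies.txt "https://www.instagram.com/example_user/"
```

Since the cookies belong to a logged-in account, the same burner-account advice from above applies here.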
 
Are there any good resources for archiving a LinkedIn profile? Most tools I know of seem to struggle with it. I haven’t tried ArchiveWeb.page or exporting a profile as a WARC yet, so I’m not sure how well it would work for this, but I'm guessing recording a session would probably help, right?

The only suggested workaround the last time this issue was mentioned ITT was taking a screenshot. I’m wondering if that’s still the case.
 
Are there any good resources for archiving a LinkedIn profile?
ArchiveWeb.page works, but full-page screenshots (available by default in Firefox) work just as well for Jobs, Skills, etc., since those are shorter pages. When it comes to Posts, however, AWP is the better option.

LinkedIn behaves exactly like Facebook in that some profiles force you to log in before you can view them.
 
There used to be a reliable way to archive Instagram: imginn.com, a third-party front end that can show profiles, individual posts and stories with no login requirement. Unfortunately, the strict Cloudflare Turnstile captcha rule they recently implemented breaks all automated archival services. The service also has a 50/50 chance of not finding an existing profile at all. This is probably because Meta is tightening its anti-scraping measures.
Just used imginn.com this morning and it worked well enough for me to snag a couple of Insta videos off a cow I am currently working on.
 
An update on compressing videos, this time using the better encoder for AV1, SVT-AV1 (libsvtav1):
  • With FFmpeg:
Code:
ffmpeg -i input.mp4 -c:v libsvtav1 -b:v 500K -c:a libopus -b:a 64K -r 30 output.webm

If the video has a lot of movement (a video game in the background or otherwise), increase the video bitrate to 1M:
Code:
-b:v 1M
(or, in very rare instances, to 2M).

If the video has a lot going on in terms of audio (background music, lots of people talking, background noise), increase the audio bitrate to 128K:
Code:
-b:a 128K
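
The bitrate choices above can be folded into one small helper. This is only a sketch: the function name and the `low`/`high` and `simple`/`busy` switches are my own invention, while the ffmpeg flags and bitrate values are the ones from the commands above. It prints the command rather than running it, and it does not quote filenames, so paths with spaces need quoting by hand.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the ffmpeg command from this post.
#   $1  input file
#   $2  output file
#   $3  motion: "low" -> 500K video bitrate (default), "high" -> 1M
#   $4  audio:  "simple" -> 64K audio bitrate (default), "busy" -> 128K
av1_cmd() {
    in=$1
    out=$2
    case "${3:-low}" in
        high) vbr=1M ;;
        *)    vbr=500K ;;
    esac
    case "${4:-simple}" in
        busy) abr=128K ;;
        *)    abr=64K ;;
    esac
    printf 'ffmpeg -i %s -c:v libsvtav1 -b:v %s -c:a libopus -b:a %s -r 30 %s\n' \
        "$in" "$vbr" "$abr" "$out"
}

# A high-motion clip with background music:
av1_cmd input.mp4 output.webm high busy
```

For the rare 2M case, edit the printed command by hand instead of adding a third switch.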

  • With Kdenlive:
To create a new preset, press Ctrl + Enter and select Create New Preset.
1.webp

Video Tab
Select a group of your choice,
Name the preset as you wish,
Change the Codec to libsvtav1,
Rate Control to average bitrate,
Bitrate to 430K (it renders close to 500K)
2.webp

Audio Tab
Codec to libopus
Rate Control to Average Bitrate
Bitrate to 64K
3.webp

If the video you're trying to archive has a lot of movement (a gaming background or otherwise), you can change the Bitrate under the Video tab to 1000K, or to somewhere between 1000K and 2000K.
If the video has background music or noise, change the Bitrate under the Audio tab to 128K.

You can keep a separate preset for higher video/audio bitrate.
Now, just render the file using this preset.
 
Anyways, here's a bunch of replacements for pomf.lain.la:
https://catbox.moe/ (no file expiration date, max 200mb)
https://qu.ax/ (no file expiration date option available, max 256mb)
https://fileditch.com/ (no file expiration date, does prune >1 year files on occasion, max 15gb)
A bit of an update on some of these:
https://catbox.moe/ (No file expiration date, max 200 MB) FAQ
https://new.fileditch.com/ (URL change, no file expiration, occasionally prunes >1 year files oldest first, max 25 GB) FAQ
https://qu.ax/ (Multiple options of file expiration now available, max 256 MB) FAQ
supported formats.png

Some extra links for temporary file sharing:
https://litterbox.catbox.moe/ (Temporary uploads up to 1 GB are allowed. Expire options: 1h, 12h, 1 Day = 24h, 3 Days = 72h.) FAQ
https://uguu.se/ (Max upload size is 134 MB & files expire after 3h.) FAQ
 
Does anyone know what happens to links after they are blacklisted from the Wayback Machine? Does anyone have any clue why 8ch.net was blacklisted recently?
I was reading through this post and followed the link: https://web.archive.org/web/20150527215816/https://8ch.net/cow/res/108467.html only to discover that the URL has been excluded from their archives. I was curious and wanted to see if 8chan.co was blocked as well, to my surprise it wasn't.
Considering that this link was on the Wayback Machine for around a decade, I find it puzzling that the 8ch.net domain has been blacklisted now, especially as 8chan.co is unaffected. What really concerns me is that this could mean large swathes of lolcow history have been wiped off the face of the earth, as /cow/ was hugely influential in the early days of the farms.
 
You can’t access them through the Wayback Machine or its CDX API. However, you might still find them by searching specific collections on archive.org or within datasets like Common Crawl.
1777690247405.png
According to the Archive Team wiki’s list of sites excluded from the Wayback Machine, this site was only recently added to that exclusion list, so it was probably Watkins who told them to exclude it.
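
If you want to check for yourself whether a URL still shows up in the CDX API mentioned above, here is a rough sketch. The function name and the default limit of 5 are my own choices; the endpoint and the `url`, `output` and `limit` parameters come from the Wayback CDX server. It only builds the query URL and does no URL-encoding, so it assumes a plain URL without special characters.

```shell
#!/bin/sh
# Hypothetical helper: builds a Wayback Machine CDX API query URL.
#   $1  URL (or URL prefix) to look up
#   $2  maximum number of rows to return (defaults to 5)
cdx_url() {
    printf 'https://web.archive.org/cdx/search/cdx?url=%s&output=json&limit=%s\n' \
        "$1" "${2:-5}"
}

# Print the query for the thread linked in the post above.
cdx_url 8ch.net/cow/res/108467.html

# To actually send it (needs network access), something like:
# curl -s "$(cdx_url 8ch.net/cow/res/108467.html)"
```

For an excluded domain you should expect no captures back, which matches what the Wayback Machine page itself shows.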
 