A question I have: how do archive.today and the Wayback Machine get past DDoS protection like Cloudflare or KiwiFlare, while Ghostarchive and Megalodon have a harder time? What's the trick Wayback and archive.today use to get around this? Does anyone know exactly?
Wayback has always had some advantages. I have no doubt that they have plenty of IP space. And everyone loves them. I wouldn't be surprised if Cloudflare, let alone individual site owners, just whitelists Wayback crawlers by default (Cloudflare even has a direct partnership with the Internet Archive/Wayback). I'm sure Megalodon, like WebCitation back in the day, would be in the same position, if anyone outside Japan knew who the fuck they were.
The heroes behind Archive.Today will have to work harder. I mean, I don't like to stereotype Russians, but surely there's some equivalent of six degrees of Kevin Bacon where every Russian is three degrees from someone running a residential proxy network with thousands of IP addresses around the world. As for how they almost always bypass paywalls, much like Google's crawler does by identifying itself as Googlebot? Well, one could speculate as to what user agent they use... it probably isn't 'ARCHIVE.IS ARCHIVE CRAWLER'.
@clipartfan92 I don't know how much control you have over it, but can you make it so that the Ghostarchive bookmarklet you made back in February in this post opens up to the captcha that sometimes appears when archiving with Ghostarchive? I ask because one of the biggest problems I face with that bookmarklet is that any attempt at archiving with it just gets stuck in a loop. I've compared archiving a site from the bookmarklet to entering the link manually through the site itself, and things that took literal seconds to archive through the site just got stuck in limbo with the bookmarklet. I suspect the main reason for this behavior is the captcha itself; I really don't think there's any other reason for it not working.
The Archive.is bookmarklet already opens up to its Cloudflare captcha, and it doesn't get in the way of archiving enough to really complain about, so I wouldn't mind having to go through Ghostarchive's captcha. The only gripe I have is that one pass through the captcha doesn't carry over to new archives, meaning you have to go through it again and again, but this happens sparingly.
archive.today is Cloudflared? I always get slapped with a Google reCAPTCHA when using a proxy. Another question: does anyone know if those scripts to archive multiple links on archive.today still work?
I think that happens because of the jankiness of the ghostarchive site. It may seem like they archive faster using the webpage but I think it's all luck of the draw with ghostarchive's site load, connectivity, and the remote site that's being archived. I've been having those same problems for a while now and I'll either get a loop or a familiar 502 Bad Gateway error page from Cloudflare. I just visited with Mullvad Browser and didn't get any captcha (using a VPN). I attempted to archive three urls. The first one got stuck in the loop. The second archived within 30 seconds with zero issues. The third got stuck in the loop and eventually went to a 502 error. If I refreshed the page, it would show it back in the loop again.
Pure speculation, but the captcha coming up might have something to do with the IP address/VPN endpoint you're using. I've got them in the past and other times I haven't.
I rarely, if ever, use the bookmarklet or actual webpage and mostly send the archive requests via curl and scripting. It doesn't make the process work any better but it saves me time. Here's a curl command to get the archive to start the process and display the link in a terminal (replace the $1 variable with the url):
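A minimal sketch of what such a curl submission can look like. The `archive.ph/submit/` endpoint and the Refresh/Location reply headers are assumptions based on how the site behaves in a browser, not a documented API, so treat this as a starting point:

```shell
#!/bin/sh
# Sketch: submit a url to archive.today and print the new snapshot link.
# Endpoint and header behavior are assumptions, not a documented API.

# pull the first archive.* link out of the response headers
extract_link() {
  grep -oE 'https?://archive\.[a-z]+/[^;[:space:]]+' | head -n 1
}

# only fire the request if a url was actually passed in
if [ -n "$1" ]; then
  curl -si --data-urlencode "url=$1" "https://archive.ph/submit/" | extract_link
fi
```

The site usually answers with either a `Refresh:` or `Location:` header pointing at the in-progress snapshot, which is what the grep pulls out.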
I think they just misspoke on the Cloudflare part. I've had no luck with those scripts in a couple of years, and the constant cookie hassles weren't worth it when I looked into it. There may be a modern solution that uses Selenium and one of those captcha solver plugins, but that's beyond my pay grade.
If I've got a list of urls for archive.today, I run them through a script that takes the text list of urls, prefaces each with the archive url, creates an html file, then opens that in a browser. Once you solve the first captcha, you can click a bunch of them before you get hit with another captcha. This code is atrocious and I haven't touched it in a few years, much less cleaned it up, but it works absolutely fine. Plz no bully, I'm just an idiot that plays on computers as a hobby and sar'd this up from bits and pieces from here and there. If anyone could rewrite this properly in bash or python, please do the needful, thank you sar!
Bash:
#!/bin/bash
#
# archive.today link script
# usage: ./script.sh urls.txt
#
if [[ -z "$1" ]]; then
    echo "Error: No filename provided"
    exit 1
fi
if [[ ! -f "$1" ]]; then
    echo "Error: File does not exist"
    exit 1
fi
if [[ ! -s "$1" ]]; then
    echo "Error: File is empty"
    exit 1
fi
cwd=$(pwd)
shortname=$(echo "$1" | sed 's/\.txt$//')
# build one archive.today run link per url in the input file
urls=$(while read -r line; do
    printf '    <dt><a href="https://archive.today/?run=1&url=%s">%s</a>\n' "$line" "$line"
done <"$1")
cat >"$shortname".html <<CREATEHTML
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>$shortname</title>
<style>
body {
    background-color: #000000;
    color: #ffffff;
}
</style>
</head>
<body>
<b>$shortname</b><br>
<br>
Archived urls:<br>
<dl>
$urls
</dl>
</body>
</html>
CREATEHTML
echo 'file://'"$cwd"'/'"$shortname"'.html'
brave-browser file://"$cwd"/"$shortname".html &>/dev/null &
#flatpak run com.brave.Browser file://"$cwd"/"$shortname".html 2>/dev/null &
If you want to archive a list of urls to archive.org, you can use this script (there's also a tor branch if that suits your needs). Depending on how much you're archiving, you're probably going to hit some sort of daily IP submission limit, so using something like proxychains to rotate around VPN endpoints is probably needed.
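Not the script linked above, but a minimal sketch of the same idea, assuming the Wayback Machine's public Save Page Now behavior (a plain GET of `web.archive.org/save/<url>` triggers a capture). The 15-second sleep is a guess at a polite rate, not a documented limit:

```shell
#!/bin/sh
# Sketch: read urls from a text file and submit each one to the
# Wayback Machine's Save Page Now endpoint.

# build the Save Page Now request url for a given page
spn_url() {
  printf 'https://web.archive.org/save/%s' "$1"
}

# loop over the input file, printing the HTTP status for each submission
if [ -f "$1" ]; then
  while read -r url; do
    echo "saving: $url"
    curl -s -o /dev/null -w '%{http_code}\n' "$(spn_url "$url")"
    sleep 15
  done <"$1"
fi
```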
That's way more than I wanted to type, hopefully someone gets some use out of it.
Is there an alternative to yt-dlp? I'm a bit code-illiterate, and trying to figure out how to get this to work is killing me. My original plan was to use a yt-to-mp4 converter, but I'm going to assume an omni-tool such as yt-dlp is the way to go for archival purposes.
I recommended this site to another user: SSVid.net. It's able to rip Twitter and YouTube videos (for the time being before they change their API again) along with a few other sites.
The interface is... busy, but pretty much anything you can do with yt-dlp you can do with JDownloader.
The only real advantage yt-dlp has is that it can automatically extract browser cookies for, say, YouTube, based on a command-line flag (which you could add as a default to the yt-dlp config file if you liked). With JDownloader, I believe you have to use a browser extension to manually export these and then import them into JDownloader. On the other hand: a) you probably shouldn't be using cookies from your actual main browser or a Google account you actually use to archive from YouTube, in case they detect this and decide to lock you out of your account; b) you should only have to do this once, so who really cares.
The main actual reasons I don't use it are that I like using busted old laptops with tiny amounts of RAM, and JDownloader will use more than yt-dlp (though it's not that bad really); and that I'm not aware of JDownloader having any good options to download SponsorBlock data, or to integrate that plus subtitles and chapters directly into a single MP4 video, at least without writing some batch/shell scripts. But that only really matters for my purpose, which is downloading videos to watch on my phone later; it doesn't matter for archive purposes, just download things like subtitles as separate files.
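For reference, the yt-dlp flags being discussed look roughly like this (flag names as of recent yt-dlp releases; check `yt-dlp --help` for your version, and the browser name and url are just placeholders):

```shell
# Pull cookies straight from a browser profile, tag SponsorBlock segments
# as chapters, and mux subtitles/chapters into a single mp4.
yt-dlp --cookies-from-browser firefox \
       --sponsorblock-mark all \
       --embed-subs --embed-chapters \
       --merge-output-format mp4 \
       "https://www.youtube.com/watch?v=VIDEO_ID"
```

Any of these can go in the yt-dlp config file (one flag per line) so you don't have to retype them.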
I run Windows 10 and downloaded the first .exe for Windows I saw:
Now it says Win8+, but I really don't know if that means Windows 8 as a standalone OS or Windows 8 and beyond. When I open up the executable, anything I type makes it shut down completely. I tried to copy-paste some of the third-party package manager installs such as Scoop and Chocolatey, but those just shut down the executable as well.
It sort of sounds like you're just trying to double-click "yt-dlp.exe". What you need to do is open a command line (cmd or PowerShell) and run it that way. For instance, in cmd with yt-dlp in Downloads:
Code:
Win+R, type cmd, press Enter (opens Command Prompt)
C:\Users\dave> cd Downloads
C:\Users\dave\Downloads> yt-dlp "https://youtube.com/some/video/goes_here"
As another user mentioned, you can also download a GUI for it.
Did you install ffmpeg? When it says Win8+, it means Windows 8 and all later versions. Download the .exe file, go to the folder where yt-dlp.exe is, and open a command prompt in that same directory.
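For what it's worth, dropping ffmpeg.exe into the same folder as yt-dlp.exe is usually enough (yt-dlp checks its own directory and your PATH), and there's also a flag to point at it explicitly. The path below is just an example:

```shell
# Tell yt-dlp exactly where the ffmpeg binaries live instead of
# relying on PATH or the yt-dlp.exe folder.
yt-dlp --ffmpeg-location "C:\ffmpeg\bin" "https://www.youtube.com/watch?v=VIDEO_ID"
```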
I have tried to the best of my abilities to do this the way that you and DavidS877 stated earlier. With the folder in Downloads, and trying to run it with the commands both of you gave me, I keep getting this error:
'ytp.exe' is not recognized as an internal or external command,
operable program or batch file.
Edit: My chimp brain got it to work. Now do I throw FFmpeg into the folder with yt-dlp, or do I add it a different way? I was prompted about this as I got it running:
I've discovered something quite great: Yandex saves web caches, which you can rearchive on archive.today. That's good and all, but another great thing I found is that they also save caches of Wayback Machine pages, meaning snapshots of excluded domains like soyjak.party can still be viewed as they were archived, thanks to Yandex. Example. This means sites that become excluded aren't actually gone entirely; the only downside is that browsing these archives is really inconvenient.