Archival Tools - How to archive anything.

Browsertrix is an open-source, automated web crawler designed for high-fidelity archiving of websites, including those with dynamic and JavaScript-heavy content. You can run it locally in a Docker container, making it a cost-effective alternative to paid archiving services. It's perfect for those who want a "set-it-and-forget-it" approach to capturing and preserving web pages over time.
 
Is there a way to archive and save Facebook videos that are flagged as Violent/Graphic content? yt-dlp errors out because those videos can't be viewed unless you're logged in. Ghostarchive just saves the video page with the error you get when you try to view it without logging in, so that doesn't work either. Cobalt doesn't seem to be able to pull those videos either.
 
Use one of the online private Facebook video downloaders. You put the video link in, and it gives you a link to open in your browser. Select all on the page that loads, copy it, and paste it into their giant submission field. It ends up finding the video from your cache or something. I used to have to use it all the time in Hollie's thread.
 
I set up ArchiveBox (a self-hosted webpage archiver) on my home server today while I was redoing it. It's incredibly helpful and is working well. There are a lot of sites (personal sites, friends' blogs, random cool webpages, etc.) that I don't want to put on community archival projects but do want to preserve. I recommend it!
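
For anyone wanting to feed a personal list of sites into ArchiveBox, here is a minimal sketch that shells out to the `archivebox add` command once per URL. It assumes ArchiveBox is already installed and that a data directory has been initialized with `archivebox init`; the directory path and the example URL are placeholders, not anything from the post above.

```python
import subprocess
import sys
from pathlib import Path


def build_add_command(url: str) -> list[str]:
    """Return the ArchiveBox CLI invocation that snapshots a single URL."""
    return ["archivebox", "add", url]


def archive_urls(urls: list[str], data_dir: Path) -> None:
    """Run `archivebox add` for each URL from inside the data directory."""
    for url in urls:
        # ArchiveBox expects to be run from its initialized data directory.
        subprocess.run(build_add_command(url), cwd=data_dir, check=True)


if __name__ == "__main__":
    # Hypothetical location; point this at wherever `archivebox init` was run.
    data_dir = Path.home() / "archivebox-data"
    # URLs come from the command line, e.g. a friend's blog.
    archive_urls(sys.argv[1:], data_dir)
```

Run it as `python add_sites.py https://example.com/blog` (the script name is arbitrary). One call per URL keeps a single failing site from hiding which one broke.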
 
Does anyone have any good ways to archive Gumroad content (aka 2cool4YouTube content)? I tried kemono party (which mirrors paywalled content from a whole lot of platforms and is mostly used by coomers), but their Gumroad scraper is broken.

Otherwise, an alternative would be to download all the content manually (or with a script) and maybe publish a torrent someplace else, legally ofc!
 
Would anyone be able to help me with this problem I have with PreserveTube? I keep getting this message:

PreserveTube.png

Is there a size/length limit? The videos I've tried to archive are multi-hour videos/live streams, though I have been able to archive channels.
 
IIRC, PreserveTube won't archive videos that are longer than 90 minutes.
edit: Not positive on the exact time cutoff or if file size factors in as well, but there's definitely a limit.
 
It's actually pretty inconsistent whether you get that error or not. Sometimes I get it randomly on shorter videos that go through fine on a second attempt. Sometimes a video that seems like it's too long will go through after multiple attempts. I wish the guy would have the thing check the video length/size before trying to download, rather than erroring out partway through. Would probably save the server some work too.
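
A pre-download length check like the one wished for above could look roughly like this. It is only a sketch of the idea, not how PreserveTube actually works: it assumes the third-party `yt-dlp` Python package is installed, and the 90-minute cutoff is just the unconfirmed figure mentioned earlier in the thread.

```python
import sys

# Assumed cutoff: the "90 minutes" recalled earlier in the thread, unconfirmed.
MAX_SECONDS = 90 * 60


def within_limit(info: dict, max_seconds: int = MAX_SECONDS) -> bool:
    """Return True only if the metadata reports a duration at or under the limit."""
    duration = info.get("duration")
    # Ongoing live streams often have no duration; treat unknown as too long.
    return duration is not None and duration <= max_seconds


def fetch_info(url: str) -> dict:
    """Fetch video metadata without downloading the video itself."""
    from yt_dlp import YoutubeDL  # third-party: pip install yt-dlp

    with YoutubeDL({"quiet": True}) as ydl:
        return ydl.extract_info(url, download=False)


if __name__ == "__main__":
    info = fetch_info(sys.argv[1])
    if within_limit(info):
        print("OK to archive")
    else:
        print("Too long or unknown length, skipping")
```

Checking metadata first costs one small request, so an over-long video is rejected before any download work starts.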
 
For some reason, archive.today hasn't been archiving xcancel well for a while now.
1000045528.jpg
Which itself wouldn't be a problem if Poast didn't show me this every time.
1000045530.jpg
Which also wouldn't have been a problem if it at least just archived.
1000045532.jpg
Does anyone know any alternatives? Also, before anyone says just use Twitter itself, it shows me 404 Not Found after trying to archive it twice.
 
The xcancel instance isn't optimal for archiving. Instead, consider using the poast instance or nitter.privacydev.net.

Here are some general tips for effectively archiving tweets:
  1. Avoid Visiting Archiving Sites Before Submission
    • Do not visit the Poast or PrivacyDev instance directly when planning to archive a tweet. Instead, copy the tweet link and paste it directly into archive.today to preserve the tweet without unnecessary site visits.
  2. Thread Archiving Best Practices
    • For threads, always archive the first and last tweets in the thread. Use reliable tools like GhostArchive or a Nitter instance for this purpose.
  3. Accidental Visits to Nitter
    • If you accidentally visit a Nitter instance before archiving, use a VPN to anonymize subsequent actions and avoid any potential issues.
  4. Alternative Methods for Threads and Replies
    • If Nitter and GhostArchive are unavailable and you need to archive a thread or its replies, use archive.today. When pasting the link of the tweet, replace the username in the URL with “i” (e.g., twitter.com/username/status/… becomes twitter.com/i/status/…) to bypass potential restrictions.

Example:
Screenshot 2024-12-27 164713.png
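
The username-to-“i” swap from tip 4 can be scripted if you archive a lot of tweets. This is a minimal sketch using only the Python standard library; the function name and the example URL are made up for illustration, and it assumes the usual `/<username>/status/<id>` link shape.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_username_with_i(url: str) -> str:
    """Swap the username segment of a tweet URL for "i", leaving the rest intact."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path shape: ["", "<username>", "status", "<tweet id>", ...]
    if len(segments) >= 4 and segments[1] and segments[2] == "status":
        segments[1] = "i"
    # URLs that don't match the tweet shape are returned unchanged.
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    # Hypothetical tweet link, for illustration only.
    print(replace_username_with_i("https://twitter.com/someuser/status/1234567890"))
```

Paste the rewritten link into archive.today as described in the tip. Anything that isn't a tweet link passes through untouched, so the function is safe to run over a mixed list of URLs.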
 
Is Archive.today screwy for everyone the past 24 hours or so? I tried it in a fresh session on a VPN and still got a server error. But some pages do go through since I see archives from 2 hours ago, etc.

I use Ghostarchive, Megalodon, and Wayback when needed, but Archive.today is typically the best.
I've only been able to access it via Tor for the past two days.
 