Archival Tools - How to archive anything.

Is YouTube-DLG still working for people? I tried saving a video to my computer but the program encountered an error.
It hasn't been updated in a long time and is probably completely broken if it still depends on youtube-dl.

Most people use yt-dlp now. Not sure if it has a GUI.
 
There's Tartube, a GUI for both youtube-dl and yt-dlp, but last time I tried it there was an issue involving ffmpeg; not sure if they fixed it.

Edit: just tested on Linux (Kubuntu) and it's working fine; Windows is the problem.
 
No issues on Arch either. My only complaint is about selecting live chat as a subtitle language. It works, and the result can be filtered as text in Firefox and followed well enough. There are generally errors loading it, since the .json does not contain line breaks. Errors can be cleaned up here


Then loaded in the browser, filtered as text, and printed to .pdf looks like this.

Does anyone have a better process? I have had no luck converting to subtitle formats readable by VLC or mpv.
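One possible route, as a rough sketch rather than a tested tool: parse the chat file yourself and write plain .srt cues. This assumes the file is a run of JSON objects (with or without line breaks between them) and that each chat event carries `replayChatItemAction`, `videoOffsetTimeMsec` and `liveChatTextMessageRenderer` fields; those key names are my assumption about the layout, so check them against your own file. The function names and the 4-second display time are made up for the example.

```python
import json


def iter_json_objects(text):
    """Yield JSON values that sit back to back in text, with or without line breaks."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace (including newlines) between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            return  # stop at the first chunk that cannot be parsed
        yield obj


def format_timestamp(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def chat_to_srt(text, display_ms=4000):
    """Turn live chat JSON text into SRT text (assumed key layout, see above)."""
    cues = []
    for event in iter_json_objects(text):
        if not isinstance(event, dict):
            continue
        replay = event.get("replayChatItemAction", {})
        try:
            offset = int(replay.get("videoOffsetTimeMsec"))
        except (TypeError, ValueError):
            continue  # not a timed chat event
        for action in replay.get("actions", []):
            item = action.get("addChatItemAction", {}).get("item", {})
            renderer = item.get("liveChatTextMessageRenderer")
            if not renderer:
                continue  # skip non-text items
            author = renderer.get("authorName", {}).get("simpleText", "?")
            runs = renderer.get("message", {}).get("runs", [])
            message = "".join(run.get("text", "") for run in runs)
            if message:
                # Messages sent before the stream started are clamped to 0.
                cues.append((max(offset, 0), f"{author}: {message}"))
    cues.sort(key=lambda cue: cue[0])
    blocks = [
        f"{index}\n{format_timestamp(start)} --> "
        f"{format_timestamp(start + display_ms)}\n{line}\n"
        for index, (start, line) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Read the chat file into a string, pass it to `chat_to_srt`, and save the result next to the video with an .srt extension. Emoji and other non-text runs are dropped, which seemed like the simplest choice for a subtitle track.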
 
I wanted to ask, is there an effective way to archive (public or private) Discord channels/DMs (while also having the archive itself be verifiable/genuine)?
 
Sounds suspicious...
The problem I'm trying to overcome is that posts on websites are easy to archive with the Wayback Machine or archive.is, but nowadays everyone from e-celebs, communities, and developers to your pet goldfish is using Discord for some odd reason (there are even forums switching to it). Screenshots and chat logs seem to be the only option, but both can be faked (in servers where messages get deleted or that are semi-private), and logging them can be annoying.
 
Which Discord server are you trying to scrape?
 
There isn't a Discord server that I'm planning on scraping (for now). It's more that, if I were planning to archive one, I'd like to know whether there are any useful tools for Discord (and similar platforms).
I don't know any tools to scrape Discord, unfortunately. From all the Discord "leaks" I've read about, the logs usually come from a server admin who can export them.
 
Yes, and it's already in this very thread:
https://github.com/Tyrrrz/DiscordChatExporter

I've personally used previous versions of this, not the current one. It can export from servers your account is in as well as from its PMs, and it also lets you retrieve only the messages within a selected time range.

If the current version works like the one I used, exporting the messages creates an HTML file. To save all the media within it, open the HTML file in your browser and use "Save page as", which creates another HTML file plus a folder containing all media (avatars, emojis, images, etc.). That file, along with the media folder, is what you can then zip and send to people.



What you're asking about it being verifiable/genuine is a bigger task, because all you end up with is HTML that can be altered, and nothing prevents anyone from doing so. That's true of images and video as well: unless the source is verified directly (like you inviting others to the server to see it), all of it can be manipulated.

The reason the Wayback Machine and archive.ph (for example) work fine is that they're somewhat reputable sources: it's assumed nobody is going to doctor their copies. So unless a Wayback Machine type of website starts scraping servers like this and becomes reputable enough to be trusted, the integrity of the data is at risk.
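A partial mitigation, sketched here under the assumption that the export is a folder containing the HTML file and its media: record a SHA-256 checksum for every file right after exporting and post that list somewhere timestamped. This does not prove the export was genuine to begin with; it only lets others check that the copy they received matches what was exported at that moment. The function names are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir):
    """Map each file's path (relative to export_dir) to its SHA-256 digest."""
    export_dir = Path(export_dir)
    return {
        path.relative_to(export_dir).as_posix(): sha256_of_file(path)
        for path in sorted(export_dir.rglob("*"))
        if path.is_file()
    }
```

Anyone who later receives the zip can run `build_manifest` on the unpacked folder and compare the result with the published list; any edited file will show a different digest.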
 
I tried digging thru here but wasn't able to find this. Hopefully I didn't just miss it.

It's a top-level site with potentially thousands of .txt files that are displayed as webpages. Is there a way to use archive.ph or the Wayback Machine to grab the links from the page and archive them as well?
 
The Wayback Machine has an option to "Save outlinks", but it's only accessible if you have an account. I don't know about archive.ph.
 
Can you post the site or would you rather not? You can try using this script. The -o option and the examples of it here should be a start.

edit:
You can also poke around and see if their sitemap.xml file lists the pages, rip a list of URLs from that, and then use the above script to archive them. That's been my go-to lately when I'm archiving sites.
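For the sitemap route, a minimal sketch of pulling the URL list out of a sitemap.xml you have already downloaded. It assumes the file follows the usual sitemap layout, where every page address sits in a `<loc>` element; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Tags look like "{namespace}loc"; compare only the local name.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

Write the returned list to a text file, one URL per line, and feed that to whatever archiving script you use. If the sitemap turns out to be an index of other sitemaps, the `<loc>` entries will point at those files instead, and each one needs to be fetched and parsed the same way.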
 
archive.md/is/ph is seemingly blocking Tor exit nodes and certain VPN IPs, or it's just been down entirely for the past 2 or 3 days.
Of course there's also archive.org, but are there any alternative website archivers?
 
Honestly, maybe you don't even need to try a different company. You can also try IPs from servers that are unlikely to be heavily used, like those in small European countries or in Asian and African countries other than the big ones like Japan and South Africa.
 
Try a different VPN company; I haven't had any issues this week. Also make sure you're not using Cloudflare DNS. There is a Tor address available here:
http://archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion/
I've tried a mix of third and first world countries and none seem to work. It's resolving at least, so I don't think it's a DNS issue.
Onion site works fine, at least. Thanks.
 