Archival Tools - How to archive anything.

Check this thread:
You can create a really good reddit archive by changing the r in the reddit URL to a c (reddit.com -> ceddit.com).

Yeah, I know ceddit exists, but I swear there was a website that lets you capture how things look from your own view of the page (like how you can see a private Twitter account if you're a follower, but not if you're just archiving the public page), with the ability to actually archive that view instead of just taking a bunch of screenshots.

Edit: But thanks for the thread. Full page screen capture might work as a substitute for what I was looking for.

Edit 2: Some of the other stuff listed on that page is perfect for what I need it for, too.
 
@TrappityDippity you are in search of https://webrecorder.io, the best of the web archiving sites.

When you first log in to a Twitter account from there, do it in a separate 'recording session' that you then delete, so the main session where you capture all the private messages doesn't include parts of your email address or phone number from the two-factor confirmation step. If you want to be super sneaky and avoid being blocked if the person finds out you're recording their private tweets/messages, change your @ and other identifying details before logging in and change them back after.
 
I didn't write it or find it; SoapQueen1 did.
It has been working great!
(If SoapQueen1 would like to do the same for other tweeting troons like Rhys, Steven, or people like Harrop, that would be awesome.)

I have been trying to remember to archive pages like "https://twitter.com/trustednerd/with_replies" every now and then as well.
I'm concerned that if I archive too many Twitter users from the same IP address, Twitter will start blocking me; they have a strict no-automation rule. I don't want to spend my own money spinning up virtual private servers to do it. People can either do it themselves and I'll try to help for free, give me access to servers and I'll do it myself for free, or pay me enough to spin up some $3.55/month servers and I'll put 3-5 users on each. Here's the shell script I run as a cron job once every 5 minutes. It can be improved. It's dedicated to the public domain where applicable; where public domain dedications aren't recognized, I grant you a license to do whatever you want with it. It requires three or four tools, depending on what comes stock with your distribution of choice: curl, xvfb-run, cutycapt, and archiveis.
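A minimal sketch of how those four tools could be wired together (not the actual script; the output file names, the log file, and the cutycapt delay are assumptions, and the paths follow the setup in the next post):

#!/bin/sh
# sketch of a per-user Twitter snapshot job: read handles from a users file,
# push each with_replies page to archive.today, and keep a local HTML copy
# plus a rendered screenshot

BASE=/home/tweetcrawler/tools/twitter/tweetcrawler

while read -r user; do
    url="https://twitter.com/$user/with_replies"

    # submit the page to archive.today and log the memento URL it returns
    /usr/local/bin/archiveis "$url" >> "$BASE/archived_urls.txt"

    # raw HTML copy
    curl -sL "$url" -o "$BASE/html/$user.html"

    # rendered screenshot; xvfb-run provides a virtual display since there's no X session
    xvfb-run -a cutycapt --url="$url" --out="$BASE/html/$user.png" --delay=3000
done < "$BASE/users"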
 
Bro. This is fucking sweet. Just set it up on my cockbox, took maybe ten minutes. For anyone interested:

* Set up a new user, tweetcrawler. As that user, create directories down to the level of '/home/tweetcrawler/tools/twitter/tweetcrawler/html/'.
* For Debian/Ubuntu, xvfb-run is packaged as part of the 'xvfb' package, so just 'sudo apt install curl cutycapt xvfb'.
* If archiveis isn't packaged via your system's package manager, just install it under the tweetcrawler user account with 'pip install --user archiveis'. Change the script to point at '/home/tweetcrawler/.local/bin/archiveis' instead of '/usr/local/bin/archiveis'.
* Populate usernames line by line in a text file at '/home/tweetcrawler/tools/twitter/tweetcrawler/users'. A condensed version of the setup plus the cron entry is sketched below.
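Something like this, give or take (the script name 'tweetcrawler.sh' and its location are assumptions; point the cron line at wherever you actually saved SoapQueen1's script):

# one-time setup on Debian/Ubuntu
sudo apt install curl cutycapt xvfb
sudo adduser tweetcrawler

# as the tweetcrawler user
mkdir -p /home/tweetcrawler/tools/twitter/tweetcrawler/html
pip install --user archiveis
printf 'trustednerd\n' > /home/tweetcrawler/tools/twitter/tweetcrawler/users

# crontab entry for the tweetcrawler user ('crontab -e'), once every 5 minutes
*/5 * * * * /home/tweetcrawler/tools/twitter/tweetcrawler/tweetcrawler.sh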

@SoapQueen1 have you considered using Tor to make the requests to reduce your footprint? Running the daemon in the background is pretty lightweight. You could probably just set the appropriate environment variables and curl would pick it up, even if the archive python script doesn't. Alternatively, 'torsocks' works great for wrapping most things in Tor; I use it for pulling in mail from all my email accounts.
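For anyone who wants to try that, something along these lines should do it, assuming the Tor daemon is listening on its default SOCKS port 9050 (whether the archiveis script honours the proxy variables is worth testing):

# point curl (and anything else that reads the proxy variables) at the local Tor daemon;
# socks5h:// makes DNS resolution go through Tor as well
export ALL_PROXY="socks5h://127.0.0.1:9050"
curl -sL "https://twitter.com/trustednerd/with_replies" -o trustednerd.html

# or wrap individual commands in torsocks instead
torsocks curl -sL "https://twitter.com/trustednerd/with_replies" -o trustednerd.html
torsocks archiveis "https://twitter.com/trustednerd/with_replies"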
 
I think Twitter might end up blocking Tor but I'm not sure.
 
You'd think so, but I've never had any trouble using Twitter/Tweetdeck through Tor (no more than a regular VPN, anyway). I've also never really gotten rate limited pulling down data from Twitter using Twint, despite pulling hundreds of megs of tweets over a few hours or so at times. I think with 900-odd Tor gateways it should be pretty safe.

I've been running a modified version of your script for a few days now: using archivenow instead of archiveis so it pushes into the Wayback Machine as well (it also supports a couple of other archive sites and local WARC archives, if you're interested), wrapping the requests in torsocks (with the IsolatePID option turned on, so there's a separate Tor circuit for every single request from tor/archivenow/the snapshot tool), and switching to a different snapshot tool (mainly in the hope that it would let me run JS to show 'sensitive' images automatically; couldn't get the JS running consistently for that, unfortunately).

It's not quite as fast what with spinning up a new circuit each time, but it works fine keeping on top of 10 or so reasonably active users.
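For anyone who wants to copy the torsocks/archivenow half of that, the relevant bits look roughly like this; treat the archivenow flags as from memory and check them against its --help before relying on them:

# /etc/tor/torsocks.conf -- give every wrapped process its own Tor circuit
IsolatePID 1

# push a page to the Wayback Machine and archive.is, plus a local WARC copy
torsocks archivenow --ia --is --warc=trustednerd "https://twitter.com/trustednerd/with_replies"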
 
Has anyone else noticed that downloading youtube videos seems to be completely hit or miss over the last two weeks? Tried downloading a cow's video last night before they deleted it, and none of the usual tools, addons, or websites worked.
 

Update youtube-dl; Google is constantly doing shit to make youtube-dl not work, and the devs keep unfucking it.

If you're using Linux, don't use the one in your distro's repositories. Use pip to install it; it'll grab the newest version.
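In other words, something like this (swap in pip3 if that's what your distro calls it):

# install/refresh the pip copy instead of the distro package
pip install --user --upgrade youtube-dl

# ~/.local/bin needs to be on your PATH so the fresh copy is the one that runs
youtube-dl --version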

Edit: just checked - yeah, youtube-dl from pip got updated 6/21/19, which is in line with whatever fuckery you were experiencing.

They seem to fuck with/break music more often than general videos (copyright claims get priority, I guess).

edit - again.
Termux is handy as fuck on Android. It's basically a little sandboxed Linux install/shell where you can apt-get stuff, install pip, and get youtube-dl rigged up on your phone for archiving or grabbing audio to listen to while driving, without wasting battery keeping the screen on or burning data on video.

The widget for Termux is a paid feature, but it allows you to make bash scripts that have an icon (or a list of scripts you can just tap on) and run like an app. I've got one where I copy the URL of a YouTube video, and the script grabs the URL from the clipboard and downloads audio only. The API for grabbing the clipboard and exposing some of the Android stuff (camera, sensors, etc.) is free, I think.
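A minimal version of that clipboard script might look like the following; it assumes the Termux:API add-on (for termux-clipboard-get), ffmpeg for the audio extraction, and that you've run termux-setup-storage, and the file name and save path are just examples:

#!/data/data/com.termux/files/usr/bin/sh
# save as ~/.shortcuts/grab-audio.sh so the Termux widget can launch it;
# grabs whatever URL is on the clipboard and downloads audio only
url="$(termux-clipboard-get)"
youtube-dl -x --audio-format mp3 -o "$HOME/storage/shared/Music/%(title)s.%(ext)s" "$url"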
 
Is there a way to download individual videos from a youtube playlist? I'm trying to download certain videos from this one:
using youtube-dlg, but it seems to be downloading all 21 videos. (I've tried looking for them individually outside of the playlist but can only find edits.)
 
You'll want to take off the part of the URL after the video id ('7V1IycxxIx8'), i.e. drop the &list=... parameter so it only sees the single video.
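If you'd rather not edit URLs by hand, youtube-dl can do the same thing from the command line (the playlist id below is a placeholder):

# download only the single video, ignoring the &list=... part of the URL
youtube-dl --no-playlist "https://www.youtube.com/watch?v=7V1IycxxIx8&list=PLxxxxxxxx"

# or cherry-pick specific entries out of the playlist by position
youtube-dl --playlist-items 3,7,12 "https://www.youtube.com/playlist?list=PLxxxxxxxx"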
 
Is there any faster way to archive YouTube videos? Downloading takes fairly long as it is, but uploading to AnonFile takes even longer.
Yes, I use shitty internet and can't afford better because of the XFINITY monopoly and my debt to them.
I can still do it; it just gets tedious the longer the videos get.
 
Twitter Canada is releasing a feature that will allow users to hide replies. Is there an archival tool that will bypass this, or do we just need to be really fast to archive Twitter stuff now?

EDIT: "We’re testing a feature to hide replies from conversations. This experience will be available for everyone around the world, but at this time, only people in Canada can hide replies to their Tweets. "
 

Just when referred from here, or at all? It works both ways for me as long as I switch off of Cloudflare DNS.
When referred from here. I'm using mobile while my computer is down, so I'll keep that in mind when it works again.
 

I found "DnsChanger" for switching DNS on mobile. It makes a local VPN to route the DNS request which I guess means the DNS server can't be changed directly. On wifi it can be changed directly in the advanced settings of the network, of course
 
Cross-posting. Here's a quick tutorial on how to archive Youtube comments if you're ever dealing with a censor-happy Youtuber or a company trying to spin a narrative by selectively deleting comments.
1. Click the comment's time stamp. This will generate a URL that highlights the comment and puts it at the top of the comment list.
2. Put that URL into your archival site of choice.
3. If the comment gets deleted, it won't show up when you try the highlighted comment URL.
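For reference, the URL the timestamp click generates should just be the normal watch URL with the comment's id tacked on, so you can also sanity-check or rebuild it by hand (the ids below are placeholders) and then feed it to whatever archiver you use, e.g. the archiveis tool from earlier in the thread:

# highlighted-comment URL format
https://www.youtube.com/watch?v=VIDEO_ID&lc=COMMENT_ID

# push it to archive.today
archiveis "https://www.youtube.com/watch?v=VIDEO_ID&lc=COMMENT_ID"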
 
Any ideas on the problem here?

When trying to upload videos grabbed from reddit to kiwi farms, I get this error message:

[attached screenshot: halp.jpg]


The video is only 6 seconds, so I know it's not too large.
 