Archival Tools - How to archive anything.

Markass the Worst · Feb 16, 2023

Toolbox said:
Did protonmail start requiring email/SMS verification, or is that only being tripped since I'm on Tor? Is there a decent alternative that will be accepted on most sites?

Protonmail tries to blacklist throwaway mail sites but you can just keep cycling through them until you can put in an email that it won't reject.

Toolbox · Feb 16, 2023

Markass the Worst said:
Protonmail tries to blacklist throwaway mail sites but you can just keep cycling through them until you can put in an email that it won't reject.

Know a good one that'll actually load on tor without trying to load google's crapcha?

Markass the Worst · Feb 16, 2023

Toolbox said:
Know a good one that'll actually load on tor without trying to load google's crapcha?

You can try Guerrilla Mail.

Quote Cursor · Feb 19, 2023

When it comes to finding YouTube comments, it's a pain in the ass to use the standard front-page GUI to search, and especially cancer when your only sorting options are "New" or "Top". I found an extension called YouTube Comment Search (YCS) [Chrome or Firefox] that makes your life easier with more sorting options. The only problem I've found is that you have to use it carefully, otherwise you can probably cause an API timeout, and on videos with loads of comments it takes a while to load them all. Otherwise, it helps you find comments more effectively, and you can click on them to get the comment link (useful for sharing or possibly archiving them). The extension also has some support for live chats.

notafederalagent · Feb 19, 2023

Quote Cursor said:
When it comes to finding YouTube comments, it's a pain in the ass to use the standard front-page GUI to search, and especially cancer when your only sorting options are "New" or "Top". I found an extension called YouTube Comment Search (YCS) [Chrome or Firefox] that makes your life easier with more sorting options. The only problem I've found is that you have to use it carefully, otherwise you can probably cause an API timeout, and on videos with loads of comments it takes a while to load them all. Otherwise, it helps you find comments more effectively, and you can click on them to get the comment link (useful for sharing or possibly archiving them). The extension also has some support for live chats.

You can also get comments via yt-dlp, I used it a while back to do some research on commenters. Not sure if it gives comment links, but it was great for getting a list of channels to delve further into.
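For anyone who wants to try the yt-dlp route, here's a minimal Python sketch. It assumes yt-dlp is installed and on PATH. `--skip-download`, `--write-info-json` and `--write-comments` are real yt-dlp options that save the comments into the video's `.info.json` instead of downloading the video. The helper names are hypothetical, and the comment link is built by hand using YouTube's usual highlighted-comment format (`watch?v=<video id>&lc=<comment id>`).

```python
import json
from pathlib import Path


def ytdlp_comment_args(video_url):
    """Build a yt-dlp command that saves metadata + comments but no video."""
    return [
        "yt-dlp",
        "--skip-download",       # don't fetch the video itself
        "--write-info-json",     # write <id>.info.json
        "--write-comments",      # include the comments in that info.json
        "-o", "%(id)s.%(ext)s",  # name files after the video ID
        video_url,
    ]


def load_info(path):
    """Read an .info.json file written by yt-dlp."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def comment_links(info):
    """Return (author, link) pairs for every comment in a yt-dlp info dict."""
    video_id = info["id"]
    return [
        (c.get("author"), f"https://www.youtube.com/watch?v={video_id}&lc={c['id']}")
        for c in info.get("comments") or []
    ]
```

Usage: run `subprocess.run(ytdlp_comment_args(url), check=True)`, then `comment_links(load_info("<video id>.info.json"))` gives you the commenter names with a direct link to each comment.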

Wau · Mar 5, 2023

Figured that I'd throw http://www.vacbanned.com in here as well. It occasionally lists name changes that steamid.uk misses or has locked behind a paywall.

As an aside, does anyone know of any websites which can list the past handles/usernames of a given Twitter account? I know https://botsentinel.com/ can, though its accuracy is dubious at best; and while memory.lol seems to be more thorough, it's locked behind its own access wall.
There should be some sort of tool that exists (which also apparently works for Instagram accounts as well), but I haven't been able to find anything of the sort which functions reliably.

.iota. · Mar 18, 2023

it's all been discussed, and then some, but this thread is getting longer, and i recently wrote this to help a fellow farmer, so i thought that i would share it here:

there are three different sites that i suggest bookmarking: ghost archive, archive.org, and archive.ph.

to archive a tweet, click on the tweet to render it in a screen with the date / time that it was posted, and scroll down to open all of the responses that are hidden behind a "more responses" link. highlight and ctrl+c the address bar. in a separate window, open an archive site. they each have a slightly different upload process.

- https://ghostarchive.org/

for this site, just ctrl+v the twitter address where it says "website to be archived" and click "submit for archival". it will take a few minutes to load. when the image renders, ctrl+c the address bar for the archive link.

- https://archive.org/ (aka "internet archive" or "wayback machine")

for this site, click the "web" tab next to the logo in the upper left corner.

ctrl+v the twitter address where it says "save page now" and click "save". this will bring up another page. click the white box in the center of the page that says "save page". it will take a few minutes, and the wayback machine excludes certain urls (anything from kiwifarms.net), so it may not upload. if it does, you'll get a confirmation screen with a green "done" message, and possibly a "first archive" message. below that will be a link to the actual archive. click the link, and when the image renders, ctrl+c the address bar for the archive link.

- https://archive.ph/

for this site, just ctrl+v the twitter address where it says "i want to archive this content" and click "save". it will take a few minutes to load, and sometimes you'll get a message with a preview because someone else has already archived the same page. you can either continue with uploading your image (updating the archive), or you can just use what's there. either way, when the image renders, ctrl+c the address bar for the archive link.

to add twitter and/or archive links to your posts, use the "insert link" button (or ctrl+k) at the top of the post box. it's the little chain links icon. ctrl+v the address in the top box, and your text in the bottom box.
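if you do this a lot, the wayback machine can also be driven straight from the address bar. here's a small python sketch, assuming the long-standing url patterns `web.archive.org/save/<url>` (save page now) and `web.archive.org/web/*/<url>` (list existing captures). the `archive.ph/newest/<url>` pattern for jumping to archive.today's most recent capture is an assumption, so check it before relying on it. ghost archive isn't covered, and the function name and example tweet url are hypothetical.

```python
def archive_links(url):
    """Return handy archive-site URLs for a page (hypothetical helper)."""
    return {
        # Wayback Machine: ask for a fresh capture (Save Page Now)
        "wayback_save": "https://web.archive.org/save/" + url,
        # Wayback Machine: calendar of existing captures
        "wayback_captures": "https://web.archive.org/web/*/" + url,
        # archive.today: most recent existing capture (assumed URL pattern)
        "archive_today_newest": "https://archive.ph/newest/" + url,
    }


if __name__ == "__main__":
    # Placeholder tweet URL; paste the one you copied from the address bar.
    for name, link in archive_links("https://twitter.com/someuser/status/1234567890").items():
        print(f"{name}: {link}")
```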

Baraadmirer · Mar 18, 2023

.iota. said:
- https://archive.org/ (aka "internet archive" or "wayback machine")

Just as a heads-up, there have been certain domains that have been blacklisted from being archived, so I'd use this one as a last resort.

.iota. said:
- https://archive.ph/

If you're on Chrome you can install the Archive Page extension, which archives to archive.today with the click of a button.

Toolbox · Mar 18, 2023

@.iota. Archive.org is a decent source to look for existing archives but unless you're saving on multiple sites I wouldn't bother using it to archive something. They have a forum for anyone to ask for content to be removed and during the keffals shitshow entirely removed sets of archived content because trannies asked. Or more accurately demanded.

Baraadmirer said:
Just as a heads-up, there have been certain domains that have been blacklisted from being archived, so I'd use this one as a last resort.

The farms is included in that blacklist, isn't it?

Geranium · Mar 18, 2023

Also, regarding archive.org, it’s worth knowing that if a website is no longer available online but has been archived at archive.org, you can archive that into archive.today and it treats it as if you’d submitted the original website (but pulls the resources from archive.org).
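For reference, a Wayback Machine capture lives at `https://web.archive.org/web/<timestamp>/<original url>`, with a 14-digit `YYYYMMDDhhmmss` timestamp, and that full address is what you'd paste into archive.today. A minimal Python sketch (the function name is hypothetical):

```python
from datetime import datetime


def wayback_snapshot_url(original_url, captured_at):
    """Build the Wayback Machine address for a capture of original_url.

    captured_at is a datetime; it is formatted as the 14-digit
    YYYYMMDDhhmmss timestamp that Wayback URLs use.
    """
    stamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{original_url}"
```

For example, `wayback_snapshot_url("https://example.com/page", datetime(2023, 3, 18, 12, 0, 0))` gives `https://web.archive.org/web/20230318120000/https://example.com/page`. As far as I know, the Wayback Machine redirects a timestamp that doesn't match exactly to the nearest capture, but copying the address bar from the capture you actually want is the safest bet.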

Markass the Worst · Mar 18, 2023

Toolbox said:
@.iota. Archive.org is a decent source to look for existing archives but unless you're saving on multiple sites I wouldn't bother using it to archive something. They have a forum for anyone to ask for content to be removed and during the keffals shitshow entirely removed sets of archived content because trannies asked. Or more accurately demanded.

Geranium said:
Also, regarding archive.org, it’s worth knowing that if a website is no longer available online but has been archived at archive.org, you can archive that into archive.today and it treats it as if you’d submitted the original website (but pulls the resources from archive.org).

Yup that's the only good thing about archive.org, it can be archived straight through to archive.today. Google web cache content can also be archived straight through to archive.today.

Baraadmirer said:
Just as a heads-up, there have been certain domains that have been blacklisted from being archived, so I'd use this one as a last resort.

Toolbox said:
The farms is included in that blacklist, isn't it?

Yup and it's more than just "certain domains". There are almost two thousand domains blocked and that's not even counting the excluded twitter accounts if your uncle happens to work at the Internet Archive.

Toolbox · Mar 18, 2023

Markass the Worst said:
Yup that's the only good thing about archive.org, it can be archived straight through to archive.today. Google web cache content can also be archived straight through to archive.today.

Yup and it's more than just "certain domains". There are almost two thousand domains blocked and that's not even counting the excluded twitter accounts if your uncle happens to work at the Internet Archive.

That's way more exclusions than I expected. Noticed sanctioned suicide is on there; is that recent?

Markass the Worst · Mar 18, 2023

Toolbox said:
That's way more exclusions than I expected. Noticed sanctioned suicide is on there; is that recent?

I have no idea when they got blocked. I found out when I was writing the OP and made a contribution to the list myself. In any case their newest domains (sanctioned-suicide.net and sanctionedsuicide.site) aren't excluded yet.

The Mass Shooter Ron Soye · Mar 18, 2023

Markass the Worst said:
Google web cache content can also be archived straight through to archive.today.

I used that trick extensively after Efrem Zelony-Mindell was scrubbed from the Internet:

https://kiwifarms.st/threads/non-bi...ing-child-abuse-material.140898/post-13941105

https://kiwifarms.st/threads/non-bi...ing-child-abuse-material.140898/post-13945545

https://kiwifarms.st/threads/non-bi...ing-child-abuse-material.140898/post-13948152

https://kiwifarms.st/threads/non-bi...ing-child-abuse-material.140898/post-13950621

I haven't figured out if/when Google web cache of a deleted page gets deleted. Is it around for 1 month or indefinitely? I guess if it were to drop out of search results that could be a problem, but if the URL is known it can be checked with a generic query (one of the options on archive.today).
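For the known-URL case, the cache address followed the pattern `https://webcache.googleusercontent.com/search?q=cache:<url>`. That pattern is an assumption here, and Google has since been reported to be retiring its cache, so it may no longer resolve. A minimal Python sketch (hypothetical helper):

```python
from urllib.parse import quote


def google_cache_url(page_url):
    """Return the Google web cache address for page_url (assumed URL pattern)."""
    # Percent-encode characters like '?', '&' and '#' so they stay part of the
    # cached page's URL instead of being read as part of the cache query.
    return "https://webcache.googleusercontent.com/search?q=cache:" + quote(page_url, safe=":/")
```

The resulting address is what you'd hand to archive.today.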

.iota. · Mar 28, 2023

cross-posting this from the ftm shooting thread because i encountered some difficulty while trying to archive twitter threads about the shooting. i expected it from archive.org, but not from ghost archive.

Geranium · Mar 28, 2023

.iota. said:
cross-posting this from the ftm shooting thread because i encountered some difficulty while trying to archive twitter threads about the shooting. i expected it from archive.org, but not from ghost archive.

I wonder if that's a Twitter problem. I occasionally see archive.today show a mostly blank Twitter UI with a button that says "reload", and it's not just been high-profile tweets.

notafederalagent · Mar 31, 2023

Geranium said:
I wonder if that's a Twitter problem. I occasionally see archive.today show a mostly blank Twitter UI with a button that says "reload", and it's not just been high-profile tweets.

I'm thinking it's some kind of rate limiting kicking in on twitter's side, because I've run into it more often in the last couple of months. They're definitely making some changes to the codebase, as seen in issue #760 of snscrape, which required a bump in the user-agent string to get it functioning properly again.

HarblMcDavid · Apr 9, 2023

Crossposting from the Anisa thread since, for noobs, putting together a semi-automatic stream recording setup for elusive content creators may seem daunting.

If you already know what streamlink is, dig this code:

Code:

streamlink.exe -o drive:\path\to\file\{id}-{time:%Y%m%d%H%M%S}.ts --twitch-disable-ads --retry-streams 30 --retry-max 0 https://www.twitch.tv/streamer-url best

When run from a command line, this will watch a Twitch URL, checking every 30 seconds for a live stream (--retry-max 0 means it never stops checking), and once it finds one it begins recording, skipping ads.

If you don't know what streamlink is, allow me to help:

https://streamlink.github.io/index.html

1) Download the release. See https://streamlink.github.io/install.html for where to find it on your platform.

2) Extract the release somewhere; I put it on a secondary drive.

3) Find the "bin" folder within the location you extracted it to; this is where the executable lives.

4) With a command line (search "CMD" in Windows, choose "Command Prompt"), make your way to this folder by changing drives (if necessary) and "cd"ing into the "bin" folder.
To change drives, type the drive letter and a colon, then hit enter. You should see the text on the left of the CMD window change to the drive you just entered. For example:

Code:

E:

Hit enter and, assuming you have an E drive, the text on the left will become E:\> indicating you are now on the E drive.
To move between folders, use cd, for example:

Code:

cd streamlink\bin

If there is a folder called "streamlink" in whatever folder you run that command from, this changes the terminal's location to the "streamlink" folder and then to the "bin" folder inside it. This is not as complex as it sounds, since you can use the "tab" key to cycle through the available locations with cd. For example, if I'm on the E drive and there's a twitch folder on E, typing cd (note the space after "cd") and then hitting tab will cycle through all the available items on E. If I type a letter after cd, like cd s, and then hit tab, it will cycle through the options starting with s.

On Windows this is not case sensitive; almost all other platforms are.

Once tab has selected the correct folder, you can type a "\", which tells tab to start looking inside the folder before the "\" (in this example, the "streamlink" folder). Tab would then find the "bin" folder next, since for the purposes of this example it is the only folder in "streamlink".

Being in the right folder is important because it determines how hard it is to run commands. The code I'm providing assumes the CMD is sitting in the streamlink\bin folder, where streamlink.exe resides, so if you aren't in that folder, CMD won't find streamlink.exe and the command will fail.
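If the cd dance is a hassle, another option is to call streamlink by its full path, so it doesn't matter which folder CMD is in. Here's a minimal Python sketch for finding that path. It assumes the extracted "bin" folder layout described above and falls back to `shutil.which`, which searches PATH (where an installer may have put streamlink). The function name and example path are hypothetical.

```python
import shutil
from pathlib import Path


def find_streamlink(bin_dir=None):
    """Return the full path to the streamlink executable, or None if not found."""
    if bin_dir is not None:
        # Windows builds ship streamlink.exe; other platforms use plain "streamlink"
        for name in ("streamlink.exe", "streamlink"):
            candidate = Path(bin_dir) / name
            if candidate.is_file():
                return str(candidate.resolve())
    # Fall back to whatever is on PATH
    return shutil.which("streamlink")
```

Usage: `find_streamlink(r"E:\streamlink\bin")` returns something like `E:\streamlink\bin\streamlink.exe`, which can replace the bare `streamlink.exe` at the start of the command.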

5) Once you have the CMD window in the streamlink\bin folder, the code above will work, but make sure the absolute path to the file points to a folder that actually exists. Continuing the example, let's say you want to store the recordings in a theoretical "E:\twitch\harleyplays" folder: make sure E:\twitch\harleyplays exists, and if not, create those folders before you start playing around with the commands. The hypothetical path to the file for the above command would look something like this:

Code:

-o E:\twitch\harleyplays\{id}-{time:%Y%m%d%H%M%S}.ts

That bizarre shit in front of the ".ts" is a set of replacement values, so it should generate a file named like "twitchId-yearmonthdayhourminutesecond.ts", with the time-based values taken from when the command executes. This prevents accidentally overwriting previous recordings.
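To make the template concrete, here's a small Python sketch that mimics the name the `{id}-{time:%Y%m%d%H%M%S}.ts` template produces, using the same strftime codes, and creates the output folder if it's missing. It's only an illustration: streamlink fills in the real values itself, and the function names and example ID are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def expected_recording_name(stream_id, started):
    """Mimic the {id}-{time:%Y%m%d%H%M%S}.ts output template."""
    return f"{stream_id}-{started.strftime('%Y%m%d%H%M%S')}.ts"


def ensure_output_dir(folder):
    """Create the recording folder (and any missing parents) if it doesn't exist yet."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, `expected_recording_name("40123456789", datetime(2023, 4, 9, 21, 5, 3))` returns `"40123456789-20230409210503.ts"`. Two recordings would only collide if they started in the same second, and `ensure_output_dir(r"E:\twitch\harleyplays")` takes care of the "make sure the folder exists" step.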

Note that after it records a stream, you will need to run the command again to get the watching set up since the command terminates upon finishing a recording.

On Windows, hit "up" on the keyboard in the CMD window after the command finishes and you should see it fill in the last command you entered from history; hit enter and it will start stalking the URL again.

A potential solution to the problem of automatically restarting the command was proposed (actual command edited since I know people tend to copy/paste without modifying):

Battlecruiser3000ad said:
You could put it into a .bat file

Code:

:START
REM Percent signs are doubled because .bat files treat a single one as a variable marker
streamlink.exe -o drive:\path\to\file\{id}-{time:%%Y%%m%%d%%H%%M%%S}.ts --twitch-disable-ads --retry-streams 30 --retry-max 0 https://www.twitch.tv/streamer-url best
GOTO START
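For a cross-platform alternative to a .bat loop, here's a minimal Python sketch that re-runs the command every time it exits. The install path, output folder and streamer URL in `STREAMLINK_CMD` are placeholders, just like in the command above, and the function name is hypothetical. Passing the arguments as a list means no shell is involved, so the % signs in the time format don't need escaping.

```python
import subprocess
import sys
import time

# Placeholder command: edit the paths and URL before use.
STREAMLINK_CMD = [
    r"E:\streamlink\bin\streamlink.exe",
    "-o", r"E:\twitch\harleyplays\{id}-{time:%Y%m%d%H%M%S}.ts",
    "--twitch-disable-ads",
    "--retry-streams", "30",
    "--retry-max", "0",
    "https://www.twitch.tv/streamer-url", "best",
]


def record_forever(cmd, max_runs=None, pause_seconds=5.0):
    """Re-run cmd every time it exits, e.g. after streamlink finishes a recording.

    max_runs=None keeps going until you hit Ctrl+C; an int stops after that many runs.
    Returns the exit codes of the runs that completed.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        print(f"run ended with exit code {code}", file=sys.stderr)
        if max_runs is None or len(exit_codes) < max_runs:
            time.sleep(pause_seconds)  # short breather before watching again
    return exit_codes
```

Usage: `record_forever(STREAMLINK_CMD)`. The pause keeps it from hammering Twitch if streamlink keeps exiting straight away with an error.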

notafederalagent · Apr 10, 2023

I've mentioned it elsewhere, but chat-downloader works great for grabbing YouTube and Twitch chats:

https://github.com/xenova/chat-downloader

chat_downloader "twitch URL" --message_groups "messages bans deleted_messages hosts room_states user_states notices chants other bits subscriptions upgrades raids rituals mods colours commercials vips charity" --output outputfilename.json | tee -a outputfilename.txt

chat_downloader "youtube URL" --message_groups "messages superchat tickers banners donations engagement purchases mode_changes deleted bans placeholder" --output outputfilename.json | tee -a outputfilename.txt
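The `| tee -a` part is a Unix-ism that plain Windows CMD doesn't have. If you'd rather drive chat_downloader from Python, here's a minimal sketch. It assumes `chat_downloader` is on PATH and only reuses the flags from the commands above, and the helper names are hypothetical.

```python
import subprocess
import sys

# Message groups copied from the Twitch command above.
TWITCH_GROUPS = ("messages bans deleted_messages hosts room_states user_states notices chants "
                 "other bits subscriptions upgrades raids rituals mods colours commercials vips charity")


def chat_downloader_args(url, message_groups, json_out):
    """Build the chat_downloader command line used in the post above."""
    return ["chat_downloader", url, "--message_groups", message_groups, "--output", json_out]


def run_and_tee(cmd, log_path):
    """Run cmd, echo its output to the console and append it to log_path (like `| tee -a`)."""
    with open(log_path, "a", encoding="utf-8") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, text=True, encoding="utf-8", errors="replace"
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the .txt current even if the stream runs for hours
    return proc.returncode
```

Usage: `run_and_tee(chat_downloader_args("twitch URL", TWITCH_GROUPS, "outputfilename.json"), "outputfilename.txt")`. For YouTube, pass the YouTube message-group string from the second command instead.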

Toolbox · Apr 19, 2023

On the theme of livestream archiving, is there a tool like Open Video Downloader that can clip a long archived stream rather than downloading the whole thing? As in, you select a start timestamp and an end timestamp, and only that section is saved. I'm talking specifically about YouTube.
