Archival Tools - How to archive anything.

I did a lot of video archiving last week, and thus have some experience that I want to impart. Archiving videos and audio files is much more complicated than archiving a simple webpage. Most of this was from my post in the Michael/DaddyOfFive thread, though I've made edits and additions to this version.

Summarized Process:

1.) Use a converter to download a video/audio submission on YouTube and the like. OnlineVideoConverter is my default choice for simple conversions. 123Apps hosts a converter and various editing tools for audio and video.
2.) Upload the files to one or more publicly accessible repositories- the more the better. DailyMotion and Sendvid are decent and relatively simple. The Wayback Machine archives videos as well as pages.
3.) Include links to the original source and all other backups in the descriptions for the files. Use proper tagging and naming.
4.) Promote all copies and progress in appropriate places such as KF, Telegram/Discord drama groups, etc.

When you see a video you want to save on YouTube, you have to use a converter to do this (or at least I do- right-click and Save isn't an option). I use https://www.onlinevideoconverter.com/video-converter by default. https://123apps.com/ provides multiple functions for both audio and video files and submissions, so if editing or clipping is required for whatever reason I use that. There are other converters as well. Copy-paste the link into the converter, and then you can download the file onto your PC.

If you're dealing with prohibitively large videos and circumstances are suitable, you can use the converter to download a lower-resolution version or an audio-only version instead. It would be ideal to make and upload audio versions of suitable videos anyway (there isn't a need to see the person- these lolcows' videos wouldn't fall in that category), so people will have an easier method of downloading/storing/playing them.
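For the command-line inclined, youtube-dl (which comes up again further down the thread) can handle this step too. This is only a rough sketch, assuming you have youtube-dl and ffmpeg installed- the URLs here are placeholders:

Code:
# Sketch: grab just the audio track instead of the full video.
# -x extracts audio; --audio-format mp3 re-encodes it (needs ffmpeg installed).
# The URL is a placeholder - substitute the video you actually want.
youtube-dl -x --audio-format mp3 "https://www.youtube.com/watch?v=VIDEO_ID"

# Or cap the resolution for a smaller video file:
youtube-dl -f 'best[height<=480]' "https://www.youtube.com/watch?v=VIDEO_ID"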

Having the file is nice, but it's much better to upload it to a fully publicly accessible archive of some sort- preferably multiple. For the Foxler videos, I relied on YouTube (you have to make a channel first if you don't already have one) and Sendvid as archives (plus a Telegram channel). There are other submission sites such as DailyMotion. The Wayback Machine also hosts videos, but they seem amenable to takedown requests, especially DMCA/copyright ones. I would recommend ultimately trying for three archives of each file, and using at least one service that is resistant to removing things (Lolcow.wiki would be a good place since it's ours, but the file limit is 28 MB IIRC).

I opened a Notepad file and entered the file names and all URLs associated with them as discovery and uploading happened. After submitting them to all the sites, I copy-pasted the URLs for each file into the descriptions of all copies, and figured out what tags to use. After I finished uploading entirely, I copy-pasted the whole document into a post in the Foxler thread and made needed tweaks for functionality and appearance. I found this process of entering things in a document and entering full descriptions after I had all the URLs to be efficient and effective.
 
When I archived all those Do5 vids, I just used JDownloader. Yeah, it uses Java, but I don't really have any issues with it, even on my laptop from 2011. It can archive the video, extract the audio, pick the highest-quality version of said video, download the description, and download the thumbnail. It can also sort the videos into individual folders and crawl through playlists and channels.

Just make sure to go to the plugins page in the settings and click on youtube.com.
 

jd yt.png
(HIDEOUS 2019 EDIT: This post is kinda sad to me now and no one cares probably, but I'll leave it intact. Opening multiple URLs for archive.today leads to IP blocking annoyances (give like 20 seconds between each archive, I saw someone write), but I still rely on it. I don't use that (actually kinda annoying) Link Klipper thing at all and really a bookmarklet I found and learning regex sufficed. At some point Notepad++ became my new crutch instead of annoying spreadsheet programs. Maybe by 2020 I'll have started automating things with curl or something and actually know stuff and people who were absolute archiving noobs like I was 2 years ago won't understand what I'm saying anymore, lol.)
---
I guess people here may already have their own, better ways for archiving webpages in bulk in archive.is, but since nothing about that was really mentioned here I thought I'd share things I use in case it's not commonly done.

Two nifty extensions in Chrome I've found to help me are Open Multiple URLs and Link Klipper. The first does exactly what it says: you paste a list of URLs into it, click a button, and it opens all those links at once. It can also extract URLs from text, which is handy sometimes. Link Klipper is a great extension that lets you click and drag a box on a webpage to select the links inside it and save them as a list in .csv or .txt format. It can also just extract all of the links on a webpage and save them for you.

From there you can use Microsoft Excel or Google Sheets or whatever to prepend 'archive.is/?url=' to every link on the list, copy these links, paste them into Open Multiple URLs, open them, and then Ctrl+Tab through the tabs and hit Enter on each of them to start saving. (Note that if a URL contains an ampersand (&), a hash mark (#), or certain other characters, this won't work, since that character and everything after it gets cut off. Either replace those characters with their percent-encoded equivalents, or just prepend 'archive.is/' to those links and click through each one to save it.)

I realize my methods may be convoluted and barbaric but I have little computer knowledge :oops: Hope it's helpful to someone though. I've archived a stupid number of things this way.

2k18 edit: I'm dumb despite using the bookmarklet this whole time; using 'archive.is/?run=1&url=' does it automatically. Watch for the special characters though. And I'll also add this page of archiving bookmarklets.
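For anyone who would rather script the prefixing step than do it in a spreadsheet, here is a rough sketch that does the same thing, using the ?run=1&url= form from the edit above and swapping out the & and # characters mentioned earlier. The filename links.txt is just an example:

Code:
# Rough sketch: read URLs from links.txt (one per line), percent-encode the
# troublesome & and # characters, and print archive.today submission links
# ready to paste into Open Multiple URLs.
while IFS= read -r url; do
  encoded=$(printf '%s' "$url" | sed -e 's/&/%26/g' -e 's/#/%23/g')
  printf 'https://archive.is/?run=1&url=%s\n' "$encoded"
done < links.txt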
 
Shutter on GNU/Linux has served me pretty well. It can take screenshots of entire webpages too.
 
I used to like DownThemAll for getting all the pictures linked in a page. Any suggestions for a replacement since Firefox borked it?
 
So what are the differences between archive.is, archive.fo, and archive.li?

Or are they just the same?
 
I'm pretty sure they're just there for load-balancing and as backup if one or more of them go down.
They're all the same and are part of archive.today. The different site names exist for the reason @AnOminous described in his post above.
Ah, I get it now

Speaking of archive.today, I'd advise anyone who fails to archive a page with it to use via.hypothes.is.
Example:
https://mastodon.xyz/@donleykogn
hypothetically speaking this shouldnt be happening.png


Now if you use via.hypothes.is: https://via.hypothes.is/https://mastodon.xyz/@donleykogn
and now my hypothesis is proven to be the correct one.png


EDIT: So apparently if you try to archive DeviantArt with archive.today, this is what you get:
damn its not VIAble.png


shit its failing, i need more VIAl of water.png


I've tried this with several blogs, but the result is still the same. It turns out that using via.hypothes.is is also a solution to this problem:
sent via inflation.png

green is the VIAble choice.png
 
DVDVideoSoft, 4kDownload and JDownloader are superb tools for downloading YouTube videos, along with videos from other websites like Blip, Dailymotion, Vimeo, etc.

Maybe I'm missing something obvious, but how are these in any way better than youtube-dl? I would bet most/all of them use youtube-dl as a backend. Additionally, youtube-dl doesn't require an external resource, is highly flexible, and is updated very frequently.

I discovered today that the program also has an option to download autogenerated subtitles, which makes for good text searching. https://github.com/rg3/youtube-dl/issues/3838

Example: 1 hour Narcissa Wright stream rambling, transcribed
https://github.com/glut23/webvtt-py converts .vtt files to plaintext.
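A rough sketch of how those two pieces fit together (the flags are standard youtube-dl ones; the video URL and the resulting .vtt filename are placeholders):

Code:
# Sketch: fetch only the autogenerated English captions, skipping the video itself.
youtube-dl --write-auto-sub --sub-lang en --skip-download "https://www.youtube.com/watch?v=VIDEO_ID"

# Then flatten the resulting .vtt to plain text with webvtt-py (pip install webvtt-py).
# webvtt.read() is the library's documented entry point; the filename is an example.
python3 -c "import webvtt; print('\n'.join(c.text for c in webvtt.read('Some Video-VIDEO_ID.en.vtt')))"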
 
For tumblrs that are set to "registered accounts only" mode (the blog only shows to the right side of your dashboard when you enter its URL), archiving them has been difficult. An exploit was found where you can view and archive posts by simply adding /embed to the end of a post's permalink.

There may come a point where this is no longer doable, and it's pretty concerning as it means cows could easily cover their ass at the flick of a switch. If this were to happen, via.hypothes.is does work, but only on archive.is afaik. You can try gramunion or tumbex to view a post and try to archive it, or you can archive a reblog of the post in question. Obviously that last method is a YMMV thing, so don't rely on it all the time.
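To make the /embed trick concrete, a quick sketch- the blog name and post ID below are made up:

Code:
# Sketch: turn a post permalink into its /embed view and an archive.today
# submission link for it. The blog and post ID are hypothetical.
post="https://exampleblog.tumblr.com/post/123456789/some-slug"
printf '%s/embed\n' "$post"
printf 'https://archive.is/?url=%s/embed\n' "$post"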
 
For tumblrs that are set to "registered accounts only" mode (the blog only shows to the right side of your dashboard when you enter its URL), archiving them has been difficult. An exploit was found where you were able to view and archive posts by simply adding /embed to the end of a post's permalink.

Unfortunately, Tumblr added a robots.txt over the past week. In other words, this method will not work anymore. You will now just get an infinite loop on archive.is and a robots.txt error on the Wayback Machine.

This is pretty concerning as it means cows can now easily cover their ass at the flick of a switch. via.hypothes.is does work, but only on archive.is, and it's painfully slow (I was able to write a large chunk of this post while it was doing its magic). The old method (finding a reblog and archiving that) also works, but it obviously requires people to interact with the post. This isn't too big of a problem for one of the cows I follow who has a medium-large YouTube following, but it goes without saying: your mileage will vary.
Maybe we can archive it with the help of some third-party Tumblr viewer site like gramunion or tumbex? Some of them do bypass the safe mode.
 
I'll give them a try. Seems like I was actually wrong and it's just archive.is having a day-long hiccup. I'll edit my post though to include your methods as a precaution.
 
Now that I think about it, didn't archive.today bypass robots.txt entirely? Wasn't that the selling point of it during the whole GG era back then?
 
I don't have any experience in archiving really, but I've been tinkering with using youtube-dl to download entire channels, similar to this: https://archive.org/details/TerryADavisTempleOSYouTubeChannelArchive

Here is the current command I'm using (it downloads everything from the channel):
Code:
youtube-dl -f best --playlist-reverse -o '%(playlist_index)03d %(upload_date)s %(title)s (%(id)s).%(ext)s' --download-archive downloaded.txt https://www.youtube.com/channel/UCJZTn-fPu-uIA55UI47_cXg
  • Normally I'd use bestvideo+bestaudio for individual videos, but I don't want to burden my internet too much, and in 99% of these videos the audio is more important than looking at Terry's face.
  • I've been having problems with bandwidth throttling (I think server-side) and I believe -f best is the fastest, aside from restarting every few seconds. https://github.com/rg3/youtube-dl/issues/6923#issuecomment-359150041
  • Terry has many videos uploaded with the same name on the same date. Since youtube-dl doesn't seem to provide an uploaded time option, I've stuck with the video's playlist index in reverse order (oldest to newest). For channels that upload with unique names, this should not be a problem.

I may even schedule a cronjob to do this regularly. I welcome any suggestions about my command (youtube-dl has a lot of flags) or archiving.
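One note on the cron idea: cron treats a bare % as a newline, so the output template above would get mangled in a crontab line. The usual workaround is to wrap the command in a small script and point the crontab entry at that. A sketch, with placeholder paths:

Code:
#!/bin/sh
# grab-channel.sh - rough sketch of a wrapper for the command above, so cron
# doesn't mangle the % signs in the output template. Paths are placeholders.
cd /path/to/terry-archive || exit 1
youtube-dl -f best --playlist-reverse \
  -o '%(playlist_index)03d %(upload_date)s %(title)s (%(id)s).%(ext)s' \
  --download-archive downloaded.txt \
  https://www.youtube.com/channel/UCJZTn-fPu-uIA55UI47_cXg >> youtube-dl.log 2>&1

# Example crontab entry (crontab -e), run nightly at 03:00; --download-archive
# already skips anything fetched earlier, so reruns are cheap:
# 0 3 * * * /path/to/grab-channel.sh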
 