Offline Long-Term Digital Archival - Archiving data for when the Internet cannot be depended on.

What's with the tiny case?
Actually, I have the 20-drive version; it lets you slap a pair of 2.5" drives on top.
I have a main array in a 12-drive case: all identical drives, RAID6, etc.
The giant box is the backup server: all random drives with mergerfs and SnapRAID. As the primary array gets upgraded, its old drives migrate to the backup server and the smaller drives there go away.
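For anyone wanting to copy the mergerfs + SnapRAID approach, here's a minimal sketch of a helper that writes out a snapraid.conf for one parity drive and a pile of mismatched data drives. The mount points and the d1, d2, ... names are assumptions, not the poster's actual layout; `parity`, `content`, `data` and `exclude` are standard snapraid.conf directives. Keep in mind the parity drive has to be at least as large as the biggest data drive.

```bash
#!/bin/bash
# Hypothetical helper: print a snapraid.conf for one parity drive and N data drives.
# Usage: make_snapraid_conf PARITY_MOUNT DATA_MOUNT... > /etc/snapraid.conf
make_snapraid_conf() {
  local parity_dir=${1%/}
  shift
  local i=1 disk
  echo "parity $parity_dir/snapraid.parity"
  # Keep several copies of the content file (SnapRAID's file index):
  # one on the system disk plus one per data drive.
  echo "content /var/snapraid/snapraid.content"
  for disk in "$@"; do
    echo "content ${disk%/}/snapraid.content"
  done
  # Name each data drive d1, d2, ... (the names show up in SnapRAID's reports).
  for disk in "$@"; do
    echo "data d$i ${disk%/}/"
    i=$((i + 1))
  done
  # Skip SnapRAID's own leftovers and directories that don't need protecting.
  echo "exclude *.unrecoverable"
  echo "exclude /tmp/"
  echo "exclude /lost+found/"
}
```

Something like `make_snapraid_conf /mnt/parity1 /mnt/disk1 /mnt/disk2 > /etc/snapraid.conf` (with /var/snapraid created beforehand), then `snapraid sync` to build parity and a periodic `snapraid scrub` to check it. mergerfs just pools the same /mnt/disk* mounts into one tree; SnapRAID works on the individual drives underneath.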

Backup-backups are 2 drives in a Pelican case that gets stored at my storage unit and periodically swapped with a second set of 2 drives. That's just my own content, not downloaded stuff.
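Swapping sets like that mostly comes down to mirroring onto whichever set is home before it goes back out. A minimal rsync sketch, assuming the offsite drive is already mounted; the function name and the `.last_mirrored` timestamp file are illustrative, not part of the poster's setup.

```bash
#!/bin/bash
# Hypothetical helper: mirror SRC onto an offsite drive mounted at DEST.
# --delete makes DEST an exact mirror, so double-check the paths before running.
mirror_to_offsite() {
  local src=${1%/} dest=${2%/}
  if [[ ! -d $src || ! -d $dest ]]; then
    echo "usage: mirror_to_offsite SRC_DIR MOUNTED_DEST_DIR" >&2
    return 2
  fi
  # -a keeps permissions/timestamps, -H keeps hard links; the trailing slash
  # on src copies its contents rather than the folder itself. The timestamp
  # file is excluded so --delete doesn't remove it.
  rsync -aH --delete --exclude=/.last_mirrored "$src/" "$dest/" || return
  # Record when this set was last refreshed, handy when deciding which set to swap.
  date -u +%Y-%m-%dT%H:%M:%SZ > "$dest/.last_mirrored"
}
```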
That's a really interesting use for one of those super-many-drive cases; I didn't think of that.
 
Ytdl-sub is very good for automating the download of, and metadata generation for, any shit on YouTube you might want to archive. It's been working very well from what I've experienced with it so far. There's a bunch of plugins for chapter embedding and SponsorBlock segments, and naturally you can schedule it to download anything newly uploaded to any channel you choose. It also works with various media server solutions such as Jellyfin and Plex, along with others.
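Scheduling is usually just cron, but a long channel download can still be running when the next run fires. A small sketch of a lock wrapper using util-linux `flock` so runs never overlap; the function name and lock path are made up for illustration.

```bash
#!/bin/bash
# Hypothetical cron wrapper: run a command only if the previous run has finished,
# so a slow channel download never overlaps with the next scheduled one.
run_exclusive() {
  local lockfile=$1
  shift
  (
    # -n: give up immediately instead of queueing behind the running job.
    flock -n 9 || { echo "previous run still active, skipping" >&2; exit 0; }
    "$@"
  ) 9>"$lockfile"
}
```

For example, a script that cron runs every few hours could call `run_exclusive /tmp/ytdl-sub.lock ytdl-sub sub subscriptions.yaml`. That ytdl-sub invocation is from memory, so check `ytdl-sub --help` for the exact form your version expects.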
 
Bash:
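#!/bin/bash
# Archive an entire YouTube channel into a folder named after its @handle.
# Usage: ./archive.sh https://www.youtube.com/@SomeChannel
set -euo pipefail
url=${1:?usage: $0 CHANNEL_URL}
[[ $url == *@* ]] || { echo "expected an @handle URL" >&2; exit 1; }
handle=${url##*@}         # keep everything after the "@"
handle=${handle%%[/?]*}   # drop trailing /videos or ?query parts
dir=${handle,,}           # lowercase folder name
mkdir -p "$dir"           # -p: re-running on the same channel is fine
cd "$dir"
# archive.txt remembers finished IDs, so re-runs only grab new uploads
yt-dlp --download-archive archive.txt --sponsorblock-remove sponsor "$url"
echo "Done!"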

Quick and dirty: archives entire YouTube channels and skips sponsor segments.
I'm curious, could yt-dlp somehow run sponsorblock on already downloaded videos based on the ID automatically appended to the filename?
The program is ridiculously versatile, so it doesn't seem too much of a pipe dream.
 
I'm curious, could yt-dlp somehow run sponsorblock on already downloaded videos based on the ID automatically appended to the filename?
The program is ridiculously versatile, so it doesn't seem too much of a pipe dream.
A quick glance over the SponsorBlock API docs suggests to me that it is possible: get the video ID from the filename, retrieve the sponsor segments, and then do some funny business with ffmpeg to cut those parts out. It's a bit beyond my ability level, but I can see it.
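The first steps of that idea are straightforward to sketch. This assumes yt-dlp's default `Title [VIDEOID].ext` naming, curl and jq, and the public SponsorBlock endpoint `/api/skipSegments`; the function names are hypothetical. The last helper turns the sponsor segments into the ranges to keep, which is the part ffmpeg actually needs.

```bash
#!/bin/bash
# Sketch: find sponsor segments for a video that's already on disk.
# Assumes yt-dlp's default "Title [VIDEOID].ext" naming, plus curl and jq.

# Extract the 11-character YouTube ID from a filename.
video_id_from_name() {
  local name=${1##*/}
  local re='\[([A-Za-z0-9_-]{11})\]\.[^.]+$'
  [[ $name =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# Query the public SponsorBlock API; prints one "start end" pair (seconds) per line.
# The API answers 404 when a video has no segments, which curl -f turns into a failure.
fetch_sponsor_segments() {
  curl -fsS "https://sponsor.ajay.app/api/skipSegments?videoID=$1&category=sponsor" |
    jq -r '.[].segment | "\(.[0]) \(.[1])"'
}

# Invert the cut list: read "start end" segments on stdin and print the ranges
# to KEEP, given the video's total duration in seconds. Overlaps are merged.
keep_ranges() {
  local duration=$1
  sort -n | awk -v dur="$duration" '
    BEGIN { pos = 0 }
    { if ($1 > pos) print pos, $1; if ($2 > pos) pos = $2 }
    END { if (pos < dur) print pos, dur }'
}
```

Chaining them might look like `fetch_sponsor_segments "$(video_id_from_name "$f")" | keep_ranges "$dur"`, with the duration from `ffprobe -v error -show_entries format=duration -of csv=p=0 "$f"`. Each kept range would then be cut out with ffmpeg and the pieces joined with the concat demuxer; stream-copy cuts snap to keyframes, so expect a little slop at the edges unless you re-encode.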
 
Can I ask you digital archive guys: how are you accounting for equipment failure, maybe not of the disks but of processors, RAM, motherboards, PSUs, etc.? I mean, are you buying and storing spare parts? Are you repairing your equipment?

I mean, if something fucky does happen you're not going to be carting a 4U server around with you, and even if you could, how would you power it? Hell, how are you going to access the data?
 
I mean, if something fucky does happen you're not going to be carting a 4U server around with you, and even if you could, how would you power it? Hell, how are you going to access the data?
Um, yes, that's the reason it's in a 4U case. Grab it, toss it in the back of the car, leave.

My primary use case is not destruction of civilization but things like forest fire or earthquake, where I'm fine, but local civilization may be destroyed so I can go somewhere else instead.
 
Um, yes, that's the reason it's in a 4U case. Grab it, toss it in the back of the car, leave.

My primary use case is not destruction of civilization but things like forest fire or earthquake, where I'm fine, but local civilization may be destroyed so I can go somewhere else instead.

You see, I get it as a short-term thing, but long term I'm thinking about practicality. Under ideal conditions, or stable but extreme ones, tech can last in a stable state, but all technology has a lifespan, especially the electrical components we rely on. After something like 20 years, even minor failures are not easy to recover from, and the newer a technology is, the harder it is to maintain, not by design but because of how it's manufactured.

I mean, it doesn't have to be an apocalyptic situation; it could be something like a world war, an energy shortage, a pandemic, etc. Right now we are living in a time where globe-spanning webs of supply are something we take for granted, but it doesn't take much to disrupt them. A day or so of disruption is built into the network, but the longer it carries on, the more the effects cascade. Hell, logistics students treat the ship that blocked the Suez Canal as a dark example of an IRL network failure, and that was a single accident with knock-on effects across the world. Imagine if people were actively trying to engineer that situation: shit would fall apart FAST.
 
Can I ask you digital archive guys: how are you accounting for equipment failure, maybe not of the disks but of processors, RAM, motherboards, PSUs, etc.? I mean, are you buying and storing spare parts? Are you repairing your equipment?

I mean, if something fucky does happen you're not going to be carting a 4U server around with you, and even if you could, how would you power it? Hell, how are you going to access the data?
Well, in most situations a damaged computer component is toast. It’s possible to repair, but for most components it’s not feasible, especially if the damage is more than one or two small external breaks in the circuit or a detached component, and a fair few components are black boxes anyway. It takes a certain level of skill to repair a failing capacitor (which is about the most maintenance most systems need), but if there is complicated damage to the CPU itself or the motherboard PCB? Not worth it, even in SHTF. Just replace it, and maybe take the old one to a dedicated repair technician à la Louis Rossmann.

Most of my “server” hardware is an array of cheap disks from various sources (mostly BestBuy’s easystores that were <$15/TB) and a few cheap work desktops moved into larger cases. I don’t do much computing on them, and what I do (some game servers, encoding BD media for local streaming) normally finishes within the day.

If something fucky happens, most of my data and hardware isn’t worth risking my life over; it’s mostly easily torrented GNU/Linux ISOs and bargain-bin processors. The important data is on a few BDs/USB sticks/SD cards I check for damage regularly (about every quarter), and if I gotta get the Hell outta Dodge, it’s all there for my bugout bag.
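For the quarterly check, a checksum manifest turns "damage" into something you can actually test instead of eyeballing. A minimal sketch using GNU coreutils `sha256sum`; the `MANIFEST.sha256` name and the helper names are assumptions. For write-once media like BD-R, create the manifest in the staging folder before burning so it ends up on the disc.

```bash
#!/bin/bash
# Hypothetical helpers for the periodic "is my backup media still readable?" check.
MANIFEST=MANIFEST.sha256

# Hash every file once. Runs inside the directory so the manifest holds
# relative paths and still works when the media mounts somewhere else.
make_manifest() {
  (cd "$1" && find . -type f ! -name "$MANIFEST" -print0 | sort -z |
    xargs -0 -r sha256sum > "$MANIFEST")
}

# Re-read every file and compare. --quiet prints only failures;
# a non-zero exit means something changed or rotted.
verify_manifest() {
  (cd "$1" && sha256sum --quiet -c "$MANIFEST")
}
```

Run `make_manifest /media/usb` once after copying, then `verify_manifest /media/usb` each quarter; any line it prints is a file that no longer matches.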

My primary use case is not destruction of civilization but things like forest fire or earthquake, where I'm fine, but local civilization may be destroyed so I can go somewhere else instead.
This. Most preppers have been tainted by a mixture of zombie shows, Mad Max, and that canadian faggot into thinking that there would be historical civilization collapse. In reality it’s far more likely one gets hit by a local disaster such as weather before we hit anything else. The most common problem here is power outages. For a situation like that, I am most interested in upgrading my setup and building one or two low power servers I can efficiently run without the need of the grid. The next biggest problems are floods or tornadoes, both of which would consist of me ideally having forewarning and being able to hunker down/pack up depending on the situation.
 
Well, in most situations a damaged computer component is toast. It’s possible to repair, but for most components it’s not feasible, especially if the damage is more than one or two small external breaks in the circuit or a detached component, and a fair few components are black boxes anyway. It takes a certain level of skill to repair a failing capacitor (which is about the most maintenance most systems need), but if there is complicated damage to the CPU itself or the motherboard PCB? Not worth it, even in SHTF. Just replace it, and maybe take the old one to a dedicated repair technician à la Louis Rossmann.

Most of my “server” hardware is an array of cheap disks from various sources (mostly BestBuy’s easystores that were <$15/TB) and a few cheap work desktops moved into larger cases. I don’t do much computing on them, and what I do (some game servers, encoding BD media for local streaming) normally finishes within the day.

If something fucky happens, most of my data and hardware isn’t worth risking my life over; it’s mostly easily torrented GNU/Linux ISOs and bargain-bin processors. The important data is on a few BDs/USB sticks/SD cards I check for damage regularly (about every quarter), and if I gotta get the Hell outta Dodge, it’s all there for my bugout bag.

Right, thank you. After years of asking, that's the most honest and legitimate answer I have ever gotten. I have a large physical library and a larger digital library, but everything truly important I have committed to memory. I also have a critical multimedia backup that I update as and when, but it's meant to be something I access maybe once a decade to update.
 
If you're assuming no electricity and no repair parts you're likely back to the old standbys.

Analog.

Microfilm or microfiche using ordinary silver halide imaging on a polyester base looks like a good long-term medium.
Possibly using platinum-palladium instead.

Basically if you can get glass and grind a lens or three then you can read it.
 
  • Wikipedia has a whole page on how to download Wikipedia (though I'm pretty sure most of Wikipedia is completely useless)
  • HTTrack can be used (with varying levels of success) to archive websites
  • WikiTeam is a tool for archiving wikis. Can be used to download Wiki dumps (Wikipedia, Wikibooks, Wikinews, etc)
  • Kiwix is another offline wiki tool. It has smaller file sizes than the dumps released by the Wikimedia Foundation, and includes WikiHow, iFixit, Khan Academy, Stack Overflow, TED Talks, WikiMed, among a bunch of others. You can easily pick and choose which wikis to download through a UI, and it can also be run as a server.
  • You can also download text-only dumps of those xyz.fandom.com wikis. I guess if you just really need 40k lore in the apocalypse?
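On the Kiwix side, the `kiwix-serve` tool from kiwix-tools can serve several .zim archives at once to anything on the local network. A minimal sketch, assuming all the downloaded .zim files sit under one folder; the helper names and the default port are illustrative.

```bash
#!/bin/bash
# Hypothetical helpers for serving a folder of Kiwix .zim archives on the LAN.
# Assumes kiwix-tools (which provides kiwix-serve) is installed.

# Print every .zim file under DIR, one per line, sorted.
list_zims() {
  find "$1" -type f -name '*.zim' | sort
}

# Serve all of them from one kiwix-serve instance (port 8080 unless given).
serve_zims() {
  local dir=$1 port=${2:-8080}
  local -a zims
  mapfile -t zims < <(list_zims "$dir")
  if ((${#zims[@]} == 0)); then
    echo "no .zim files in $dir" >&2
    return 1
  fi
  kiwix-serve --port="$port" "${zims[@]}"
}
```

For example `serve_zims /srv/kiwix 8080`, then browse to http://<server-ip>:8080 from another device.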
 
My idea is not the end of the world, but the Internet becoming so censorious that you can no longer access these things, to the point where they are completely erased aside from your archive.
I think this is a big possibility. Get on the I2P bandwagon:

A friend of mine made torrents out of my MATI archive, covering 2018-2023 and missing only the lost Life is Strange episodes. Let me know if I missed anything.
He only seeds it over I2P for whatever reason, and I can't reliably seed anything, so sadly this is the best I can do.
Attached are the torrents in case you want them!
Your friend is absolutely giga-based and I will be seeding these. IMO I2P is the endgame for torrenting, because with no risk to seeding there is no reason not to seed everything you've downloaded indefinitely. Combine this with a DHT scraper, and even if your tracker got fucked, everything could still be found. The i2psnark devs really need to add functionality to move already-downloaded torrents.

In that situation you may want OnionShare, which will allow you to discreetly share said censored content.
OnionShare is also based as fuck because it's a direct transfer that just werks, since all the routing is taken care of by Tor. I wish the Tor Project would put some resources into it (and onionchat), because it has the potential to be something more convenient than the alternatives. Need to quickly share a 250 MB file with one person? Email won't do it, setting up SFTP or something is a pain (and might be impossible if you're behind CGNAT), and there's Mega and shit, but then they have a copy. OnionShare? Just fuckin' werks.

I'm curious, could yt-dlp somehow run sponsorblock on already downloaded videos based on the ID automatically appended to the filename?
The program is ridiculously versatile, so it doesn't seem too much of a pipe dream.
I'm pretty sure yt-dlp already has sponsorblock functionality. You can have it just mark the segments from sponsorblock or even auto-remove them from the ffmpeg job to merge the files. It should be in the docs.

I've recently begun pirating and formatting (transcoding, subbing and fixing the metadata) many shows and movies I'd like to preserve so I can talk a little bit about that.

Opus really is the endgame audio codec, beating every other codec over a wide range of bitrates while having very low latency. However, I've chosen to use AAC instead since it's a lot older and enjoys more support; for example, Windows Media Player can't play MP4 files with Opus audio.
Regarding video codecs, I've chosen to use AV1 as it's FOSS, enjoys wide support and is the second most efficient video codec behind VVC/h266 which nobody seems to care about. HEVC/h265 requires a paid decoder to be played on Windows and isn't well supported so no thank you.
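For reference, roughly what an AV1 + AAC re-encode looks like with ffmpeg. This is a sketch, assuming an ffmpeg build with the libsvtav1 encoder; the CRF, preset, audio bitrate and the `.av1.mp4` naming are placeholder values, not the poster's settings.

```bash
#!/bin/bash
# "Show S01E01.mkv" -> "Show S01E01.av1.mp4", same folder.
# Only the last dot in the file name (not in the directory path) is treated as the extension.
av1_output_name() {
  local name=${1##*/}
  local dir=${1%"$name"}
  [[ $name == *.* ]] && name=${name%.*}
  printf '%s%s.av1.mp4\n' "$dir" "$name"
}

# Hypothetical helper: re-encode to AV1 video + AAC audio in an MP4,
# keeping all audio tracks, any text subtitles, and the global metadata.
# Needs an ffmpeg built with libsvtav1; CRF/preset/bitrate are starting points.
to_av1_mp4() {
  local in=$1
  local out=${2:-$(av1_output_name "$in")}
  # '0:s?' is quoted so the shell can't glob it; the "?" makes subtitles optional.
  # mov_text only accepts text subtitles (SRT/ASS), not bitmap ones.
  ffmpeg -hide_banner -i "$in" \
    -map 0:v:0 -map 0:a -map '0:s?' \
    -c:v libsvtav1 -crf 30 -preset 6 \
    -c:a aac -b:a 192k \
    -c:s mov_text \
    -map_metadata 0 \
    "$out"
}
```

Lower CRF means higher quality and bigger files; lower preset numbers are slower but compress better. Blu-ray PGS subtitles are bitmaps and can't be converted to mov_text, so those would need OCR or an MKV container instead.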
I'm sorry, but anyone intentionally using windows media player, buying an h265 codec from the Microsoft Store (meaning you have an online MS account(!) and gave them your credit card(!!)), or worse, making archival decisions based on windows media player is an utter fucking retard. Ideally you wouldn't be re-encoding things with lossy compression at all, but if you need to in order to save space, you should go with the most efficient codec available. Support for older devices can easily be provided by live-transcoding to older codecs from your nas/plex/jellyfin/whatever.

For "on-grid" data backups, I currently have two direct-attached storage devices. One is a QNAP TR-004 with four 8 TB HDDs set up in a RAID 5 config, which gives me 24 TB of net storage.
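The 24 TB figure is plain RAID 5 arithmetic: with the TR-004's four bays filled with 8 TB drives, one drive's worth of space goes to parity, so (4 - 1) x 8 = 24 TB. A tiny sketch of the same calculation for the common levels, assuming equal-sized drives and whole-number sizes; it doesn't check minimum drive counts.

```bash
#!/bin/bash
# Rough usable capacity for N equal drives of SIZE (integer TB). Ignores
# filesystem overhead and the TB-vs-TiB difference, so real numbers come out lower.
raid_usable() {
  local level=$1 n=$2 size=$3
  case $level in
    0) echo $((n * size)) ;;              # pure striping, no redundancy
    1) echo "$size" ;;                    # every drive holds the same copy
    5) echo $(((n - 1) * size)) ;;        # one drive's worth of parity
    6) echo $(((n - 2) * size)) ;;        # two drives' worth of parity
    10) echo $((n / 2 * size)) ;;         # striped mirrors
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
}
```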

The other is the TerraMaster D8 Hybrid. I don't have all of the bays filled yet, but I'm running this as just individual drives. I'm not a huge fan of this one. If you want to use it for just individual drives / JBOD it's probably good to go, but it is definitely not meant for a RAID config. It has 4 HDD bays and 4 NVMe slots. The NVMe slots can't be run in RAID mode, though, and have slow read speeds, at least based on my use.
Hot take: literally every "NAS appliance" is a scam. Just looking at this is laughable. "Micro processor with hardware RAID" (aka some underpowered ARM garbage)? "NAS storage (EXT4) External hard drive (EXT3, EXT4, NTFS, FAT32, HFS+, and exFAT) Note: To use exFAT QNAP NAS you must purchase an exFAT driver license from License Center." Or in other words, "no checksumming filesystem, and we're going to make you pay extra for exFAT for some bizarre reason". What happens when that breaks? Do you have to buy another thing from QNAP instead of just moving the drives to another PC? What if they don't sell them anymore? Hardware RAID on top of a filesystem without checksums? And this shit is $220?

Get a used OptiPlex out of the trash and a $30 case you can fit 8 hard drives in. Put Linux on it, set up a btrfs array, install Samba.
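A rough sketch of that last sentence, under some assumptions: the drive names, the `pool` label, the `/srv/pool` mount point and the share name are all illustrative. It uses btrfs raid1 rather than raid5/6 because the btrfs parity modes have long carried warnings about not being production-ready, and raid1 still gives checksum-verified self-healing from the second copy.

```bash
#!/bin/bash
# One-time pool creation (DESTRUCTIVE, wipes the listed drives), shown for reference:
#   mkfs.btrfs -L pool -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd
#   mount LABEL=pool /srv/pool

# Periodic integrity check: btrfs re-reads every block and compares checksums,
# repairing from the mirror copy when one side is bad. -B waits for completion.
scrub_pool() {
  local mnt=${1:-/srv/pool}
  btrfs scrub start -B "$mnt" && btrfs device stats "$mnt"
}

# Print a minimal smb.conf share section for the pool.
smb_share_stanza() {
  local name=$1 path=$2
  printf '[%s]\n' "$name"
  printf '   path = %s\n' "$path"
  printf '   read only = no\n'
  printf '   browseable = yes\n'
}
```

Append `smb_share_stanza archive /srv/pool` to /etc/samba/smb.conf, check it with `testparm`, restart the Samba service, and run `scrub_pool` from cron monthly or so.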

  • Wikipedia has a whole page on how to download Wikipedia (though I'm pretty sure most of Wikipedia is completely useless)
  • HTTrack can be used (with varying levels of success) to archive websites
  • WikiTeam is a tool for archiving wikis. Can be used to download Wiki dumps (Wikipedia, Wikibooks, Wikinews, etc)
  • Kiwix is another offline wiki tool. It has smaller file sizes than the dumps released by the Wikimedia Foundation, and includes WikiHow, iFixit, Khan Academy, Stack Overflow, TED Talks, WikiMed, among a bunch of others. You can easily pick and choose which wikis to download through a UI, and it can also be run as a server.
  • You can also download text-only dumps of those xyz.fandom.com wikis. I guess if you just really need 40k lore in the apocalypse?
I had no idea I could get dumps of stack overflow! This is super helpful.
 
Hot take: literally every "NAS appliance" is a scam. Just looking at this is laughable. "Micro processor with hardware RAID" (aka some underpowered ARM garbage)? "NAS storage (EXT4) External hard drive (EXT3, EXT4, NTFS, FAT32, HFS+, and exFAT) Note: To use exFAT QNAP NAS you must purchase an exFAT driver license from License Center." Or in other words, "no checksumming filesystem, and we're going to make you pay extra for exFAT for some bizarre reason". What happens when that breaks? Do you have to buy another thing from QNAP instead of just moving the drives to another PC? What if they don't sell them anymore? Hardware RAID on top of a filesystem without checksums? And this shit is $220?
are you ChatGPT? everything you said is just entirely made up bullshit. do you know the difference between a NAS (network attached storage) and a DAS (direct attached storage) device?
 
are you ChatGPT? everything you said is just entirely made up bullshit. do you know the difference between a NAS (network attached storage) and a DAS (direct attached storage) device?
No, or at least I don't think so. Those quotes (besides my sarcastic one after "in other words") are directly from the product page for the QNAP TR-004 that I linked.
 
I'm sorry, but anyone intentionally using windows media player, buying an h265 codec from the Microsoft Store (meaning you have an online MS account(!) and gave them your credit card(!!))
I explicitly said I didn't buy the h265 codec.
or worse making archival decisions based on windows media player is an utter fucking retard.
So they should be based on what? The media player of my TV? VLC? FFplay? CD players? Firefox? I want shit to work out of the box for me and others.
Ideally you wouldn't be re-encoding things with lossy compression at all, but if you need to in order to save space, you should go with the most efficient codec available. Support for older devices can easily be provided by live-transcoding to older codecs from your nas/plex/jellyfin/whatever.
Lossless encoding demands a colossal amount of storage: even decent lossless codecs like FFV1 easily take 10x what a placebo-preset lossy encode would take with obsolete codecs like h264. Moreover, it's straight up pointless unless you have the uncompressed original source, which I wager you don't, because even Blu-ray uses lossy codecs (H.264 on regular discs, HEVC on UHD ones). The most efficient codec is VVC, but you'd need to be a moron to use it right now: it has virtually no support, and even fucking FFmpeg doesn't support VVC encoding.
Support for older devices can easily be provided by live-transcoding to older codecs from your nas/plex/jellyfin/whatever.
This shows you don't know what you're talking about. What you say would make sense if you merely decoded your media and then streamed the uncompressed video to the device in question; transcoding it adds an unnecessary step that only increases the computational demand and decreases fidelity because of generation loss.
 
I explicitly said I didn't buy the h265 codec.

So they should be based on what? The media player of my TV? VLC? FFplay? CD players? Firefox? I want shit to work out of the box for me and others.
I think people should be archiving things in the highest quality their storage space allows for. I also think considering what Windows Media Player can do at all is foolish.

Lossless encoding demands a colossal amount of storage: even decent lossless codecs like FFV1 easily take 10x what a placebo-preset lossy encode would take with obsolete codecs like h264. Moreover, it's straight up pointless unless you have the uncompressed original source, which I wager you don't, because even Blu-ray uses lossy codecs (H.264 on regular discs, HEVC on UHD ones). The most efficient codec is VVC, but you'd need to be a moron to use it right now: it has virtually no support, and even fucking FFmpeg doesn't support VVC encoding.
Who said anything about lossless compression? I am assuming you already have something that is using lossy encoding (like a blu-ray as you mention) and saying that re-encoding this with lossy compression again is a bad idea. You should be keeping it in the original format if possible.

This shows you don't know what you're talking about. What you say would make sense if you merely decoded your media and then streamed the uncompressed video to the device in question; transcoding it adds an unnecessary step that only increases the computational demand and decreases fidelity because of generation loss.
Au contraire, this shows that it's you who doesn't know what they're talking about. First off, nothing gets streamed as uncompressed video because of the enormous amount of bandwidth that would take. When you stream something from plex/jellyfin/whatever, normally the compressed file (ex. your x265 blu-ray rip) is served unaltered, assuming the device supports playback in that format (ex. x265). However, if you're streaming to something older that doesn't support the file's format, it can be transcoded in real time to a format the older device does support: https://jellyfin.org/docs/general/server/transcoding/ https://redlib.northboot.xyz/r/jellyfin/comments/xidrap/do_you_enable_transcoding/
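To make the direct-play point concrete: the server first asks whether the client can decode the file as-is, and only transcodes when it can't. A minimal sketch of that decision reduced to the video codec alone, assuming ffprobe is installed; real servers also weigh container, audio codec, bitrate and resolution, and the supported-codec list here is something you'd supply per device.

```bash
#!/bin/bash
# Video codec of the first video stream, as ffprobe names it (h264, hevc, av1, ...).
video_codec() {
  ffprobe -v error -select_streams v:0 -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# needs_transcode CODEC SUPPORTED... : succeeds (status 0) when CODEC is NOT
# in the client's supported list; otherwise the file can be sent unaltered.
needs_transcode() {
  local codec=$1 c
  shift
  for c in "$@"; do
    [[ $c == "$codec" ]] && return 1
  done
  return 0
}
```

For example: `needs_transcode "$(video_codec movie.mkv)" h264 && echo "old TV would need a transcode"`.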
 