True, but ultimately it's your metal; you control practically everything about it.
Coincidentally enough, I found the Raspberry Pi 4 I bought during Covid that went unused and started to migrate from the AWS Lightsail instance.
Currently running a 40TB setup composed of 8TB drives in a snapraid+mergerfs array, with two extra drives providing parity, tied to an ArchiveBox instance and several scripts to archive stuff regularly. Works well, and if a drive fails it only takes 12 hours to get the array back online with less than a day's data loss. That has happened once in a decade, since everything is only spun up a few hours a day at most.
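For anyone sizing a similar array, here is a rough sketch of the drive math. It assumes the 40TB figure is usable capacity (parity drives not counted), which the post doesn't actually say; `array_layout` is just a made-up helper name for illustration.

```python
def array_layout(usable_tb: int, drive_tb: int, parity_drives: int) -> dict:
    """Work out drive counts for a data+parity array of equal-size drives.

    Assumption: usable_tb counts data drives only, parity is extra.
    """
    data_drives, remainder = divmod(usable_tb, drive_tb)
    if remainder:
        raise ValueError("usable capacity is not a whole number of drives")
    total_drives = data_drives + parity_drives
    return {
        "data_drives": data_drives,
        "parity_drives": parity_drives,
        "total_drives": total_drives,
        "raw_tb": total_drives * drive_tb,
    }


# Numbers from the post: 40TB usable, 8TB drives, two parity drives.
layout = array_layout(usable_tb=40, drive_tb=8, parity_drives=2)
print(layout)
```

Under that assumption it works out to five data drives plus two parity, seven drives in total.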
If anyone knows a better MediaWiki archiving solution than mwoffliner, let me know. It fucking sucks and one malformed page will crash the whole task. I want dumps of ED and other weird culture wikis before they're gone.
Trivial ghetto YouTube archiving solution I like: make an unlisted playlist on your account (hidden, but reachable by link) and set up a cron job that runs every ten minutes to save said playlist with yt-dlp. Makes it easy to grab videos remotely without actually exposing any services, by just throwing them on the playlist.
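A minimal sketch of what the playlist job could look like, wrapped in Python so cron has a single script to call. The playlist URL, the `~/yt-archive` directory, and the names `build_command` and `main` are placeholders made up for illustration, not the poster's actual setup. The yt-dlp options used are `--download-archive` (keeps a list of already-saved video IDs so each run only fetches new additions), `--ignore-errors`, and `-o` for the output template.

```python
import subprocess
import sys
from pathlib import Path

# Placeholder values: swap in your own playlist link and storage path.
PLAYLIST_URL = "https://www.youtube.com/playlist?list=PLACEHOLDER"
ARCHIVE_DIR = Path.home() / "yt-archive"


def build_command(playlist_url: str, archive_dir: Path) -> list[str]:
    """Build the yt-dlp invocation for one sync of the playlist."""
    return [
        "yt-dlp",
        # Record finished video IDs so reruns skip what is already saved.
        "--download-archive", str(archive_dir / "downloaded.txt"),
        # Keep going if one video is private, deleted, or otherwise fails.
        "--ignore-errors",
        # Title plus video ID keeps filenames unique and traceable.
        "-o", str(archive_dir / "%(title)s [%(id)s].%(ext)s"),
        playlist_url,
    ]


def main() -> int:
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_command(PLAYLIST_URL, ARCHIVE_DIR)).returncode


if __name__ == "__main__":
    sys.exit(main())
```

A crontab entry along the lines of `*/10 * * * * /usr/bin/python3 /path/to/archive_playlist.py` would run it every ten minutes; the script path there is a placeholder too.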
This list might help:
iipc/awesome-web-archiving: an Awesome List for getting started with web archiving