According to Archiveteam, the Wayback Machine had 57 petabytes of data about a year ago, so let's call it 60 petabytes now. 60 petabytes is 60,000,000 gigabytes.
Backblaze says they pay about $0.014 US per gigabyte now, so that's $840,000 for the drives alone. In practice it would be more, because you would need backplanes, and you would probably want cases with fans for the drives. So maybe a million dollars as a naive estimate, and 2 million minimum when it's all said and done.
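For anyone who wants to check the arithmetic, here's a quick sketch in Python. It only uses the numbers quoted above (60 PB, $0.014 per GB) with decimal units. The function name is made up for illustration, and it covers bare drives for a single copy, nothing else.

```python
# Back-of-the-envelope drive cost, using only the figures quoted in the post.
ARCHIVE_PB = 60          # 57 PB rounded up to 60 PB
GB_PER_PB = 1_000_000    # decimal units: 1 PB = 1,000,000 GB
USD_PER_GB = 0.014       # the per-gigabyte drive price attributed to Backblaze


def raw_drive_cost(petabytes, usd_per_gb=USD_PER_GB):
    """Cost of bare drives for one copy: no redundancy, cases, or backplanes."""
    return petabytes * GB_PER_PB * usd_per_gb


print(f"${raw_drive_cost(ARCHIVE_PB):,.0f}")  # prints $840,000
```

Swap in a different price or size to see how quickly the total moves. Redundancy, enclosures, power, and bandwidth are deliberately left out, which is why the real figure lands well above this.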
That's not as expensive as I was imagining. Makes me think there are already "dark" copies of a lot of the stuff out there that (formerly?) wealthy people are sitting on. The real problem comes in a generation or two when the data needs to be copied to new media. Storage cost is still going down but not as fast as it did in the past, and there's no reason to think it will continue to go down. It's not magic. Someone has to actually figure out how to cram more bits into the same space, and someone has to pay them to make the attempt in the first place.
Of course, you could lower the cost by only copying a selection of the data, but then you'd have a biased archive. I think around here, making an unbiased archive is one of the main motivations for even thinking about forking archive.org. You have to preserve the original concept of the Wayback Machine, which was a fully automated system that spidered the Web and saved everything it accessed. And you'd have to make it public, which, as I said earlier in this thread, invites harassment and sabotage. I mean, look around. There's always somebody out there trying to mess with you. So the cost of building a fork is only half the problem.
And there's yet another problem that I only fully understood after reading a bunch of the archive.today guy's posts: saving stuff from normie shitholes like Facebook and Reddit. Those sites have anti-bot code that is continuously being updated, so he has to keep making changes to his code to defeat their anti-bot measures. He has to pay for a lot of IP addresses, he has to tweak his software stack, and he relies on reports from users to know when normie shit isn't being saved correctly.
Having said all that, I'm nonetheless encouraged by what I see around here. If you believe in building an unbiased fork, do it. Don't make it public, at least not right away. Budget in the cost of copying everything to new media every generation. Maybe it's pointless, since you won't care when you're dead whether your archive is unbiased and survives, but then so is everything, right? If you enjoy saving shit then do it.
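If you want to put a rough number on the "copy everything to new media every generation" part, here's a minimal Python sketch. Everything beyond the 60 PB and $0.014 per GB figures is an assumption: the number of migrations and the per-generation price decline are hypothetical knobs, and the function name is made up. It counts drive purchases only.

```python
def migration_budget(petabytes, usd_per_gb, migrations, price_decline=0.0):
    """Drive spend for the initial copy plus one full re-copy per migration.

    price_decline is the fractional drop in price per gigabyte between
    generations (hypothetical; 0.0 means prices stay flat).
    """
    gigabytes = petabytes * 1_000_000  # decimal units
    total = 0.0
    price = usd_per_gb
    for _ in range(migrations + 1):    # initial copy + each later migration
        total += gigabytes * price
        price *= 1.0 - price_decline
    return total


# Flat prices, three migrations after the initial copy.
print(f"${migration_budget(60, 0.014, 3):,.0f}")

# Same schedule if prices were to halve each generation (pure assumption).
print(f"${migration_budget(60, 0.014, 3, price_decline=0.5):,.0f}")
```

With flat prices every migration costs as much as the first copy, so the total scales with the number of generations you plan for. That's the scenario to budget around if storage prices stop falling.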