Forum Archival Post-Subpoena of archive.today - The forum is too reliant on archive.today, discussion for potential solutions

Lieutenant Rasczak

Joined
Dec 22, 2017
Post-Subpoena Archival
As effortposters on the forum know, or should know, a lot of material related to subjects of discussion here is archived using archive.today. Whether you write threads, post about a lolcow's antics, or effortpost in a thread, you likely use archive.today. If you're not in the know, archive.today was recently served a subpoena by the FBI over CSAM being archived there, which is possible because archiving is as easy as copy-pasting a link. As we all know, a large volume of information is hosted on archive.today because it is a very reliable archival service by Internet standards. It hosts crucial information about the overwhelming majority of the lolcows on the forum. One example is Liz-Fong Jones' infamous consent accident post.

This has led to a lot of the forum being over-reliant on archive.today as an archiver without realizing that it could disappear at any time through any means. There are other archivers such as GhostArchive, but the overwhelming majority of archives are from archive.today exclusively, both from the past and in the present. Because of this subpoena in particular, it's likely that a popular attack vector against various archival services will come in the form of uploading CSAM to the service and reporting it to the FBI, for more information see The Gay Pedophile at the Gates. It's even more likely that it's already a known method to attack currently existing archival services. It's highly probable that David Hoffman is attacking archive.today along all the other websites he's uploaded child pornography to as an attack vector.

Some users have already realized this and use both GhostArchive and archive.today, but I personally think this is still a bottleneck regarding archival. Yes, you have a back up, but what happens if both of these websites were to go down? What if the gay pedophile uploads child porn to the website and gets it taken down? Oh I guess you don't have an archive anymore and the person in question DFE'd and the post has become lost media. Even though something like this is less likely, because of the nature of the forum, users need to be prepared for worst case scenarios like entire lexicons of information regarding lolcows potentially being wiped out over night and or becoming lost media in their entirety.


One Potential Solution to the Issue
There are a number of solutions to this issue. The first and most obvious one would be a program for forum users that works like archive.today, but is instead a front-end for running a webpage through multiple archivers in parallel and outputting all relevant archive links, as opposed to the link from just a single archival website. This alleviates one issue: information on various lolcows would be hosted on several different web archivers rather than just one. It would likely be a patchwork solution, but one that could give us much greater flexibility in regards to archival, especially if you can choose which archivers it feeds web pages to on the back-end. If need be, user-hosted archivers based on ArchiveBox can suffice in the case that multiple archival services go down.
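For illustration, here is a minimal sketch of what the fan-out part of such a front-end might look like. Everything in it is an assumption rather than an existing tool: `archive_everywhere` and the `Backend` type are hypothetical names, and each backend is just a function that takes a page URL and returns an archive link. The real submission flow for each service is deliberately left out, since those differ per archiver and would need to be written and checked individually.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A backend takes the page URL and returns the URL of the finished archive.
# Real backends would wrap each service's own submission flow; none are
# implemented here because those flows differ per service.
Backend = Callable[[str], str]


def archive_everywhere(url: str, backends: Dict[str, Backend]) -> Dict[str, str]:
    """Submit `url` to every backend in parallel.

    Returns a dict mapping backend name to either the archive link or an
    "ERROR: ..." string, so one failing service never hides the others.
    """
    results: Dict[str, str] = {}
    if not backends:
        return results
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fn, url) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # a backend being down is expected
                results[name] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical stand-in backends, for illustration only.
    def fake_ok(url: str) -> str:
        return "https://archiver-a.invalid/abc123"

    def fake_down(url: str) -> str:
        raise ConnectionError("service unavailable")

    links = archive_everywhere(
        "https://example.com/post",
        {"archiver-a": fake_ok, "archiver-b": fake_down},
    )
    for name, link in links.items():
        print(f"{name}: {link}")
```

Passing the backends in as a dict is what would let a user choose which archivers a page gets fed to, and a self-hosted ArchiveBox instance could be added as just another entry.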

The above is a solution that solves archival issues in the future and not necessarily the problem of past archival where 98% of information utilizes archive.today. For that, we need another solution entirely.
 
If need be, user hosted archivers based on ArchiveBox can suffice in the case that multiple archival services goes down.
I've been thinking of spinning one of these up for forum users.



With regards to setting up an ArchiveBox instance for users, the way I was thinking about doing it was enabling the authentication feature in ArchiveBox and creating a thread where users here can ask for an account, and I would then DM them the credentials. I'm sure it'd be the target of spamming if it weren't for trusted users only.
 
Megalodon.jp is a pretty good option. Learn which moon rune buttons to click now, before a catastrophe.

The Wayback Machine has censorship/takedown issues, but if the material is very important, you should try to use Archive.today, Ghostarchive, Megalodon.jp, and the Wayback Machine on it.
 
the dude has stated that it was a standard subpoena of a csam incident.
 
If I may drop 2 cents on something that is a little unrelated, I feel like the need to ARCHIVE EVERYTHING isn't encouraged or hammered into enough posters here. It has honestly become a lot more common to see users not archiving anything they post, regardless of how important the actual content is. Hell, it's such easy shit too, which makes it really frustrating to see. Instead of downloading a YouTube video and posting it here directly, they embed it with the media button or just fucking post the direct link. Instead of taking a couple of seconds out of their day to paste a link to a Reddit thread they're discussing into Archive.is or Ghostarchive, they just post the link directly. It also feels like there's close to 0 people telling the people that do this that They Are Doing It Wrong™ besides the dedicated archivers and effortposters. I do have to confess that I am a hypocrite in not educating people about the importance of archiving, so if you want to disregard me on that, feel free to do so.

This is a problem that I think we should at least try to solve. Maybe more aggressive tactics and in-your-face advertising about archiving, even in places that really don't need it like the Off-Topic board. Maybe not archiving could be a valid report reason, and the jannies link to the dedicated Archival Tools thread on the recipient's profile like Null does for un-thumbnailed images. Maybe just anything that doesn't require manual intervention from only the people that actually care.


Or maybe I'm just being retarded and anyone who doesn't archive their shit already knows they're not and don't care and nothing gets solved. Yeah, it's likely that...
 
Megalodon.jp is a pretty good option. Learn which moon rune buttons to click now, before a catastrophe.
Something to bear in mind if that site's hosted in Japan is that since their defamation laws are embarrassingly easy to abuse (hurting someone's reputation in any way can be considered defamation in court, even if what you're saying is demonstrably true), there's a good chance the nips running it are really sensitive to complaints, and if someone emails them saying an archived page is being used as part of a defamation campaign they're probably gonna fold and delete the archive no questions asked. I wouldn't put much confidence in anything drama-related staying up on there permanently.
 
Only the most deranged would bother doing that, I assume after they see the link here since it's a lesser known site to begin with.

If it's particularly sensitive content, try to archive it on all of the Big 4, and add screenshots/files. Make them bully/destroy up to five places to get it off the damn Internet.
 
Unlike other archivers, Megalodon.jp will only exclude/remove a specific URL (example.com/example.html), not a whole domain (example.com). Plus, it can support any downloadable file as long as it's under 10 MB.

The top button (in blue) reads "search and [something]" so I guess that means "search and archive", while the bottom button (teal) just reads "search", as far as I can tell.

(suck at kanji)
Its search feature is a joke, and that is the problem with Megalodon and why I still love archive.today: it has pages. Also, if you ever get weird scaling, it's due to it archiving in 3K resolution.
 
ArchiveBox lists a decent amount of alternatives here too: https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community#other-archivebox-alternatives

The problem with things like ArchiveBox is that they don't solve archiving's biggest problem: the owner "getting hit by a bus". Running a local instance where registrations are limited to users of a site such as this one doesn't really help its legitimacy either. Sure, it could act as a mirror to the bigger sites, but if it was the sole source of truth then the average person would ignore it because the kiwi farms is an evil doxing site and that archive is clearly manipulated chud.

I've toyed with running an instance for maybe the last 3 years but there's a few problems that stop me:
1. liability: i do not want to have to view and moderate illegal material and it's not a question of if this material gets uploaded, it's when.
2. legitimacy: i'm just a dude and why should i be trusted. plus, i will die eventually.
3. cost: the service itself has zero value, so you're limited to running it as a passion project in both time and money; i'd prefer if it was one or the other.

I've always thought that something like activitypub would be a good fit for archiving. A bunch of people could run their own archiving instances and they could federate archives between them. It solves a few problems, if one site goes down it's no problem because it's already mirrored on every other site. Backups (sorta) and redundancy are automatically taken care of once the pool of participating sites grows large enough. Moderation should be a bit easier because people are working together - it would just need a public moderation log. Archives would be publicly listed via the feed which I'm not sure is a good or bad thing. If real life gets in the way and you need to remove your instance for whatever reason then it's not a big deal - the exact same archives already exist on other federated sites (though links would break).
It's not perfect though, because it doesn't stop one or several people from joining and trying to taint archives by, for example, falsely marking them as illegal content. Or defederating with "problematic" instances. Or spinning up 20 instances and trying to gain consensus. Maybe this is something already solved by federation though, two examples being mastodon.social and matrix.org: two federation projects that suffer the ironic problem that most users choose the original instance rather than a federated one, so by default the original instance ends up being the main source of truth.
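To make the mirroring and consensus idea a bit more concrete, here is a rough sketch of how federated instances could compare copies of the same snapshot by content hash. This is not part of ActivityPub or any existing archiver; the function names and the simple majority rule are assumptions made up for this example. Note that counting instances is exactly the thing the "spin up 20 instances" attack games, so this shows the mechanism and its weakness rather than a fix.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional


def snapshot_digest(body: bytes) -> str:
    """Content hash of an archived page; identical copies hash identically."""
    return hashlib.sha256(body).hexdigest()


def majority_digest(reports: Dict[str, str], min_share: float = 0.5) -> Optional[str]:
    """Return the digest reported by more than `min_share` of instances.

    `reports` maps an instance name to the digest it holds for one snapshot.
    Returns None when no digest clears the threshold.
    """
    if not reports:
        return None
    digest, count = Counter(reports.values()).most_common(1)[0]
    return digest if count / len(reports) > min_share else None


def dissenting_instances(reports: Dict[str, str], agreed: str) -> List[str]:
    """Names of instances whose copy differs from the agreed digest."""
    return sorted(name for name, d in reports.items() if d != agreed)
```

A public moderation log could record the dissenting instances for each snapshot, which at least makes tampering visible even if it can't settle who is right.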

I didn't mean to sperge but seeing as the point of this thread was about discussing potential solutions, figured it was a decent time to air it out.
 
Where did he say that? I couldn't find anything on his blog or X account.
Even with what Josh said, the site is still heavily vulnerable to having >95% of its archives wiped out if archive.is were to go down for whatever reason, and anything to alleviate this problem is a good thing.
 
My biggest concern with a Kiwi run archive service is that cows would accuse us of fakery. That's MUCH harder to do on a third party server.
 
Full-page screenshots would suffice as a fallback in case an archive website goes offline. Yes, they could be accused of being faked but it would be better than having nothing at all.
 
Ultimately, the only good archive is the Kiwifarms server itself. As important as it is having off site archives for secondary verification purposes it is probably necessary to start pushing people to do better about putting relevant info directly on to the farms. That way the only way the stuff gets taken down is if the Farms is taken down. At which point it won't really matter.
 
Idk I've tried to get him to allow me to download copies of the archives but he doesn't seem amenable. I've discussed this before but I have neither the tools nor means to accomplish this at the moment.

I can get a list of every single archive.x archive id but if I'm to bother you need to tell me how you're going to get the data and create a backup. there's some python-based self-hosted stuff I've looked at before but they don't have the means to import >300,000 links.

you can try to download the archives as a WARC (ideally just the interior of the page) but like I said I have no tools for this.
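For a list on the order of 300,000 links, the bookkeeping matters as much as the download itself. Below is a minimal sketch of a resumable queue using SQLite from the Python standard library. It assumes a hypothetical `fetch` function that saves one archive by its ID (for example as a WARC); that part is not implemented here, and `load_ids` and `run_queue` are names invented for this example.

```python
import sqlite3
import time
from typing import Callable, Iterable


def load_ids(db: sqlite3.Connection, archive_ids: Iterable[str]) -> None:
    """Queue archive IDs; IDs already present keep their existing status."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "archive_id TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'pending')"
    )
    db.executemany(
        "INSERT OR IGNORE INTO jobs (archive_id) VALUES (?)",
        ((a,) for a in archive_ids),
    )
    db.commit()


def run_queue(
    db: sqlite3.Connection,
    fetch: Callable[[str], None],
    delay_seconds: float = 1.0,
) -> int:
    """Process every pending ID once; returns how many succeeded.

    `fetch` is a hypothetical callable that saves one archive and raises
    on failure. Failed IDs are marked 'failed' and are not retried until
    their status is reset to 'pending'.
    """
    pending = [
        row[0]
        for row in db.execute(
            "SELECT archive_id FROM jobs WHERE status = 'pending' ORDER BY archive_id"
        )
    ]
    done = 0
    for archive_id in pending:
        try:
            fetch(archive_id)
            status = "done"
            done += 1
        except Exception:
            status = "failed"
        db.execute(
            "UPDATE jobs SET status = ? WHERE archive_id = ?", (status, archive_id)
        )
        db.commit()  # commit per item so a crash loses at most one job
        time.sleep(delay_seconds)  # pause between requests
    return done
```

Because progress is committed after every item, the job can be stopped and restarted without redoing finished downloads, and the delay keeps the request rate low.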
 