Archival Tools - How to archive anything.

I haven't seen any signs of malware yet, but will be doing a scan.

On clearnet, I get straight to the main archive.ph page with no CAPTCHA, and can search and archive with no issues.

I wonder if something is wrong with the CAPTCHA redirect, then. In chat, @Gog & Magog said he could not get past the CAPTCHA while on Tor, if I understood him right.

EDIT: No scan results yet, but I get the same behavior on my phone. Clearnet: no CAPTCHA, no redirect to rurtnews.com. Tor: CAPTCHA, which redirects to rurtnews.com.
I doubt this is the doing of malware. Now why would a virus redirect affect only one website on one browser?
 
I have not tested this from the clear net nor another VPN.
I've got the same problem. I've tested with two different devices, using two different connections, through two different VPN companies. When you get to the CAPTCHA screen, it sits for five seconds and then redirects to the https://rurtnews.com/ site.
 
The RT redirect happened for the first time tonight. I closed the tab and retried, then everything was normal. It happened roughly once per five archives I made in that one session.
 
Is it an official RT domain? That would normally be rt.com.

Lol:
footfetish.png
 
Hooked a browser into zaproxy and did some looking into it, kinda does look like they've been hijacked going by what their reverse proxy is returning in the response requests across multiple TLDs:
1743055502696.png

The captcha doesn't even get a chance to be loaded or completed before it redirects over to rurtnews. Additionally, my captchas are showing up in fucking squiggle language of all things, despite me using a VPN subnet nowhere near any middle eastern geolocation:
1743055671382.png
 
If it also happens on your phone then it's possibly an issue with your network or router.
Normally I would agree with you. I should have mentioned that I deliberately left my house and made sure my phone was on cellular data with wifi disabled for that test, specifically to determine whether it was my network. I see this is affecting you as well now.

LOL at archiving degeneracy being interrupted by CAPTCHA redirects. Who knew Russians liked feet that much?

I don't know if that's an official domain for Russia Today or not. While the IP address is the same, the name servers are different. rt.com's DNS records are hosted by rttv.ru, whereas rurtnews.com's DNS records are hosted by megafon.ru. Of note is that runewsrt.com also points to the same IP and is also hosted by megafon.ru, and was registered 7 days ago. Perhaps that means nothing and is only a coincidence, but I found it interesting.

1743058043746.png
1743058323852.png
1743058356936.png


Hooked a browser into zaproxy and did some looking into it, kinda does look like they've been hijacked going by what their reverse proxy is returning in the response requests across multiple TLDs:
I should have thought to check for that. I see that now too, looking at the response headers in Tor Browser's console.

Although this has been going on for over 6 hours now, I sent an email to the Archive Today webmaster in case they weren't aware of what's going on.

@Null, is this worth a feature or some kind of notice/warning header, or should I dig into this more and start another thread so I can stop shitting up this one?
 
On clearnet, I get straight to the main archive.ph page with no CAPTCHA, and can search and archive with no issues.
I got exactly this redirect a couple hours ago, which was a little disconcerting considering it's recaptcha. It's not doing that now, though, although I'm now getting the "bad goy" kind of captcha where it gives you eight shitty blurry challenges in a row before letting you in.

Reddit also mentions this: https://www.reddit.com/r/DataHoarde...ivetoday_redirecting_to_a_weird_russian_news/

And of course the top rated comment is absolute retardation thinking the OP was talking about the Wayback Machine instead.
 
The webmaster for Archive Today replied to my email right before I posted in I&T. It's a bug on his part.

LOL at a reddit nigger confusing Archive Today with the Wayback Machine.
 
Anyone have any tips on archiving Vimeo video pages? I can download videos from it just fine, but when I try and archive a video link for stuff like upload date and description, it gets cucked by Cloudflare on both archive.today and ghostarchive.
 
Has anyone else had any issues using GhostArchive to archive stuff recently? I can view stuff just fine, but I want to archive some X/Twitter chains and GhostArchive kept giving me errors. When I tried it, I tried switching VPNs but it still kept giving me errors. It also doesn't seem to be an X/Twitter issue, as I tried to archive some random website to test and it still didn't work.

I can still use archive.md to archive just fine, it's just one of the chains I want to archive is like 12 Tweets and I don't really want to archive each one individually.
 
Has anyone else had any issues using GhostArchive to archive stuff recently?
Yes, very recently I am getting:

Archiving error

There was an issue trying to archive your webpage or video. Usually, webpages that are bigger than 50 megabytes, or videos longer than 15 minutes, may fail to archive.
But the unusual part is that this error is appearing nearly instantly instead of after e.g. a minute like it used to, which may indicate something is up behind the scenes.

For tweets you can try replacing the account name with "i" to grab more context (I do this from the Archive.today page instead of using the bookmarklet I'm clicking all day every day). Or maybe one of the Nitter instances still works.
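
For anyone scripting the account-name swap, here is a minimal Python sketch. It assumes tweet links have the shape /<account>/status/<id>. The function name `to_i_status_url` and the host list are made up for illustration, and it drops the query string and fragment on the assumption that they are only tracking cruft.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as Twitter/X for this sketch (an assumption, not an exhaustive list).
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com", "x.com", "www.x.com"}


def to_i_status_url(url: str) -> str:
    """Replace the account name in a tweet URL with "i"; return other URLs unchanged."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in TWITTER_HOSTS:
        return url
    segments = parts.path.split("/")
    # Expected path shape: /<account>/status/<id>[/...]
    if len(segments) >= 4 and segments[2] == "status" and segments[3].isdigit():
        segments[1] = "i"
        # Query string and fragment are dropped on purpose (e.g. "?s=20").
        return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))
    return url


if __name__ == "__main__":
    print(to_i_status_url("https://x.com/someuser/status/1234567890?s=20"))
```

The rewritten link can then be pasted into the Archive.today form as usual.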
 
Did Archive.today stop using "run=1"? That's in the bookmarklet to make it automatically start archiving the URL. Now it opens the page with the URL pre-filled but I have to submit the form.
It's not the worst thing in the world because I can check the URL for cruft, replace X account name with "i", etc.
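
A rough Python sketch of building that kind of submit link by hand, with a pass to drop obvious tracking parameters first. The `run=1` part comes from the bookmarklet described in this post. The `url` parameter name, the `https://archive.today/` base, and the list of tracking keys are assumptions on my part, and going by the behavior described here, `run=1` may no longer auto-submit anyway.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query keys treated as cruft in this sketch (assumed list, extend as needed).
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def strip_cruft(url: str) -> str:
    """Drop common tracking parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES) and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def submit_url(target: str, base: str = "https://archive.today/", autorun: bool = True) -> str:
    """Build a submit link; the "url" parameter name is an assumption."""
    query = {"url": strip_cruft(target)}
    if autorun:
        query = {"run": "1", **query}
    return base + "?" + urlencode(query)


if __name__ == "__main__":
    print(submit_url("https://example.com/a?id=5&utm_source=x"))
```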
 
How do I archive an entire twitter account?
1st option: offline-twitter https://offline-twitter.com/
Simple, clean, and good for everyday single use. But when I tried to archive an account, it stopped a few months short of the first tweets, so there might be limitations, and I have no clue how development is progressing. It's also a bit slow to pick up same-day tweets unless you liked them. It's a localhost web UI that looks like this. The site has instructions for installing and starting it. Not open source afaik, so uh, probably not malicious.
1745297008825.webp
2nd option: gallery-dl
Foolproof but not pretty.
Using this reddit setup: link [A]
I do prefer "filename": "twitter_{tweet_id}_{author[name]}_{num}.{extension}", instead, because sorting by name then sorts by tweet ID, which sorts by date.
I don't remember the options and defaults, but you can look them up here: https://gdl-org.github.io/docs/configuration.html (ctrl-f twitter)
My commands look like this, because they're used for a currently active account:
gallery-dl --directory ".\Twitter\test1" "https://twitter.com/search?q=from:test1&f=live" --write-metadata --abort 5
gallery-dl --directory ".\Twitter media\test1 images" "https://twitter.com/test1/media" --write-metadata --abort 5
gallery-dl --directory ".\#test1" "https://twitter.com/hashtag/test1?f=live" --write-metadata --abort 5
Then ask ChatGPT to make a program to convert it to a pretty version. I had some luck a year and a half ago asking it for a Python program that turns the output into an HTML file that looks like Twitter. I didn't quite like the result, so I deleted it. My last attempt was a year ago, so it has probably gotten better.
Not sure if I set it up the best way, so maybe the reddit one is better.
3rd option: wait for Nitter to add it. It's on their roadmap, but it's been there for a while.
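
On the gallery-dl filename point: sorting by tweet ID sorts by date because modern tweet IDs are Snowflake IDs with a millisecond timestamp in the high bits. One catch is that a plain name sort compares the IDs as text, so IDs with different digit counts can land out of order. Below is a small Python sketch that sorts numerically and decodes the timestamp. It assumes filenames follow the `twitter_{tweet_id}_...` template from this post, the helper names are mine, and the decoding only applies to Snowflake-era IDs (roughly November 2010 onward).

```python
import re
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch in Unix milliseconds
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FILENAME_RE = re.compile(r"^twitter_(\d+)_")  # matches the template from the post


def tweet_id_from_filename(name):
    """Return the tweet ID embedded in a filename, or None if it does not match."""
    match = FILENAME_RE.match(name)
    return int(match.group(1)) if match else None


def tweet_time(tweet_id):
    """Decode the creation time stored in the high bits of a Snowflake ID."""
    return UNIX_EPOCH + timedelta(milliseconds=(tweet_id >> 22) + TWITTER_EPOCH_MS)


def sort_by_tweet_id(names):
    """Sort filenames by numeric tweet ID; non-matching names go last."""
    def key(name):
        tweet_id = tweet_id_from_filename(name)
        return (tweet_id is None, tweet_id or 0, name)
    return sorted(names, key=key)


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    files = ["twitter_1000_test1_1.jpg", "twitter_999_test1_1.jpg", "notes.txt"]
    print(sorted(files))            # text sort puts 1000 before 999
    print(sort_by_tweet_id(files))  # numeric sort puts 999 first
```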
 
https://kiwifarms.st/threads/hasan-piker-hasanabi.95834/post-21254482

If you didn't know, you can archive individual images with Archive.today. This can be useful when the highest quality version of an image is sitting on the server but not used on an HTML page.

For the New York Times, inspecting the page at the image shows a source set with additional URLs, one of which appears to be the "master" copy. So I archived that. On the archive page, you should be getting the highest quality original with no webp compression, so you can right click and save it, or open in a new tab to zoom in easier.
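
If you'd rather not eyeball the source set, a minimal Python sketch can pick the largest candidate out of a `srcset` attribute value. It makes some assumptions: candidates are separated by a comma plus whitespace, the function names and URLs are hypothetical, and the largest listed width is not guaranteed to be the site's actual "master" copy.

```python
import re


def parse_srcset(srcset):
    """Split a srcset string into (url, size) pairs.

    Simplified: assumes candidates are separated by a comma followed by
    whitespace, and that each descriptor is a width ("1200w") or density ("2x").
    """
    candidates = []
    for entry in re.split(r",\s+", srcset.strip()):
        tokens = entry.split()
        if not tokens:
            continue
        size = 1.0  # default when no usable descriptor is present
        if len(tokens) > 1 and tokens[1][-1] in "wx":
            try:
                size = float(tokens[1][:-1])
            except ValueError:
                pass
        candidates.append((tokens[0], size))
    return candidates


def largest_candidate(srcset):
    """Return the URL with the largest descriptor, or None for an empty srcset."""
    candidates = parse_srcset(srcset)
    return max(candidates, key=lambda c: c[1])[0] if candidates else None


if __name__ == "__main__":
    # Hypothetical srcset value, for illustration only.
    example = (
        "https://example.com/img-600.jpg 600w, "
        "https://example.com/img-1200.jpg 1200w, "
        "https://example.com/img-master.jpg 2048w"
    )
    print(largest_candidate(example))
```

The URL it returns is the one to submit to Archive.today as its own page.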

When it comes to archiving individual files with Archive.today, the type that has given me the most trouble is probably PDFs. Archiving one produces a snapshot of the first page and no more, AFAICT. Yeah, here's an example someone did from NYT. Ghostarchive handles PDFs better, as long as it doesn't reject the file based on size.
 