Archival Tools - How to archive anything.

Hellwalker · May 5, 2026

Crossposting from the Archive.today thread:

Post in thread 'archive.today'

May 5, 2026

Is anyone else having severe problems with archiving with archive.today? By severe, I mean can't even fucking archive anything anymore. I don't know if this is because the owner is still having that war with some Finnish retard or whatever drama that I couldn't give less of a shit about but my current problems with it are:
- I constantly get a reCAPTCHA. When opening the site, when clicking any on-site link, when trying to archive a link, anything. It's fucking insufferable.
- I can't even archive shit anymore from my experience. The loading page you get redirected is...

Does anyone else experience issues with Archive.today like I am?

rofllmaoxdwow · May 5, 2026

Hellwalker said:
Crossposting from the Archive.today thread:

Post in thread 'archive.today'

May 5, 2026

Is anyone else having severe problems with archiving with archive.today? By severe, I mean can't even fucking archive anything anymore. I don't know if this is because the owner is still having that war with some Finnish retard or whatever drama that I couldn't give less of a shit about but my current problems with it are:
- I constantly get a reCAPTCHA. When opening the site, when clicking any on-site link, when trying to archive a link, anything. It's fucking insufferable.
- I can't even archive shit anymore from my experience. The loading page you get redirected is...

Hellwalker

Does anyone else experience issues with Archive.today like I am?

- Archive.today has always raped you with captchas. This problem isn't new.
- What are you trying to archive? This problem could be site specific

NGMI · May 5, 2026

Hellwalker said:
Crossposting from the Archive.today thread:

Post in thread 'archive.today'

May 5, 2026

Is anyone else having severe problems with archiving with archive.today? By severe, I mean can't even fucking archive anything anymore. I don't know if this is because the owner is still having that war with some Finnish retard or whatever drama that I couldn't give less of a shit about but my current problems with it are:
- I constantly get a reCAPTCHA. When opening the site, when clicking any on-site link, when trying to archive a link, anything. It's fucking insufferable.
- I can't even archive shit anymore from my experience. The loading page you get redirected is...

Hellwalker

Does anyone else experience issues with Archive.today like I am?

Check your browser settings to see if DNS over HTTPS is enabled. If it is, try disabling it or switching your DNS provider to Cloudflare or Google. If the site is still down, try accessing it through its onion link (archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion). Also, as a quality-of-life tip, you can use a CAPTCHA-solving tool like Buster to make the process smoother.
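If you want to check whether the resolver is actually the problem before changing browser settings, here is a rough sketch that asks Cloudflare's and Google's public DNS-over-HTTPS JSON endpoints for the same hostname so the answers can be compared. The function names are made up for illustration, and it only looks at A records, so treat it as a starting point rather than a proper diagnostic.

```python
import json
import urllib.parse
import urllib.request

# Public DNS-over-HTTPS JSON endpoints (Cloudflare and Google).
RESOLVERS = {
    "cloudflare": "https://cloudflare-dns.com/dns-query",
    "google": "https://dns.google/resolve",
}


def build_doh_url(endpoint, hostname, record_type="A"):
    """Build the query URL for a DoH JSON lookup."""
    query = urllib.parse.urlencode({"name": hostname, "type": record_type})
    return f"{endpoint}?{query}"


def extract_addresses(response):
    """Pull the A-record addresses (type 1) out of a decoded DoH JSON reply."""
    return [
        answer["data"]
        for answer in response.get("Answer", [])
        if answer.get("type") == 1
    ]


def resolve(endpoint, hostname, timeout=10):
    """Query one resolver and return the addresses it hands back."""
    request = urllib.request.Request(
        build_doh_url(endpoint, hostname),
        headers={"Accept": "application/dns-json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_addresses(json.load(reply))


def compare_resolvers(hostname):
    """Return {resolver name: addresses or an error string} for each resolver."""
    results = {}
    for name, endpoint in RESOLVERS.items():
        try:
            results[name] = resolve(endpoint, hostname)
        except (OSError, ValueError) as error:  # network failures or bad JSON
            results[name] = f"lookup failed: {error}"
    return results
```

Calling `compare_resolvers("archive.today")` returns one entry per resolver. If one resolver comes back empty or with an error while the other returns addresses, that points at the DNS provider rather than the site itself.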

GizmoG · May 5, 2026

Lolcow Archiver v1.1

GizmoG said:
Lolcow Archiver (1.0)

For those who have been making use of this script, a heads-up: I've noticed a memory leak caused by Playwright, which has seemingly been ignored by its developers for years now. So if you have been running the script with the YouTube community post archiving feature, you may have encountered this issue. I have updated the script to replace Playwright with Selenium. I haven't noticed any issues with community post capture or memory leaks since this change, so I think it's ready for public consumption.

Python:

'''
Lolcow Archiver 1.1

This script can archive the following:

1) Kick/Twitch/YouTube Live Streams
2) YouTube Community Posts


This script requires yt-dlp to be installed for
video generation, in the same directory as this script.
You can get all the options for downloading yt-dlp
from its GitHub page:

https://github.com/yt-dlp/yt-dlp/wiki/Installation


1.1: Replaced Playwright usage with Firefox and Selenium
    due to Playwright memory leaks unaddressed by their devs
  
    python -m pip install selenium

'''

import subprocess
import sys
import time
import os
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException, WebDriverException


# ---------------- SETTINGS ----------------

NEW_POST_COUNT = 0
NEW_STREAM_COUNT = 0


# ---------------- HELPERS ----------------

def read_last_seen(last_seen_file):
    if os.path.exists(last_seen_file):
        return Path(last_seen_file).read_text().strip()
    return None


def save_last_seen(last_seen_file, url):
    Path(last_seen_file).write_text(url)


def timestamp():
    return time.strftime("%Y-%m-%d_%H-%M-%S")


def usage():
    print("Usage: python lolcow_archiver.py Kick/Twitch/YouTube_URL Lolcow_Name [YouTube Community Post URL]")
    print("\nFor Example:")
    print("python lolcow_archiver.py https://www.youtube.com/@GlassWindow11/live Glass https://www.youtube.com/@GlassWindow11/posts")
    print("python lolcow_archiver.py https://kick.com/clavicular/ Clavicular")
    print("python lolcow_archiver.py https://kick.com/pwrworld/ PWR https://www.youtube.com/@PWRWorldOfficial/posts")
    print("\n\n")

def check_for_livestream(live_url):
    global NEW_STREAM_COUNT
    script_dir = Path(__file__).resolve().parent
    ytdlp_path = script_dir / "yt-dlp.exe"

    if not ytdlp_path.exists():
        print(time.strftime("%H:%M:%S", time.localtime()), f": Error: yt-dlp.exe not found at {ytdlp_path}")
        sys.exit(1)

    command = [
        str(ytdlp_path),
        "-o",
        "%(title).50B [%(id)s].%(ext)s", # Limit filename length if stream title too long
        "-t",
        "mp4",
        live_url,
    ]

    print(time.strftime("%H:%M:%S", time.localtime()), ": Launching yt-dlp...")
    try:
        result = subprocess.run(command, check=False)
        if result.returncode == 0:
            NEW_STREAM_COUNT += 1
    except Exception as e:
        print(time.strftime("%H:%M:%S", time.localtime()), f": Error while running yt-dlp: {e}")


def make_firefox_driver():
    options = FirefoxOptions()
    options.add_argument("-headless")

    # Do not wait for every YouTube resource to finish loading.
    # Selenium supports page load strategies:
    # normal = complete, eager = interactive, none = return immediately.
    options.page_load_strategy = "eager"

    # Keep YouTube UI text predictable so "Read more" selector works.
    options.set_preference("intl.accept_languages", "en-US,en")

    driver = webdriver.Firefox(options=options)
    driver.set_window_size(1920, 1080)

    # Keep this relatively short because we recover from timeouts below.
    driver.set_page_load_timeout(20)

    return driver

def safe_get(driver, url, label="page"):
    try:
        driver.get(url)
    except TimeoutException:
        print(
            time.strftime("%H:%M:%S", time.localtime()),
            f": Navigation timed out while loading {label}. Continuing with partially loaded page."
        )

        # Stop further loading so Selenium can continue interacting with the DOM.
        try:
            driver.execute_script("window.stop();")
        except WebDriverException:
            pass

def click_read_more_if_present(driver):
    # YouTube can render "Read more" in different custom elements depending on layout.
    candidates = driver.find_elements(
        By.CSS_SELECTOR,
        "tp-yt-paper-button, yt-formatted-string, button, span"
    )

    for element in candidates:
        try:
            text = (element.text or "").strip().lower()
            if "read more" in text:
                driver.execute_script(
                    "arguments[0].scrollIntoView({block: 'center'});",
                    element
                )
                time.sleep(0.5)

                try:
                    element.click()
                except Exception:
                    driver.execute_script("arguments[0].click();", element)

                time.sleep(2)
                print(
                    time.strftime("%H:%M:%S", time.localtime()),
                    ": Large community post detected. Expanding the full text of the community post."
                )
                return True
        except Exception:
            continue

    return False


def check_for_new_post(lolcow_name, channel_posts_url):
    global NEW_POST_COUNT

    screenshot_prefix = f"{lolcow_name}_community_post_"
    last_seen_file = f"{lolcow_name}_last_seen.txt"

    driver = make_firefox_driver()

    try:
        print("\n")
        print(time.strftime("%H:%M:%S", time.localtime()), f": Checking community posts for {lolcow_name}...")

        # Navigate to Community Posts
        safe_get(driver, channel_posts_url, "YouTube community posts page")
        time.sleep(5)

        # Grab newest post link
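        # WebDriverWait raises TimeoutException instead of returning an
        # empty list, so the "no posts" case has to be caught here.
        try:
            post_links = WebDriverWait(driver, 20).until(
                lambda d: d.find_elements(By.CSS_SELECTOR, "a[href*='/post/'], a[href*='post/']")
            )
        except TimeoutException:
            print(time.strftime("%H:%M:%S", time.localtime()), ": No posts located.")
            return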

        latest_url = post_links[0].get_attribute("href")

        if not latest_url:
            print(time.strftime("%H:%M:%S", time.localtime()), ": No post URL found.")
            return

        if not latest_url.startswith("http"):
            latest_url = "https://www.youtube.com" + latest_url

        print(time.strftime("%H:%M:%S", time.localtime()), ": Latest post found:", latest_url)

        # Compare to last seen
        last_seen = read_last_seen(last_seen_file)

        if latest_url == last_seen:
            print(time.strftime("%H:%M:%S", time.localtime()), ": No new posts detected.")
            return

        print(time.strftime("%H:%M:%S", time.localtime()), ": NEW POST DETECTED!")
        NEW_POST_COUNT += 1

        save_last_seen(last_seen_file, latest_url)

        # Open post page
        safe_get(driver, latest_url, "YouTube community post page")
        time.sleep(4)

        # Expand text if needed
        click_read_more_if_present(driver)

        # Screenshot
        screenshot_name = f"{screenshot_prefix}{timestamp()}.png"
        screenshot_path = str(Path(screenshot_name).resolve())

        ok = driver.get_full_page_screenshot_as_file(screenshot_path)

        if ok:
            print(time.strftime("%H:%M:%S", time.localtime()), ": Screenshot saved:", screenshot_name)
        else:
            print(time.strftime("%H:%M:%S", time.localtime()), ": Screenshot failed:", screenshot_name)

    finally:
        driver.quit()


# ---------------- MAIN ----------------

def main():
    capture_community_posts = False

    if len(sys.argv) < 3:
        usage()
        sys.exit(1)

    if len(sys.argv) >= 4:
        capture_community_posts = True
        channel_posts_url = sys.argv[3]

    live_url = sys.argv[1]
    lolcow_name = sys.argv[2]
    print("\n")
    print("__________________________________________________")
    print(f"Starting Lolcow Archiver for {lolcow_name}")
    print(f"Checking for livestream at {live_url}")
    if capture_community_posts:
        print(f"Checking for YouTube Community Posts at {channel_posts_url}")
    print("--------------------------------------------------")

    while True:
        try:
            if capture_community_posts:
                check_for_new_post(lolcow_name, channel_posts_url)
            check_for_livestream(live_url)

        except Exception as e:
            print("ERROR:", e)

        if capture_community_posts and NEW_POST_COUNT > 0:
            print("\n")
            print("**************************************************")
            print(f"THERE ARE {NEW_POST_COUNT} NEW COMMUNITY POSTS FOR YOU TO ARCHIVE")
            print("**************************************************")

        if NEW_STREAM_COUNT > 0:
            print("\n")
            print("**************************************************")
            print(f"THERE ARE {NEW_STREAM_COUNT} NEW STREAMS FOR YOU TO ARCHIVE")
            print("**************************************************")

        print("\n")
        print(time.strftime("%H:%M:%S", time.localtime()), ": Sleeping for 30 seconds before checking again...\n")
        time.sleep(30)


if __name__ == "__main__":
    main()
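The script as posted looks for `yt-dlp.exe` next to itself, so it only works on Windows. If you run it elsewhere, a lookup along these lines could stand in for the hardcoded path. This is a sketch under my own assumptions: `find_ytdlp` is a hypothetical helper, not part of the script above, and the PATH fallback goes beyond the "same directory" requirement in the docstring.

```python
import os
import shutil
from pathlib import Path


def find_ytdlp(script_dir):
    """Return a path to yt-dlp: next to the script first, then on PATH, else None."""
    # Windows builds ship as yt-dlp.exe; other platforms use a plain yt-dlp binary.
    binary_name = "yt-dlp.exe" if os.name == "nt" else "yt-dlp"

    local_copy = Path(script_dir) / binary_name
    if local_copy.exists():
        return local_copy

    # Fall back to whatever yt-dlp is installed on PATH (e.g. via pip).
    on_path = shutil.which("yt-dlp")
    return Path(on_path) if on_path else None
```

In `check_for_livestream`, the result would replace `ytdlp_path`, with the existing "not found" error printed when the helper returns `None`.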

Also, to automatically start the script for a selection of lolcows, here is an example batch of commands along the lines of what I run (paths and script names substituted for my filesystem):

Code:

wt -w 0 nt --title "Kiss Alice" --suppressApplicationTitle -p "Kiss Alice" cmd /k "cd C:\PATH\TO\TEST_archiver-1-1 && python lolcow_archiver_1_1.py https://www.youtube.com/@BaiserAlice/live KissAlice https://www.youtube.com/@BaiserAlice/posts";
wt -w 0 nt --title "Glass" --suppressApplicationTitle -p "Glass" cmd /k "cd C:\PATH\TO\TEST_archiver-1-1 && python lolcow_archiver_1_1.py https://www.youtube.com/@GlassWindow11/live Glass https://www.youtube.com/@GlassWindow11/posts";
wt -w 0 nt --title "UrghBla" --suppressApplicationTitle -p "UrghBla" cmd /k "cd C:\PATH\TO\TEST_archiver-1-1 && python lolcow_archiver_1_1.py https://www.youtube.com/@UrghBla/live UrghBla https://www.youtube.com/@UrghBla/posts";
wt -w 0 nt --title "PWR" --suppressApplicationTitle -p "PWR" cmd /k "cd C:\PATH\TO\TEST_archiver-1-1 && python lolcow_archiver_1_1.py https://kick.com/pwrworld/ PWR https://www.youtube.com/@PWRWorldOfficial/posts";
echo DONE;

This results in a single command-line window with one tab per lolcow being monitored, each tab running its own copy of the script.
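For anyone not on Windows Terminal, roughly the same thing can be done from Python by starting one archiver process per channel. This is only a sketch: the `CHANNELS` table and both function names are hypothetical, the script filename is taken from the commands above, and everything prints to one console instead of separate tabs.

```python
import subprocess
import sys

# Hypothetical table: (name, live URL, community posts URL or None).
CHANNELS = [
    ("Glass", "https://www.youtube.com/@GlassWindow11/live",
     "https://www.youtube.com/@GlassWindow11/posts"),
    ("PWR", "https://kick.com/pwrworld/",
     "https://www.youtube.com/@PWRWorldOfficial/posts"),
    ("Clavicular", "https://kick.com/clavicular/", None),
]


def build_command(script, name, live_url, posts_url=None):
    """Build the argument list in the order the archiver expects."""
    command = [sys.executable, script, live_url, name]
    if posts_url:
        command.append(posts_url)
    return command


def launch_all(script, channels):
    """Start one archiver process per channel and return the Popen handles."""
    return [
        subprocess.Popen(build_command(script, *channel))
        for channel in channels
    ]
```

Calling `launch_all("lolcow_archiver_1_1.py", CHANNELS)` starts the processes and returns immediately. Keep the returned handles around if you want to `wait()` on them or `terminate()` them later.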

Hellwalker · May 7, 2026

Late reply.

rofllmaoxdwow said:
- Archive.today has always raped you with captchas. This problem isn't new.

Since when? Ever since I have used the service, I have only gotten a reCAPTCHA when trying to archive a webpage, nothing else.

rofllmaoxdwow said:
- What are you trying to archive? This problem could be site specific

I don't recall any specifics but I think some things I've tried to archive were:
- KF homepage
- old.reddit.com thread
- An account on Xitter
I already know that successful archival of 2 of these (KF and Xitter) varies wildly, but the bottom line is that I was at least able to make some progress before, which hasn't been the case for a while.

NGMI said:
Check your browser settings to see if DNS over HTTPS is enabled. If it is, try disabling it or switching your DNS provider to Cloudflare or Google. If the site is still down, try accessing it through its onion link (archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion). Also, as a quality-of-life tip, you can use a CAPTCHA-solving tool like Buster to make the process smoother.

The last time I was able to test, I actually found out that I was having a fair bit of issues with DNS-over-HTTPS using Mullvad's default url (https://dns.mullvad.net/dns-query) with websites for some reason not loading the first time or at all even though I've used it since forever and never had problems with it. I've switched to Cloudflare's for the time being but it didn't seem to make a difference for archive.today.

I'll try again when I'm in a better position to do so and try out the .onion link.

rofllmaoxdwow · May 7, 2026

Hellwalker said:
The last time I was able to test, I actually found out that I was having a fair bit of issues with DNS-over-HTTPS using Mullvad's default url (https://dns.mullvad.net/dns-query) with websites for some reason not loading the first time or at all even though I've used it since forever and never had problems with it. I've switched to Cloudflare's for the time being but it didn't seem to make a difference for archive.today.

I'll try again when I'm in a better position to do so and try out the .onion link.

Cloudflare and archive.today had a weird years long DNS issue preventing 1.1.1.1 users from accessing archive.today. Both parties blamed each other. This isn't mentioned on the wikipedia article because some random faggot decided the issue "seems to be fixed" (good for you retarded wikipedo) and no one has bothered to revert that edit.
I don't use Cloudflare's DNS when connecting to archive.today for this reason.

Free Dick Pills · May 13, 2026

I feel like the OP needs an update/refresher, since youtube-dl went through that whole DMCA arc a few years ago and came out the other end, with the actively maintained fork yt-dlp taking over. If @Osama Bands Laden is still on his OP rewrite bender, that'd be an important one to tackle.

Also, there are now mobile apps (Android) that leverage yt-dlp and ffmpeg, which may help point people who use being on their phone as an excuse towards the light when it comes to archiving.

Notable example:
https://github.com/deniscerri/ytdlnis

Given how OP was born in 2014, mobilefagging was a limited thing when it came to browsing forums. But the times have been a-changin' and xenforo-sneedfork works reasonably well with the mobile UX in mind. I'd dare to say that a lot of changes made to the forums were designed with phonefags in mind.

It might be worth putting some energy back into the archivists' spirit of this site's underbelly.

@Null

edit // addendum:
Also, reCAPTCHA is going to be _really_ fucking bad going forward, which is relevant to archival services such as archive.ph/.today.
They also ran into a controversy over tampering with archives some time back.
Just something to consider.

vibecoder43 · May 18, 2026

I want to shill my open-source project that's relevant to this thread. It's a web archiving service, like archive.today but you can self-host it. There's a working instance at https://webshotdemo.name and the repo is https://github.com/vibecoder43/webshot/. The instance is currently in allowlist mode, allowing only en.wikipedia.org.
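For anyone wondering what "allowlist mode" amounts to, a host check along these lines is the general idea. To be clear, this is a generic sketch and is not taken from the webshot codebase. `ALLOWED_HOSTS` and `is_allowed` are names made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist mirroring the demo instance's stated policy.
ALLOWED_HOSTS = {"en.wikipedia.org"}


def is_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Accept only http(s) URLs whose hostname is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the hostname and strips any port or credentials.
    host = parts.hostname or ""
    return host in allowed_hosts
```

Matching the exact hostname rather than a substring is what keeps lookalikes such as `en.wikipedia.org.evil.example` from slipping through.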
