Archival Tools - How to archive anything.

AnOminous · Feb 3, 2024

The Mass Shooter Ron Soye said:
Google has removed cache links from search results. They aren't removing access to Google Cache yet, but they might in the future.

If there were a new Hitler who promised to gas all Google personnel I'd vote for that fucker.

no-exit · Feb 26, 2024

I'm not experienced or autistic enough to intuit FFMPEG commands, but I found this FFMPEG GUI that works nicely. My favorite feature is that it can "View FFmpeg last command log so you can modify and re-run recent commands on the command line".
lossless-cut

Geranium · Mar 1, 2024

Since Nitter died, and Twitter has prevented Archive.today and Ghostarchive from properly saving threads, I've started using Thread Reader. These pages archive well, which is particularly important because Twitter's API rules require Thread Reader to remove saved threads under certain circumstances (eg the author's Twitter account is suspended).

Here's an example:
Thread by @ErinInTheMorn on Thread Reader App – Thread Reader App (archive; archive via Tor)

I've also written a little script to quote the thread. It handles links properly, but not images. It does handle quoted tweets, but only by including a link to the tweet, and only if you prevent Twitter's embed iframe from loading via uBlock or similar.

JavaScript:

(function (document) {
  function findTweets(document) {
    // Tweets that are part of the thread have IDs like tweet_1
    return Array.from(document.querySelectorAll("div[id^=tweet_]"));
  }

  function tweetToText(tweet) {
    const TEXT_NODE = 3;
    let text = Array.from(tweet.childNodes)
      // Keep only text nodes and spans containing tweet numbers.
      .filter(
        (n) =>
          n.nodeType === TEXT_NODE ||
          (n.nodeName === "SPAN" && n.classList.contains("nop")),
      )
      // Get text of each node without leading or trailing spaces.
      .map((n) => n.textContent.trim())
      // Restore space after leading tweet numbers.
      .map((s) => (/^\d+\.$/.test(s) ? s + " " : s))
      // Restore single line breaks.
      .map((n) => (n === "" ? "\n" : n))
      .join("")
      // Replace runs of line breaks with two breaks to separate paragraphs.
      .replaceAll(/\n+/g, "\n\n")
      // Ensure no leading or trailing spaces.
      .trim();

    // Append any links in the tweet at the end.
    let urls = getTweetLinks(tweet);
    if (urls !== "") {
      text += "\n\n" + urls;
    }

    // Append link to quoted tweet at the end.
    let quoted_tweet = getQuotedTweetUrl(tweet);
    if (quoted_tweet !== "") {
      text += "\n\n" + quoted_tweet;
    }

    return text;
  }

  function getTweetLinks(tweet) {
    return Array.from(tweet.querySelectorAll("a.entity-url"))
      .map(({ href }) => `[URL="${href}"]${href}[/URL]`)
      .join("\n");
  }

  function getQuotedTweetUrl(tweet) {
    let url = tweet.querySelector(
      '.entity-embed a[href^="https://twitter.com/"]',
    )?.href;
    if (url !== undefined) {
      return `[URL=${url}]${url}[/URL]`;
    } else {
      return "";
    }
  }

  function getThreadDetails(document) {
    let header = document.querySelector(".twitter_name").parentNode.parentNode;
    let name = header.querySelector(".twitter_name").textContent.trim();
    let account = header.querySelector(".screenName").textContent.trim();
    let url = header.querySelector('a[href^="https://twitter.com/"]').href;
    // Tweet post time is a Unix timestamp (seconds since the epoch).
    let unixTime = header.querySelector("[data-time]").dataset.time;
    // JavaScript constructs dates from milliseconds since the epoch.
    let d = new Date(parseInt(unixTime, 10) * 1000);
    let date = d.toLocaleString("en-GB", {
      weekday: "long",
      month: "long",
      day: "numeric",
      year: "numeric",
      hour: "2-digit",
      minute: "2-digit",
      second: "2-digit",
      hour12: false,
      // Format in UTC so the "UTC" label below is accurate.
      timeZone: "UTC",
    });
    return `${name} (${account}) posted on ${date} UTC\n[URL=${url}]${url}[/URL]`;
  }

  let result = "[QUOTE]\n";
  result += getThreadDetails(document) + "\n\n";
  result += findTweets(document).map(tweetToText).join("\n\n");
  result += "\n[/QUOTE]";
  return result;
})(document);

As written, it just returns the quoted text (as BBCode ready to be pasted here), which is useful if you paste it into the browser's JavaScript console, or use something like Keyboard Maestro to execute JavaScript in your browser and copy the result to your clipboard.

You can have it pop up a new window or tab with the quote contents if you insert the following two lines before "return result" in the code above.

JavaScript:

let nd = window.open("", "Quoted thread");
nd.document.write(result);

This would be most useful as a bookmarklet (a link in your bookmarks bar that runs the code rather than opening another website). Copy the code and use something such as this to create the bookmarklet that you can drag into your bookmark bar.
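If you'd rather build the bookmarklet by hand, here's a minimal sketch of what such a generator does. `toBookmarklet` is a hypothetical helper name, not part of any tool mentioned above: it wraps the source in a function, percent-encodes it, and prefixes `javascript:` so the browser runs the code instead of opening a page.

```javascript
// Hypothetical helper: turn a script into a bookmarklet URL.
function toBookmarklet(source) {
  // Wrap in a function so the script's variables don't leak into the page,
  // and use void so the browser doesn't navigate to the return value.
  const wrapped = `void (function () {\n${source}\n})();`;
  // Percent-encode so spaces, quotes and line breaks survive in a URL.
  return "javascript:" + encodeURIComponent(wrapped);
}

const example = toBookmarklet('alert("hello");');
console.log(example);
```

Since the wrapper discards the return value, a script used this way should display its result itself, for example with the two `window.open` lines above.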

Here's an example, using the thread linked above:

Erin Reed (@ErinInTheMorn) posted on Wednesday, 28 February 2024 at 21:11:34 UTC
https://twitter.com/ErinInTheMorn/status/1762948681366466903

1. The world's largest psychological organization has released a groundbreaking policy resolution supporting trans youth care.

The policy passed 153-9 by elected psych leaders representing 157,000 members.

It rebuts many anti-trans talking points.

Subscribe to support my work!

2. The American Psychological Organization, representing 157,000 members, represents psychologists in America and internationally.

They make it clear. It is a "necessity" for transgender youth to have access to care.

Read my article here:

https://www.erininthemorning.com/p/worlds-largest-psych-association

3. This policy document is stunning and declares:- Gender affirming care is medically necessary- Opposing bans on care for trans youth and adults- Dysphoria is not "caused" by autism or PTSD- False info needs to be combatted- Rejection risks negative mental health, suicide

4. This is the first time, as far as I could find, that the organization directly and emphatically endorsed gender affirming medical care as necessary for trans youth, though previous statements suggested it.

See this 2021 resolution, for instance:

https://www.apa.org/about/policy/resolution-opposing-discriminatory-laws.pdf

5. The policy emerges amid an international debate on gender affirming care and directly counters false information put out by people like Pamela Paul in the New York Times that suggest neurodivergence and OCD cause gender dysphoria.

6. The policy passing by such a massive margin - 94% to 6% - also directly contradicts the idea that there is no consensus around trans care - an idea that has been pushed by groups like SEGM:

https://twitter.com/segm_ebm/status/1501862399980179456

7. Though the policy document may not move Republican legislators, it will be useful in court fights when the findings of fact emerge. Similar findings have been used to overturn a gender affirming care ban in Arkansas and Montana.

https://www.erininthemorning.com/p/arkansas-trans-ban-unconstitutional

8. Lastly, I am a transgender journalist covering legislation, news, and cultural moments on this topic every day.

If you find my work important, please subscribe to support my work at .

http://www.erininthemorning.com/subscribe

Minor edit: The 2nd tweet should say the American Psychological Association, as it does in the subhead of the first post and the article. I meant to type "the psychological organization representing" but that's what I get for tweeting fast.

Geranium · Mar 9, 2024

$44 billion worth of engineering, everyone:

Since Thread Reader doesn't handle two retards bickering on Twitter (only single-author threads), I had another go at my Twitter-quoting script:

JavaScript:

function tweets(document) {
  return [...document.querySelectorAll("article[data-testid=tweet]")];
}

function isQuote(tweet) {
  return [...tweet.querySelectorAll("time")].length == 2;
}

function names(tweet) {
  return [...tweet.querySelectorAll("[data-testid=User-Name]")].map((e) =>
    e.textContent.replace("@", " @").replace(/·.+$/, ""),
  );
}

function tweetTime(tweet) {
  return tweet.querySelector("a time").dateTime;
}

function tweetPermalink(tweet) {
  return tweet.querySelector("a > time").parentNode.href;
}

function qtTime(tweet) {
  return tweet.querySelector("[data-testid=User-Name] div > time").dateTime;
}

function tweetText(tweet) {
  return tweet.querySelector("[data-testid=tweetText][id]").textContent;
}

function qtText(tweet) {
  return tweet.querySelector("[data-testid=tweetText]:not([id])").textContent;
}

function tweetInlineLinks(tweet) {
  let textElement = tweet.querySelector("[data-testid=tweetText][id]");
  let anchors = [...textElement.querySelectorAll("a")];
  return anchors.map((a) => ({
    visible: a.textContent,
    url: a.href,
  }));
}

function tweetCardLink(tweet) {
  let el = tweet.querySelector(
    "[data-testid*='card.layout'] a[role=link][aria-label]",
  );
  if (el === null) {
    return null;
  }
  let href = el.href;
  let text = el.ariaLabel ?? el.textContent ?? "Link";
  return `[URL="${href}"]${text}[/URL]`;
}

function formatIsoString(iso) {
  return (
    new Date(Date.parse(iso)).toLocaleString("en-GB", {
      weekday: "long",
      month: "long",
      day: "numeric",
      year: "numeric",
      hour: "2-digit",
      minute: "2-digit",
      second: "2-digit",
      hour12: false,
      // Format in UTC so the appended " UTC" label is accurate.
      timeZone: "UTC",
    }) + " UTC"
  );
}

function format_tweet(name, text, time, permalink) {
  let timeString = formatIsoString(time);
  if (permalink !== undefined) {
    timeString = `[URL="${permalink}"]${timeString}[/URL]`;
  }
  return `[B]${name} posted on ${timeString}[/B]\n\n${text}`;
}

function textWithLinks(tweet) {
  let text = tweetText(tweet);
  let inlineLinks = tweetInlineLinks(tweet);
  for (const { visible, url } of inlineLinks) {
    let bbcode = `[URL="${url}"]${visible}[/URL]`;
    text = text.replaceAll(visible, bbcode);
  }
  let cardLink = tweetCardLink(tweet);
  if (cardLink !== null) {
    text += `\n\n${cardLink}`;
  }
  return text;
}

function single(tweet) {
  let [author, q_author] = names(tweet);
  let text = textWithLinks(tweet);
  let time = tweetTime(tweet);
  let permalink = tweetPermalink(tweet);
  let main = format_tweet(author, text, time, permalink);
  if (isQuote(tweet)) {
    let q_text = qtText(tweet);
    let q_time = qtTime(tweet);
    let quoteTweetFormatted = format_tweet(q_author, q_text, q_time);
    return `${main}\n[QUOTE]\n${quoteTweetFormatted}\n[/QUOTE]`;
  } else {
    return main;
  }
}

function all(document) {
  return "[QUOTE]\n" + tweets(document).map(single).join("\n\n") + "\n[/QUOTE]";
}

all(document)

Example output:

Benjamin Ryan @benryanwriter posted on Saturday, 9 March 2024 at 02:45:12 UTC

A person asked me why it should be considered exceptional or problematic that the research about pediatric gender-transition treatment is no better than low quality when so many other common treatments get by with only low quality evidence quality backing them. I responded:

Benjamin Ryan @benryanwriter posted on Saturday, 9 March 2024 at 02:43:02 UTC

I asked the @segm_ebm conference panelists this question in October and they become so animated they practically burst out of their seats. They said that the risks and downsides of pediatric pubertal suppression and cross sex hormones are so high, namely the prevention of…

Erin Reed @ErinInTheMorn posted on Saturday, 9 March 2024 at 03:34:03 UTC

You're only balancing one side of the risk equation. Not doing anything is not a neutral action. The other side of that equation is developing severe anxiety disorder, major depression, suicidality, and trauma.

As someone who suffered from that, I wouldn't wish it on an enemy.

Benjamin Ryan @benryanwriter posted on Saturday, 9 March 2024 at 03:38:39 UTC

Five systematic literature reviews have asked: "What are the benefits of treating pediatric gender dysphoria with puberty blockers and cross-sex hormones?"

They have found the evidence inconclusive.

Your claims are thus not supported by evidence.

Benjamin Ryan @benryanwriter posted on Wednesday, 6 March 2024 at 18:35:13 UTC

Below are the five systematic literature reviews of pediatric gender-transition treatment that between them have all found that the related evidence is of no greater than low quality. The results are inconclusive.

Consequently, an increasing number of European nations have been…

Erin Reed @ErinInTheMorn posted on Saturday, 9 March 2024 at 03:39:33 UTC

You're just repeating GRADE analyses again, so we're going to talk each other in circles. Still, for others:

The Myth Of "Low Quality Evidence" Around Transgender Care

Benjamin Ryan @benryanwriter posted on Saturday, 9 March 2024 at 03:43:59 UTC

The conclusion of the most recent systematic review: "Current evidence does not clearly suggest that [gender dysphoria] symptoms and mental health significantly improve when [puberty blockers and cross-sex hormones] are given to minors with GD."

Benjamin Ryan @benryanwriter posted on Tuesday, 27 February 2024 at 14:11:34 UTC

A new systematic literature review, by Germans, of puberty blockers (PB) and cross-sex hormones (CSH) to treat gender dysphoria (GD) in kids concludes: "Current evidence does not clearly suggest that GD symptoms and mental health significantly improve when PB or CSH are given to…

Erin Reed @ErinInTheMorn posted on Saturday, 9 March 2024 at 03:50:22 UTC

They literally come to that conclusion through GRADE analysis.

Also, not long after their cutoff, we got a new Puberty Blocker study.

Doesn't matter though, it's not an RCT, which you'll never get ethically or practically.

Erin Reed @ErinInTheMorn posted on Saturday, 16 December 2023 at 21:55:40 UTC

Incredible new transgender study was just published in the prestigious Journal of Adolescent Health.

The study finds that puberty blockers significantly reduce anxiety, depression, and suicidal thoughts compared to those who did not receive blockers.

They are lifesaving.

Benjamin Ryan @benryanwriter posted on Saturday, 9 March 2024 at 03:52:24 UTC

Yes, and although you dismiss GRADE out of hand, scientists take it seriously, which is why the conclusions of the systematic reviews are valid.

Erin Reed @ErinInTheMorn posted on Saturday, 9 March 2024 at 03:54:54 UTC

Like I said, we'll talk each other in circles.

There's a reason the worlds biggest psychological organization, the most influential medical organizations, all reject your logic.

The Myth Of "Low Quality Evidence" Around Transgender Care

Same problems as mentioned previously with getting the full conversation, which is Twitter's fault: their lazy-loading and unloading of tweets is very aggressive. Links attached to tweets ("cards") are left as shortened t.co links, since in-page JavaScript can't resolve them (because of the same-origin policy).
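One way around the unloading is to collect tweets into a map as you scroll and only format them at the end. This is a minimal sketch, not part of the script above: `collectTweets` and the `{ permalink, text }` record shape are hypothetical, and in the browser you would build each batch from `tweets(document)` using `tweetPermalink` and `single`.

```javascript
// Hypothetical accumulator: remembers every tweet seen so far, keyed by permalink.
// A Map keeps insertion order, so tweets stay in the order they first appeared.
function collectTweets(seen, batch) {
  for (const { permalink, text } of batch) {
    if (!seen.has(permalink)) {
      seen.set(permalink, text);
    }
  }
  return seen;
}

// Simulate two overlapping snapshots of the page taken while scrolling down.
const seen = new Map();
collectTweets(seen, [
  { permalink: "https://twitter.com/a/status/1", text: "first" },
  { permalink: "https://twitter.com/b/status/2", text: "second" },
]);
collectTweets(seen, [
  { permalink: "https://twitter.com/b/status/2", text: "second" },
  { permalink: "https://twitter.com/a/status/3", text: "third" },
]);

console.log([...seen.values()].join("\n\n"));
```

This only keeps the right order if you scroll in one direction, top to bottom, and take a snapshot often enough that nothing is loaded and unloaded in between.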

Edit: Amended the code to handle links in the text of the tweet, and to fix repetition in the text of "card" links.

Also, here's an example of how to resolve t.co links in a shell script.

Bash:

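# -s hides curl's progress meter; tolower() matches "Location:" in any case; tr strips the trailing CR.
cat <<< "https://t.co/ZJfSbRzdUr" | xargs curl -sI | awk 'tolower($1) == "location:" { print $2 }' | tr -d '\r'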

You just need to read the Location header from the 30x response without following the redirect itself, then replace each occurrence of the t.co link with the URL from that header.
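The same thing can be done in JavaScript outside the browser, where the same-origin policy doesn't apply. A minimal sketch, assuming Node 18 or later for the built-in `fetch`, and assuming (as far as I know) that Node hands back the 30x response itself when `redirect` is `"manual"`. `resolveShortLink` and `expandShortLinks` are hypothetical names, and the `fetchImpl` parameter exists only so the example can run without network access.

```javascript
// Ask for headers only and do not follow the redirect, then read Location.
async function resolveShortLink(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { method: "HEAD", redirect: "manual" });
  // Fall back to the original URL if the server did not send a redirect.
  return response.headers.get("location") ?? url;
}

// Replace every t.co link in the text with the URL it redirects to.
async function expandShortLinks(text, fetchImpl = fetch) {
  const pattern = /https:\/\/t\.co\/\w+/g;
  const resolved = new Map();
  for (const shortLink of new Set(text.match(pattern) ?? [])) {
    resolved.set(shortLink, await resolveShortLink(shortLink, fetchImpl));
  }
  // A replacer function swaps each whole match, so a link that is a prefix of another is safe.
  return text.replace(pattern, (shortLink) => resolved.get(shortLink));
}

// Stand-in for fetch so the example runs offline; the target URL is made up.
const fakeFetch = async () => ({
  headers: new Map([["location", "https://example.com/article"]]),
});

expandShortLinks("Read this: https://t.co/ZJfSbRzdUr", fakeFetch).then(console.log);
```

Drop the second argument to use the real `fetch` and hit t.co for each link.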

clipartfan92 · Mar 12, 2024

An interesting idea:
Towards “deep fake” web archives? Trying to forge WARC files using ChatGPT. (a)

Geranium · Mar 13, 2024

Geranium said:
I had another go at my Twitter-quoting script

Fixed a bug where tweets with no text of their own (eg, that just contain a quoted tweet) would cause the script to fail:

JavaScript:

function tweets(document) {
  return [...document.querySelectorAll("article[data-testid=tweet]")];
}

function isQuote(tweet) {
  return [...tweet.querySelectorAll("time")].length == 2;
}

function names(tweet) {
  return [...tweet.querySelectorAll("[data-testid=User-Name]")].map((e) =>
    e.textContent.replace("@", " @").replace(/·.+$/, ""),
  );
}

function tweetTime(tweet) {
  return tweet.querySelector("a time").dateTime;
}

function tweetPermalink(tweet) {
  return tweet.querySelector("a > time").parentNode.href;
}

function qtTime(tweet) {
  return tweet.querySelector("[data-testid=User-Name] div > time").dateTime;
}

function tweetText(tweet) {
  return (
    tweet.querySelector("[data-testid=tweetText][id]")?.textContent ??
    "[no text in tweet]"
  );
}

function qtText(tweet) {
  return tweet.querySelector("[data-testid=tweetText]:not([id])").textContent;
}

function tweetInlineLinks(tweet) {
  // querySelectorAll returns an empty NodeList when nothing matches, so no fallback is needed.
  let anchorNodes = tweet.querySelectorAll("[data-testid=tweetText][id] a");
  return [...anchorNodes].map((a) => ({
    visible: a.textContent,
    url: a.href,
  }));
}

function tweetCardLink(tweet) {
  let el = tweet.querySelector(
    "[data-testid*='card.layout'] a[role=link][aria-label]",
  );
  if (el === null) {
    return null;
  }
  let href = el.href;
  let text = el.ariaLabel ?? el.textContent ?? "Link";
  return `[URL="${href}"]${text}[/URL]`;
}

function formatIsoString(iso) {
  return (
    new Date(Date.parse(iso)).toLocaleString("en-GB", {
      weekday: "long",
      month: "long",
      day: "numeric",
      year: "numeric",
      hour: "2-digit",
      minute: "2-digit",
      second: "2-digit",
      hour12: false,
      // Format in UTC so the appended " UTC" label is accurate.
      timeZone: "UTC",
    }) + " UTC"
  );
}

function format_tweet(name, text, time, permalink) {
  let timeString = formatIsoString(time);
  if (permalink !== undefined) {
    timeString = `[URL="${permalink}"]${timeString}[/URL]`;
  }
  return `[B]${name} posted on ${timeString}[/B]\n\n${text}`;
}

function textWithLinks(tweet) {
  let text = tweetText(tweet);
  let inlineLinks = tweetInlineLinks(tweet);
  for (const { visible, url } of inlineLinks) {
    let bbcode = `[URL="${url}"]${visible}[/URL]`;
    text = text.replaceAll(visible, bbcode);
  }
  let cardLink = tweetCardLink(tweet);
  if (cardLink !== null) {
    text += `\n\n${cardLink}`;
  }
  return text;
}

function single(tweet) {
  let [author, q_author] = names(tweet);
  let text = textWithLinks(tweet);
  let time = tweetTime(tweet);
  let permalink = tweetPermalink(tweet);
  let main = format_tweet(author, text, time, permalink);
  if (isQuote(tweet)) {
    let q_text = qtText(tweet);
    let q_time = qtTime(tweet);
    let quoteTweetFormatted = format_tweet(q_author, q_text, q_time);
    return `${main}\n[QUOTE]\n${quoteTweetFormatted}\n[/QUOTE]`;
  } else {
    return main;
  }
}

function all(document) {
  return "[QUOTE]\n" + tweets(document).map(single).join("\n\n") + "\n[/QUOTE]";
}

all(document);

Jerry's Magic Ring · Mar 24, 2024

Is there a way to archive stories hosted on Fanfiction.net somewhere on the Internet where they can be preserved as a public record?

I have tried archive.md, Ghost Archive, and even the pozzed Wayback Machine, but every time I tried doing that, it will just show the page of CloudFlare asking you to verify you aren't a bot in order to access the site. That means that page doesn't get archived.

I know that FicHub and Fanfiction Downloader exist for building a personal archive of stories you download from that website and others. What I am concerned about is this: let's say I decide to mock a fanfic for being embarrassing. The author then decides to go the delete fucking everything path for the story. Since the story wasn't on AO3, which still allows Internet archiving, that means the story disappears off the Internet. So even if you save a personal copy to read and upload it later on a burner AO3 account, the author can claim plausible deniability and say "ackshually, they were lying because I never wrote it. SEE, SEE, it ain't on my page ~~anymore~~".

GenociderSyo · Mar 24, 2024

I use calibre with fanficfare to archive fanfiction. It archives the metadata as well.

Azdy · Apr 22, 2024

Ghostarchive refuses to archive YouTube videos under a minute and Preservetube is also broken, today.

Markass the Worst · Apr 22, 2024

azdadu said:
Ghostarchive refuses to archive YouTube videos under a minute and Preservetube is also broken, today.

Both those things are sometimes broken but they get fixed eventually. Preservetube in particular has an issue where it will archive but it will keep saying archive not found until eventually it appears.

Edit: Maybe not, today it seems to get stuck on cryptographic challenge.

Fag Alert Major Fag Alert · Apr 22, 2024

Markass the Worst said:
Both those things are sometimes broken but they get fixed eventually. Preservetube in particular has an issue where it will archive but it will keep saying archive not found until eventually it appears.

Edit: Maybe not, today it seems to get stuck on cryptographic challenge.

And what is up with archive.is not archiving twitter tweets like it was supposed to be made doing? Is the site broken?

Peaches the Wonder Cat · Apr 22, 2024

Fag Alert Major Fag Alert said:
And what is up with archive.is not archiving twitter tweets like it was supposed to be made doing? Is the site broken?

What I did yesterday was use a nitter instance/ThreadReader, and then point archive.today to that. That way tweets from suspended accounts are still preserved.

Markass the Worst · Apr 22, 2024

Fag Alert Major Fag Alert said:
And what is up with archive.is not archiving twitter tweets like it was supposed to be made doing? Is the site broken?

It's Elon Musk and whatever bullshit anti-bot measure he employs this week.

Peaches the Wonder Cat said:
What I did yesterday was use a nitter instance/ThreadReader, and then point archive.today to that. That way tweets from suspended accounts are still preserved.

That's what I do too, I use nitter.poast.org.
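Rewriting a tweet URL for this is mechanical. A small sketch, assuming the nitter instance keeps Twitter's path layout and that Thread Reader pages live at `threadreaderapp.com/thread/<id>.html`. Both function names are hypothetical, and the default instance is just the one I mentioned.

```javascript
// Swap the host for a nitter instance, keeping the path.
function toNitterUrl(tweetUrl, instance = "nitter.poast.org") {
  const url = new URL(tweetUrl);
  url.hostname = instance;
  return url.toString();
}

// Build the Thread Reader page URL from the numeric status ID (assumed URL layout).
function toThreadReaderUrl(tweetUrl) {
  const match = new URL(tweetUrl).pathname.match(/\/status\/(\d+)/);
  if (match === null) {
    throw new Error(`No status ID found in ${tweetUrl}`);
  }
  return `https://threadreaderapp.com/thread/${match[1]}.html`;
}

const tweet = "https://twitter.com/ErinInTheMorn/status/1762948681366466903";
console.log(toNitterUrl(tweet));
console.log(toThreadReaderUrl(tweet));
```

Paste the resulting URL into the archive site instead of the original twitter.com link.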

The Mass Shooter Ron Soye · Apr 23, 2024

I've been getting generally sluggish performance out of Archive.today for the last couple of days. Not broken or Nginx page like I've had happen before, but a spinning loading wheel and slow page loads. Me or them?

Markass the Worst · Apr 23, 2024

The Mass Shooter Ron Soye said:
I've been getting generally sluggish performance out of Archive.today for the last couple of days. Not broken or Nginx page like I've had happen before, but a spinning loading wheel and slow page loads. Me or them?

I'm getting that too. My guess is that it's some sort of anti-DDoS measure.

Hrafn · Apr 24, 2024

I've been getting Not Found (yet?) on archive.is/archive.ph when I try to archive tweets. As for ghost archive, I've been getting "archiving error" every time.

Not sure if something's up on their ends or if xitter has integrated something new that stops these sites from doing their thing.

For now, make sure to cap everything until these services are running again, I guess.
Edit - nitter.poast will not work on any of these archive sites either.

Volkamer · Apr 28, 2024

The recent news about Spy.Pet caught my interest but, of course, it got shut down and the only other site I found wants payment in the form of crypto. Hard pass.
My plans to look up two of my personal lolcows died with Spy.Pet.
But surely Spy.Pet isn't the only place that had some crazy amount of scraper bots infiltrating some crazy amount of discord servers, right?
I know shitall about scraping so creating my own army of bots to invade servers is out of the question. I was so ready to pay five bucks to get my hands on discord chatlogs of at least one of my personal lolcows.

Flatline · Apr 28, 2024

Volkamer said:
My plans to look up two of my personal lolcows died with Spy.Pet.

If your lolcows lingered in all private servers, the spy.pet data wouldn't have been useful. The site's bots collected most of its data from widely public servers.

Volkamer · Apr 28, 2024

Flatline said:
If your lolcows lingered in all private servers, the spy.pet data wouldn't have been useful. The site's bots collected most of its data from widely public servers.

They dabbled in some public servers from what I can see in screenshots provided by former friends. I can't identify the servers just by the icons alone, unfortunately.

Fag Alert Major Fag Alert · Apr 28, 2024

Hrafn said:
I've been getting Not Found (yet?) on archive.is/archive.ph when I try to archive tweets. As for ghost archive, I've been getting "archiving error" every time.

Not sure if something's up on their ends or if xitter has integrated something new that stops these sites from doing their thing.

For now, make sure to cap everything until these services are running again, I guess.
Edit - nitter.poast will not work on any of these archive sites either.

I feel like archive.is is mega dead by this point.
