Archival Tools - How to archive anything.

Been spending a lot of time archiving instagram stories for a few threads and found these three websites to be very helpful:
Story Saver (Pulls up current stories and highlights)
InGramer (Can also be used to pull videos off regular instagram posts)
Picuki

There used to be one that could pull from private Instagrams, but it is not working anymore. There also used to be a trick where you could use "save complete webpage" on Instagram stories to pull the video/image files, but at the moment this only works sometimes for image files.

For YouTube and TikTok:
4K Downloader (I use the paid version, so I don't know the limitations of the free one)
 
4K Downloader (I use the paid version, so I don't know the limitations of the free one)
I have the free version of 4k downloader and my limit is 30 downloads a day. Useful enough.
 
Been spending a lot of time archiving instagram stories for a few threads [...] For YouTube and TikTok: 4K Downloader (I use the paid version, so I don't know the limitations of the free one)
Why not use 4K for Instagram as well? I got it for $10 off during their Christmas sale; it saves a lot of trouble with the auto-saving feature. I personally just use yt-dl for YouTube, though.
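If anyone wants a command-line route for YouTube/TikTok instead, here's a minimal sketch using yt-dlp (the actively maintained youtube-dl fork). The flags are standard yt-dlp options rather than anything the posters above confirmed using, and VIDEO_ID is just a placeholder:

Bash:
# download best video+audio and sort files into per-channel folders
yt-dlp -f "bestvideo+bestaudio/best" \
  -o "%(uploader)s/%(upload_date)s - %(title)s.%(ext)s" \
  "https://www.youtube.com/watch?v=VIDEO_ID"

The same invocation should also work on TikTok links, since yt-dlp has extractors for both sites.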
 
Here's a method for archiving Reddit accounts. I know jack shit about coding but I can do it so you can too. I stole this off of Voat, everyone's favorite alt-right Reddit alternative.

Reddit data is available on BigQuery
https://bigquery.cloud.google.com/table/fh-bigquery:reddit_comments.2015_11

Click on "Compose Query" and paste the following:
SELECT
id
,link_id
,parent_id
,subreddit
,author
,score
,STRFTIME_UTC_USEC(created_utc*1000000,"%Y/%m/%d %H:%M:%S") AS CreatedOnUTC
,"http://www.reddit.com/comments/" + SUBSTR(link_id,4) + "/_/" + id AS URL
FROM
[fh-bigquery:reddit_comments.2007]
,[fh-bigquery:reddit_comments.2008]
,[fh-bigquery:reddit_comments.2009]
,[fh-bigquery:reddit_comments.2010]
,[fh-bigquery:reddit_comments.2011]
,[fh-bigquery:reddit_comments.2012]
,[fh-bigquery:reddit_comments.2013]
,[fh-bigquery:reddit_comments.2014]
,[fh-bigquery:reddit_comments.2015_01]
,[fh-bigquery:reddit_comments.2015_02]
,[fh-bigquery:reddit_comments.2015_03]
,[fh-bigquery:reddit_comments.2015_04]
,[fh-bigquery:reddit_comments.2015_05]
,[fh-bigquery:reddit_comments.2015_06]
,[fh-bigquery:reddit_comments.2015_07]
,[fh-bigquery:reddit_comments.2015_08]
,[fh-bigquery:reddit_comments.2015_09]
,[fh-bigquery:reddit_comments.2015_10]
,[fh-bigquery:reddit_comments.2015_11]
,[fh-bigquery:reddit_comments.2015_12]
,[fh-bigquery:reddit_comments.2016_01]
,[fh-bigquery:reddit_comments.2016_02]
,[fh-bigquery:reddit_comments.2016_03]
,[fh-bigquery:reddit_comments.2016_04]
,[fh-bigquery:reddit_comments.2016_05]
,[fh-bigquery:reddit_comments.2016_06]
,[fh-bigquery:reddit_comments.2016_07]
,[fh-bigquery:reddit_comments.2016_08]
,[fh-bigquery:reddit_comments.2016_09]
,[fh-bigquery:reddit_comments.2016_10]
,[fh-bigquery:reddit_comments.2016_11]
,[fh-bigquery:reddit_comments.2016_12]
,[fh-bigquery:reddit_comments.2017_01]
,[fh-bigquery:reddit_comments.2017_02]
,[fh-bigquery:reddit_comments.2017_03]
,[fh-bigquery:reddit_comments.2017_04]
,[fh-bigquery:reddit_comments.2017_05]
,[fh-bigquery:reddit_comments.2017_06]
,[fh-bigquery:reddit_comments.2017_07]
,[fh-bigquery:reddit_comments.2017_08]
,[fh-bigquery:reddit_comments.2017_09]
,[fh-bigquery:reddit_comments.2017_10]
,[fh-bigquery:reddit_comments.2017_11]
,[fh-bigquery:reddit_comments.2017_12]
,[fh-bigquery:reddit_comments.2018_01]
,[fh-bigquery:reddit_comments.2018_02]
,[fh-bigquery:reddit_comments.2018_03]
,[fh-bigquery:reddit_comments.2018_04]
,[fh-bigquery:reddit_comments.2018_05]
,[fh-bigquery:reddit_comments.2018_06]
,[fh-bigquery:reddit_comments.2018_07]
WHERE author = 'username' ORDER BY CreatedOnUTC
Important:
You will need to add more of those ",[fh-bigquery:reddit_comments.20xx_xx]" lines, depending on the date you do this. Check how far the archive goes and add lines accordingly.
On the last line, change 'username' to the username of the account you want to archive. The username is case-sensitive; do not delete the apostrophes.

When you're done, run the query and wait until it completes. When it finishes, it'll present you with different ways to download the account's history. The easiest method imo is to download the data as an Excel file.


Pros:
  • You don't have to archive a shit ton of pages on archive.md; this method archives thousands of comments at once.
  • Reddit hides posts that are older than 1 year or so on profile pages. This method bypasses that.
  • You can customize the query as you wish, given you know how to use this crap (I don't).
Cons:
  • The last 2-3 months of posts are missing, so you still need to archive the last couple of pages of an account through archive.md. USE THE OLD DOMAIN (old.reddit.com) or the account is archived with the redesign, which looks horrible and is sometimes even completely broken.
  • You need to log in to your Google account to use BigQuery (as it's a Google service), so you cannot access this data anonymously. I don't believe other users can see your activity, but Google certainly can.

Does anyone have an update to this? I have tried but I get a syntax error, as snipped below. I have zero skills, so all I have tried is removing the comma and then the line itself, but it says something similar either way.

(error screenshots attached)
 
Is there a good service to dump a bunch of Patreon videos to? They're currently hosted on yandex.disk which works but it is pretty slow. I understand that Null is open to hosting large files but I don't know if it's worth bothering him if there's a good alternative.
 
Does anyone have an update to this? I have tried but I get a syntax error as snipped below.

In a "SELECT ... FROM ... WHERE ..." the part after "FROM" is the database tables to look in. Try replacing the [ ] with backticks ` ` like it says (leave the commas)
 
In a "SELECT ... FROM ... WHERE ..." the part after "FROM" is the database tables to look in. Try replacing the [ ] with backticks ` ` like it says (leave the commas)
Thank you. That seems obvious now.

I have now changed this, and now I have an error (screenshot attached). Apologies for the hand-holding; I understand now that I need to change the colon, but to what? I tried a period, but that didn't work, so I am back to asking.
@SqXuSR - trying to use this method but as you can see, boomer issues ahoy
 
Try replacing the colon, but escape the existing period as \. since it's probably interfering

If not, then I'm not sure. I was trying to go see the syntax they're using now, but the BigQuery link redirects to a new version of the site and I can't find anything there.
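For what it's worth, here's an untested sketch of how the original query might look in Standard SQL (the dialect the new BigQuery console defaults to). It assumes the fh-bigquery.reddit_comments tables still exist with the same schema, and it uses a wildcard table so you don't have to list every month by hand; swap 'username' for the account you want:

SQL:
SELECT
  id,
  link_id,
  parent_id,
  subreddit,
  author,
  score,
  -- created_utc is assumed to be a UNIX timestamp in seconds, as in the old tables
  FORMAT_TIMESTAMP('%Y/%m/%d %H:%M:%S', TIMESTAMP_SECONDS(created_utc)) AS CreatedOnUTC,
  -- strip the "t3_" prefix from link_id to build the comment permalink
  CONCAT('http://www.reddit.com/comments/', SUBSTR(link_id, 4), '/_/', id) AS URL
FROM
  -- backticks and periods replace the old [project:dataset.table] brackets and colon;
  -- the * wildcard matches every monthly table whose name starts with "20"
  `fh-bigquery.reddit_comments.20*`
WHERE author = 'username'
ORDER BY CreatedOnUTC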
 
btw, if you guys have time, you should try to save some pages at https://archive.org/web/
most are already saved at archive.md but it won't hurt to have backups

btw: selecting "save outgoing links" saves every linked page, including the previous and next two pages

note: "save outgoing links" is only for IA members.

I found out you can just do this from a not-so-publicized API by curling https://web.archive.org/save/$url where $url is the url of the page you want to save.

Example, retrieving only headers. A successful save should say HTTP response code 302 at the top and give you the location of the saved page at the line that says location:
Bash:
curl -I "https://web.archive.org/save/https://kiwifarms.net/threads/the-great-twitter-meltdown-of-2021.93623/page-102"
 
The easiest way to take small (read: non-fullscreen) screenshots in Windows is to click Start > type "snippingtool" into the run box > draw a box around what you want to snip and it'll take a screenshot of it.

Then just click the copy button and paste the image into your message as an attachment. No mucking about with Imgur or uploading it to an external site.
I personally use Lightshot for screenshots on desktop. I used it since like 2014-2015. You just press PrScr and then you select the area, and you can even draw cocks ontop, write "nigger" if just typing it isn't enough and draw arrows as if it was an r/arabfunny screenshot. Sometimes when I'm to lazy to download a Google image, I screenshot it with Lightshot.
 
The main guy doing releases is this Russian guy dstftw who appears to have disappeared. I wonder what happened to him.
 
What's the best way to get a mirror of this blogspot on my laptop that I can browse locally? Is wget still the way to go?

Code:
wget -mkEpnp https://pleasantfamilyshopping.blogspot.com/

Code:
--mirror – Makes (among other things) the download recursive.
--convert-links – convert all the links (also to stuff like CSS stylesheets) to relative, so it will be suitable for offline viewing.
--adjust-extension – Adds suitable extensions to filenames (html or css) depending on their content-type.
--page-requisites – Download things like CSS style-sheets and images required to properly display the page offline.
--no-parent – When recursing, do not ascend to the parent directory. Useful for restricting the download to only a portion of the site.

yoinked


Testing it, 250MB so far

Edit:

Code:
┌─[0]─[ec2-user@althalus]─[~/web/pleasantfamilyshopping.blogspot.com]
└── $ ls 2009/01
before-they-drove-old-dixie-down.html
before-they-drove-old-dixie-down.html?showComment=1231568580000.html
before-they-drove-old-dixie-down.html?showComment=1231597740000.html
before-they-drove-old-dixie-down.html?showComment=1231628700000.html
before-they-drove-old-dixie-down.html?showComment=1231647240000.html
before-they-drove-old-dixie-down.html?showComment=1231648800000.html
before-they-drove-old-dixie-down.html?showComment=1231705080000.html
before-they-drove-old-dixie-down.html?showComment=1231762860000.html
before-they-drove-old-dixie-down.html?showComment=1231797780000.html
before-they-drove-old-dixie-down.html?showComment=1231816200000.html
before-they-drove-old-dixie-down.html?showComment=1231860600000.html
before-they-drove-old-dixie-down.html?showComment=1231874640000.html
before-they-drove-old-dixie-down.html?showComment=1231896720000.html
before-they-drove-old-dixie-down.html?showComment=1231956060000.html
before-they-drove-old-dixie-down.html?showComment=1231957860000.html
before-they-drove-old-dixie-down.html?showComment=1231965420000.html
before-they-drove-old-dixie-down.html?showComment=1232220900000.html
before-they-drove-old-dixie-down.html?showComment=1232250240000.html
before-they-drove-old-dixie-down.html?showComment=1232473200000.html
before-they-drove-old-dixie-down.html?showComment=1232518380000.html
before-they-drove-old-dixie-down.html?showComment=1236868080000.html
before-they-drove-old-dixie-down.html?showComment=1251435228935.html
before-they-drove-old-dixie-down.html?showComment=1251931958910.html
before-they-drove-old-dixie-down.html?showComment=1257957470234.html
before-they-drove-old-dixie-down.html?showComment=1258261854675.html
before-they-drove-old-dixie-down.html?showComment=1259803348458.html
before-they-drove-old-dixie-down.html?showComment=1267881282783.html
before-they-drove-old-dixie-down.html?showComment=1269228912362.html
before-they-drove-old-dixie-down.html?showComment=1285339903068.html
before-they-drove-old-dixie-down.html?showComment=1289840940876.html
before-they-drove-old-dixie-down.html?showComment=1311353847289.html
before-they-drove-old-dixie-down.html?showComment=1311690238264.html
before-they-drove-old-dixie-down.html?showComment=1314047838456.html
before-they-drove-old-dixie-down.html?showComment=1320947353308.html
before-they-drove-old-dixie-down.html?showComment=1320948768575.html
before-they-drove-old-dixie-down.html?showComment=1327209217291.html
before-they-drove-old-dixie-down.html?showComment=1327335790145.html
before-they-drove-old-dixie-down.html?showComment=1329949431320.html
before-they-drove-old-dixie-down.html?showComment=1348232834403.html
before-they-drove-old-dixie-down.html?showComment=1355109991391.html
before-they-drove-old-dixie-down.html?showComment=1358267814600.html
before-they-drove-old-dixie-down.html?showComment=1416711246397.html
before-they-drove-old-dixie-down.html?showComment=1425612209033.html
before-they-drove-old-dixie-down.html?showComment=1425612414173.html
family-affair-at-kroger.html
happy-new-year.html
index.html
very-fashionable-kroger-1966_25.html

uh ok just a minute lol

Code:
wget -mkEHpnp -R "*?showComment*" -D "pleasantfamilyshopping.blogspot.com,1.bp.blogspot.com,2.bp.blogspot.com,3.bp.blogspot.com,4.bp.blogspot.com" https://pleasantfamilyshopping.blogspot.com/

Adds:
-H traverse hosts
-R reject
-D domains to follow

You can monitor for yourself while it runs:

Code:
watch -n1 "du -hs /path/to/directory"

It seems to be doing what it's supposed to, except it's taking a while because, as the linked blogger notes, it downloads the unwanted pages and then throws them away.

Yeah that last one is good, final count 417MB with everything, 86MB for just the stuff on the pleasantfamilyshopping domain
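If you'd rather not download the ?showComment pages at all, newer wget builds also have --reject-regex, which filters matching URLs before they're fetched instead of deleting them afterwards the way -R does. Untested on this particular blog, but something like this should work:

Code:
wget -mkEHpnp --reject-regex '\?showComment' -D "pleasantfamilyshopping.blogspot.com,1.bp.blogspot.com,2.bp.blogspot.com,3.bp.blogspot.com,4.bp.blogspot.com" https://pleasantfamilyshopping.blogspot.com/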

Paste:

Code:
printf '\n\nblogget() { \nwget -mkEHpnp -R "*?showComment*" -D "$1,1.bp.blogspot.com,2.bp.blogspot.com,3.bp.blogspot.com,4.bp.blogspot.com" $1 \n}\n' >> ~/.bashrc ; source ~/.bashrc

Result:

Code:
blogget() {
wget -mkEHpnp -R "*?showComment*" -D "$1,1.bp.blogspot.com,2.bp.blogspot.com,3.bp.blogspot.com,4.bp.blogspot.com" $1
}

Use:

Code:
blogget https://pleasantfamilyshopping.blogspot.com/
 