ig.tinf.io is successfully swapped in, but it needs "u/" to be inserted in front of the username to work:

Instagram: https://ig.tinf.io/sickasslizzy / (http://archive.md/Iqq62)
Backup Instagram: https://ig.tinf.io/sickasslizzy2 (for livestreams when banned)

You can easily differentiate the two because Instagram has two URL formats, one for user profiles (see above) and one for direct links to posts, which uses "p/":

Instagram: https://ig.tinf.io/u/sickasslizzy / (http://archive.md/Iqq62)
Backup Instagram: https://ig.tinf.io/u/sickasslizzy2 (for livestreams when banned)
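Not a description of how the forum's wordfilter actually works (it is presumably a plain string swap), just a sketch of the rewrite rule that would be needed, assuming the two URL shapes described above:

```python
import re

def rewrite_instagram(url: str) -> str:
    """Rewrite an instagram.com URL to ig.tinf.io, adding the "u/" prefix
    for profile links while leaving post links (".../p/...") untouched.
    Illustrative sketch only, not the forum's real wordfilter logic."""
    # Direct post links already carry the "p/" segment and map across as-is.
    m = re.match(r"https?://(?:www\.)?instagram\.com/(p/[\w-]+/?)", url)
    if m:
        return "https://ig.tinf.io/" + m.group(1)
    # Profile links need "u/" inserted before the username.
    m = re.match(r"https?://(?:www\.)?instagram\.com/([\w.]+)/?$", url)
    if m:
        return "https://ig.tinf.io/u/" + m.group(1)
    return url

# Example: the profile link from the OP
print(rewrite_instagram("https://www.instagram.com/sickasslizzy"))
# -> https://ig.tinf.io/u/sickasslizzy
```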
I could not find anything looking through their github/sourcehut pages. I have no idea how difficult it would be to build archiving functionality into the source code, either.
One thing of note is that Invidious lets you download videos directly from a link in your quality of choice. Unfortunately, you cannot use this to download an entire channel, but for individual videos it would mean more accessibility than relying on the one or two people in a thread running youtube-dl in a terminal (see the sketch after this post).
Do agree with @The Real SVP though, archiving images would be many times easier.
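For what it's worth, the single-video case can also be scripted; a minimal sketch using youtube-dl's Python API (the example URL, quality cap, and filename template are arbitrary choices):

```python
import youtube_dl  # pip install youtube-dl (the yt-dlp fork exposes the same API)

# Download one video at a chosen quality, roughly what Invidious'
# download links offer through the browser.
opts = {
    "format": "best[height<=720]",                 # cap quality instead of always-best
    "outtmpl": "%(uploader)s - %(title)s.%(ext)s", # readable output filenames
}
with youtube_dl.YoutubeDL(opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=dQw4w9WgXcQ"])
```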
Do they do any automated archiving?
> inb4 kiwi farms gets banned from all the archiving sites and has to start up its own

That, minus the bans, is sort of the plan, if I understand Null correctly.
Jesus. Godspeed. This stuff isn't easy; there's a very good reason there's like two archive sites in the world that anybody uses.
Why the headless browser? What makes that specifically so resource intensive?
The only way to archive a site is by running it in a headless browser; anything else won't work with modern websites (Discourse, Twitter) that use JavaScript for everything. Headless browsers need lots of resources, so you can't just rent one VPS and be done with it.
Also, you'll need clean, fast proxies to run the archival service. Their IPs get banned quickly.
The archive.md guy spends $1600 a month on it, has to moderate CP and DMCA, and he doesn't even have people trying to fuck with him.
The Farms can look forward to seeing even more abuse, combined with even less filtering (I don't think you'll do the insane CAPTCHA stuff).
You'll notice archive.md has extreme amounts of captchas. This is because displaying archived sites, let alone archiving arbitrary sites upon request, is extremely resource-intensive, and he'd get DoSed into oblivion if he didn't.
The Internet Archive has an annual budget of $10 million and several petabytes' worth of storage.
I'm not saying it's impossible, especially if you limit it to users of the forum, cache things heavily, and accept saving lossy copies of everything. But it will be a much greater engineering problem than running the forum at scale.
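To make the cost concrete, here is a minimal sketch of the "render it in a headless browser, then dump the DOM" approach. Playwright is just one possible tool, and the proxy handling and example URL are assumptions, not anything archive.md is known to run:

```python
from typing import Optional
from playwright.sync_api import sync_playwright

def snapshot(url: str, proxy: Optional[str] = None) -> str:
    """Load a JavaScript-heavy page in headless Chromium and return the
    rendered DOM. Every call spins up a full browser, which is why this
    gets expensive at any real volume."""
    launch_args = {"headless": True}
    if proxy:
        # Clean, fast proxies are needed in practice; the IPs get banned quickly.
        launch_args["proxy"] = {"server": proxy}
    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_args)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # let the page's JS finish fetching
        html = page.content()                     # the post-JavaScript DOM, archive.is-style
        browser.close()
        return html

if __name__ == "__main__":
    print(len(snapshot("https://twitter.com/jack")))
```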
> Why the headless browser? What makes that specifically so resource intensive?

Browsers in general are resource intensive; look how much CPU it takes to load a Twitter page. If you're loading many pages every second, you'll need a lot of servers.
I know that. I was just under the impression that a headless browser was somehow more intensive than a typical browser.
> The caching is a pretty big problem. I have an idea as to how it could possibly be mitigated, but it isn't an easy one.

The caching isn't a problem; it's the only way to run a site like this at scale. You would either have to have your headless browser dump out the DOM after running all the JavaScript (the archive.is method) or try to do weird hacks to get the JavaScript to run semi-properly (the archive.org method).
> You'd take a commonly archived website, like twitter for instance, and look at global elements. Anything static that exists on every page, such as button/logo imagery, CSS, layout, and individual elements. No matter what account or post you're looking at, these global elements will always be present.

This is what Twitter did since the redesign. It is the reason why the site is total shit.
> Take the global elements and make a "template" out of it, and have this exist on users' computers.
> Then, with this on your computer, when you make a request for a webpage, the archive asks your computer whether or not you have the necessary template, your computer tells it yes, and then it sends you the dynamic information that your browser constructs into a full webpage locally.
> If a webpage is 70 KB, and 50 KB of that is global elements, such as the twitter logo and layout HTML, then with that on your computer the server would only need to supply 20 KB of data for you to view the webpage. This is fundamentally what browser caching is, but a version compatible with internet archives.
> Of course this would require having a program on your computer to store the templates, communicate with the archive in a unique way, and construct full pages to hand off to the browser for rendering.

No, that's fine. What you're describing already exists. If you load Twitter in your browser, it takes a template and fills it in by calling the Twitter API. If you run Nitter with warcprox, you would save those API calls, and you could (theoretically) have Nitter interact with Twitter and fall back to your Twitter archive if it fails. Hacking Twitter's "real" web interface to call into your archive will be much more difficult, but that's not strictly necessary.
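As a toy illustration of the template-plus-delta idea quoted above (hypothetical names and byte counts, not anyone's actual implementation):

```python
# The archive keeps one shared template per site and only ships the small
# per-page payload to clients that already have the template cached.
import json

TEMPLATE = """<html><body>
  <img src="/static/twitter-logo.svg">        <!-- stands in for ~50 KB of shared layout/assets -->
  <div class="tweet"><b>{author}</b>: {text}</div>
</body></html>"""

def server_payload(author: str, text: str) -> str:
    # Only the dynamic ~20 KB worth of data crosses the wire per page.
    return json.dumps({"author": author, "text": text})

def client_render(template: str, payload: str) -> str:
    # The client rebuilds the full page locally from its cached template.
    return template.format(**json.loads(payload))

page = client_render(TEMPLATE, server_payload("sickasslizzy", "some archived tweet"))
print(page)
```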
> This is what Twitter did since the redesign. It is the reason why the site is total shit.

I wasn't referring to Twitter. I was referring to the interaction between individual computers and the archive itself. Having globally present elements on a user's computer would mean the archive wouldn't need as much bandwidth to achieve the same results. Only give connected users the information they don't have.
> The caching isn't a problem

I fucked up there. I was thinking about my little idea when I was writing it. You'd definitely need caching on the server.
> I wasn't referring to Twitter. I was referring to the interaction between individual computers and the archive itself. …

It sounds a lot more difficult than just doing a static archive, though.
> Storage internally could also make use of this, by having thousands of twitter pages only consist of their unique elements linked to a common "library" of elements for all of them. It's really the same idea as Linux packages, where a single package can be used by multiple programs and the individual programs themselves don't need to harbor that data. Don't know if this is how it works already, but if it doesn't, it's an idea to use up less space.

Yeah, this is the more practical approach - only save one copy of the Twitter logo rather than one for each archived Twitter page. It looks like that warcprox tool linked above already handles that to an extent.
> I wasn't referring to Twitter. I was referring to the interaction between individual computers and the archive itself. …

There's no point. The bulk of the bandwidth is going to be spent on videos and pictures, not HTML. You can compress that anyway and reap most of the space-saving benefits.
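On the compression point: repetitive archived markup does shrink dramatically with plain gzip, which is easy to check:

```python
# Quick check of the "you can compress the HTML anyway" claim: repetitive
# markup like archived pages compresses very well with stock gzip.
import gzip

html = ("<div class='tweet'><img src='/logo.svg'><p>hello</p></div>" * 1000).encode()
packed = gzip.compress(html)
print(len(html), "->", len(packed), "bytes")   # typically well over a 95% reduction here
```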
The archive could still interact with Twitter normally without needing to do some API magic, just discarding elements by comparing them to what already exists in the database and filling them with instructions for a "webpage compiler" that would exist on users' computers.
> Storage internally could also make use of this, by having thousands of twitter pages only consist of their unique elements linked to a common "library" of elements for all of them. …

The JS libraries will do something like that, yes. jQuery will only be stored once, for instance.
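For illustration, the usual way to get that shared "library" behaviour is content-addressed storage: every asset is keyed by a hash of its bytes, so identical files referenced by thousands of archived pages resolve to a single stored copy. A minimal sketch (illustrative only; WARC-based tools have their own dedup mechanisms):

```python
import hashlib

class AssetStore:
    """Store each unique blob once, keyed by the hash of its contents."""
    def __init__(self):
        self.blobs = {}          # sha256 hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)   # a second identical upload is a no-op
        return key

store = AssetStore()
logo = b"<svg>twitter bird</svg>"
# Two different archived pages reference the same logo bytes...
ref_a = store.put(logo)
ref_b = store.put(logo)
assert ref_a == ref_b and len(store.blobs) == 1   # ...but only one copy is kept
```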
> Unfortunately, the owner of the archive sites is a retard and he has to be replaced.

At first I asked myself "why though?"
> It doesn't look like the wordfilter is handling instagram user profile links very well. Looking at the bottom of my Beauty Parlor OP here, I have two instagram profile links:
> ig.tinf.io is successfully swapped in but it needs "u/" to be inserted in front of the username to work:
> You can easily differentiate the two because instagram has two url formats, one for user profiles (see above) and one for direct links to posts that uses "p/":

Yeah, I just ran into this. An error page comes up, but it has a link to the correct URL with the "u/" before the username.