Working on a decentralized forum - Python, Tor, no blockchain

I've been giving some thought to a similar issue, i.e. how to make a troon-resistant KF. I'm wondering if it's not insane to do away with the database entirely, and store all posts in Git, and all rendering etc. done by JS in the browser. This would mean:

* It's trivial for anyone to take a local copy of the forums ("git pull"), and trivial to restore a backup
* Reading threads just involves reading static files, which is as fast as things can possibly be. There's no reason why the server should waste resources on e.g. choosing the random_text.txt message when this can be off-loaded to the user. There'd still be a script to handle posting and anything else that changes the repo.
The premise here is a bit off - serving the static pages is simple because you can pre-render the whole static page, keep it in RAM, and blindly serve it without even looking at the request.

Filesystem-as-DB doesn't really scale that well if you have a lot of small objects; they're not really designed for that sort of workload. Even if it were able to handle it well, the filesystem is still a b-tree, so I don't see why you think the structure is better. You will certainly gain a huge performance benefit from not using a certain forum software that doesn't believe in caches, though.

* Because everything has a Git hash, you can use this as an HTTP ETag, making everything very cacheable.
If-Modified-Since / Last-Modified headers could be used with the current system already.
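For concreteness, the Git-hash-as-ETag version would be something like this minimal stdlib sketch (the posts/ file layout is made up, and a real server would serve the stored blob hash rather than recomputing it per request):

```python
# Sketch only: use the Git blob hash as an HTTP ETag so unchanged
# posts come back as 304 Not Modified. The posts/ layout is hypothetical.
import hashlib
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def git_blob_hash(data: bytes) -> str:
    # Git hashes blobs as sha1(b"blob <len>\0" + content)
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

class PostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = "posts" + self.path  # hypothetical repo layout
        if not os.path.isfile(path):
            self.send_error(404)
            return
        with open(path, "rb") as f:
            body = f.read()
        etag = '"%s"' % git_blob_hash(body)
        if self.headers.get("If-None-Match") == etag:
            self.send_response(304)  # client's cached copy is current
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", etag)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), PostHandler).serve_forever()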
 
Filesystem-as-DB doesn't really scale that well if you have a lot of small objects; they're not really designed for that sort of workload. Even if it were able to handle it well, the filesystem is still a b-tree, so I don't see why you think the structure is better.
I was thinking that it should be faster as you're doing the lookups etc. in kernelspace rather than userspace. Plus I'd expect the kernel to be better at doing caching than the DB.

Another aspect is that Git stores its objects zlib-compressed, so the web server wouldn't have to recompress them before transmitting.
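For reference, a loose object is just a zlib stream with a small type header inside - something like this sketch (the path would point into .git/objects; packfiles are glossed over here):

```python
# Sketch: what's inside a loose Git object. It's raw zlib (deflate),
# and the payload carries a "blob <len>\0" header before the content,
# so a server would still need to do a little work before serving it.
import zlib

def read_loose_object(path: str) -> tuple[bytes, bytes]:
    # path is something like ".git/objects/ab/cd..." (hash split 2/38)
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\x00")  # header: b"blob 1234"
    return header, content
```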

I've no idea if any of this would work in practice or not, it's just an idea I had.
 
I think this gets to the heart of where our views differ. The way I see it, people use this site because they want a community where everyone is operating on the same "rules of the road" that they like. It's more than just having your personal trust in the head moderator set to 100%.
All right, this is a fair point. I only really hint at it in the OP, but I don't see this as a competitor to "managed" forums, where there is an obvious intention to cultivate a certain type of atmosphere.

I see this more as an entrant in the niche of "generic" communications platforms - Twitter, Fediverse, 4chan, reddit, 9chan - where there is no real "goal" other than "post what you want and read what you want".

That being said, you may have a point, and 8chan/reddit-style "owned boards" would at least be technically feasible (see spoiler above). However, at that point most of the benefits of decentralization are lost. At the very least, there would be social aspects of it that you would have to figure out.

I think having an archive is exactly the point, especially at a place like this where archive-diving is the national sport.
I don't think the retention would be as poor as you suggest, but I don't think it's at all possible to theoretically analyze it without testing it in practice, so let's agree to disagree on that one.
If we want a non-archived discussion place there are plenty of bad solutions already available, as you mentioned.
At any rate, it's easy to create something that looks like a forum if you're willing to hand-wave away the tricky parts. But I think those tricky parts are actually key aspects of the experience.

Wait, what would those be? To my knowledge, all existing forums fail on at least one of these three:
  1. Censorship-resistant (decentralized)
  2. Spam-free (really, spam is a form of censorship)
  3. Usable (don't need special ISP, gigabytes of storage, etc)
In fact, here's a question: what do you consider the essential features of a forum to be?
Very good question.

A forum, to me, is just a place where you can talk to others. That being said, there are of course specific forums, which have a bigger goal than that - such as this one, where there are conscious decisions to ban unpleasant users and so on. But for a generic forum, I don't think there are any high standards for content moderation.

Take 8chan, for example. The only global rules on 8chan were:
  1. No content that violates US law.
  2. No sexualization of minors.
I can't see how that fosters a community spirit either. And yet 8chan felt like a community.
I agree with @Ridley: while interesting, this is over-engineered. No-one's going to download and run a Python program just to shitpost about trannies. Plus it effectively excludes phoneposters (this may not be a bad thing).
To be honest, you're not wrong here in general (though see above for notes about web gateways), but I'm not sure it would continue to be true in a total censorship apocalypse scenario. Like, at that point they would either have to download the software or stop posting.

As for phones, I think that's like the one UX ray of hope. Phoneposters already download apps, and it wouldn't be hard at all to port the client into an APK and upload it to F-droid. Data usage should be very modest after the first sync, and definitely on par with normal web browsing.

You're also fixing problems the Farms doesn't really suffer from, namely spam and poor moderation.
That's because KF is centralized. If you can centralize, these problems are not difficult, and I agree that KF does not have these problems currently (nor is it likely to, in the future).
I've been giving some thought to a similar issue, i.e. how to make a troon-resistant KF. I'm wondering if it's not insane to do away with the database entirely, and store all posts in Git, and all rendering etc. done by JS in the browser. This would mean:

* It's trivial for anyone to take a local copy of the forums ("git pull"), and trivial to restore a backup
* Reading threads just involves reading static files, which is as fast as things can possibly be. There's no reason why the server should waste resources on e.g. choosing the random_text.txt message when this can be off-loaded to the user. There'd still be a script to handle posting and anything else that changes the repo.
* Because everything has a Git hash, you can use this as an HTTP ETag, making everything very cacheable.
* You can use service worker shenanigans to download a whole thread to read offline, and so on
* If there's ever a DDOS attack, Josh can shift the whole thing into read-only mode, rather than the whole thing being forced offline. Look at how the static "sorry" pages are usually fine, even when the site is being hammered. PHP and DBs are both resource-intensive.
* If the whole thing collapses, someone can just take a recent snapshot of the repo and declare that to be the new Kiwi Farms

Obvious problems include how you handle searching, and what you do about stuff like passwords.

I've never seen anybody use Git as a database, so I don't know if this is retarded or not.
I really don't think that's viable. What you have there is basically a static snapshot that's constantly updated. So you will still need infrastructure to receive posts, and it won't solve any of the fundamental issues:

  • You will need multiple terabytes of disk to download the full archive, because that's how big it is
  • If you stream the archive on-demand, you've basically reinvented the concept of client-server requests
  • Exposing the database as a torrent if the site goes down permanently is already planned
  • Having a read-only copy of the site to browse during attacks is not a significant improvement over the status quo with the onion.
Basically, as long as the entire thing has to be serialized into a single database based on non-public information, there is going to be a single point of failure no matter what you do.

Even under the status quo, KF could rent a new server somewhere, do streaming replication onto that one, and then allow anyone who wants to set up their local MySQL server to do streaming replication from it. Alternatively, you could share the WAL over BitTorrent; it wouldn't be hard.

But what's the point? A read-only mirror just does not buy you that much. I want to post, god damn it, and if I can't post at least I want to be able to read others' fresh posts. If you wanted to do it, a better starting point might be this:

  1. Take all ~13.4 million posts
  2. Assume an average of 200 bytes each
  3. Gives ~3GB
  4. Write all posts into a columnar database
  5. Compress it really hard with something like squashfs + xz
  6. Copy over the backend infrastructure and compile it to WebAssembly
  7. Stream all of this shit over WebTorrent

This would work, sort of, but it's not really clear what you gain from it, and it would be a lot of effort to implement properly.
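(For the record, a quick check of the arithmetic in steps 1-3:)

```python
# Back-of-envelope for steps 1-3 above.
posts = 13_400_000           # ~13.4 million posts
avg_post_bytes = 200         # assumed average size
print(posts * avg_post_bytes / 1e9, "GB")  # ~2.7 GB, i.e. the "~3GB"
```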

If you could have custom error pages, it might make more sense to take the ~1000 most visited pages, compress them hard, and then expose those as JS so the site at least looks kind of online. But I strongly doubt that's viable; there are size limits and so on.

There are some similar ideas which have been tried by the (sadly dead) project ZeroNet; I encourage you to check those out. But they use one database per user, and then the client joins them together, which has significant advantages.
 
Decentralization needs to happen on the infrastructure level. There are many alternative protocols and applications such as Tor, Yggdrasil, and I2P, but they rely on the same infrastructure other networks do. The infrastructure that makes up the backbone of the Internet is complex and fragile. Ultimately the problem is not on the application layer but on the physical one.
Physical infrastructure is gay. Tor already works extremely well for all practical purposes, and it would take millions if not billions of dollars in investment to actually get anywhere.

Resources are limited, and spending them on re-inventing the wheel as a vanity project is not a great idea. Instead, I think people should be focused on building high-value inventions that combine existing technologies in a novel way to achieve new goals (like forums), while taking on minimal infrastructure maintenance burden.

Remember how all those companies went bankrupt laying fiber-optic cable in the 1990s? Do you really want to give that a shot?

I was thinking that it should be faster as you're doing the lookups etc. in kernelspace rather than userspace. Plus I'd expect the kernel to be better at doing caching than the DB.

Another aspect is that Git stores its objects zlib-compressed, so the web server wouldn't have to recompress them before transmitting.

I've no idea if any of this would work in practice or not, it's just an idea I had.
No gain. Servers already cache gzip, databases already use fs-level caching, and so on.

Performance-wise, a simple forum is already very fast - just fetching posts and threads is going to be a series of very simple SELECT queries on indexed columns (maybe even PRIMARY KEY), and the expensive parts aren't in the DB queries themselves or templating, but rather in all the parts around them (see: XenForo's dumb approach to caching etc)
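To make that concrete, the hot path is a query shaped roughly like this (invented schema, sqlite3 just for illustration):

```python
# Illustrative sketch of the point above: fetching a page of a thread
# is one indexed SELECT. Schema and names are made up for the example.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        thread_id INTEGER NOT NULL,
        author    TEXT,
        body      TEXT
    );
    CREATE INDEX idx_posts_thread ON posts(thread_id, id);
""")

def thread_page(thread_id: int, after_id: int = 0, limit: int = 20):
    # Walks the (thread_id, id) index; no table scan, no sort step.
    return db.execute(
        "SELECT id, author, body FROM posts"
        " WHERE thread_id = ? AND id > ?"
        " ORDER BY id LIMIT ?",
        (thread_id, after_id, limit),
    ).fetchall()
```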
 
I have actually thought that KiwiFarms should have done this too (using decentralized instances) after the DDoS attacks and stuff.
But I doubt that they ever would, since this would probably take too much effort to do.
 
There is too much expected functionality of a forum to have everything 100% decentralized. If a forum were merely lists of posts then you could have that fully distributed, and even optimise it by only pulling threads that were looked at, and other refinements. But there is too much utility in search functionality, ratings, etc.

What maybe should be taken as an approach is to have the core content distributed - i.e. posts and their metadata, user lists - but have the supplementary functionality like search re-buildable from that. I.e. you still have "servers", but anybody can build them from the distributed content.

E.g. "servers" sync to the distributed content (both directions) but build their own copies of search indexes, etc. Clients still exist and connect to one of those servers as if it were a traditional single server. This is more achievable than trying to make a distributed network of uniform nodes that are each their own full or even partial instance. Stick with Client-Server, it works. But have the Servers be the distribution layer, written so that anybody can pick up and become a server as needed.

Regarding having all the content be a Git repository, I see the logic conceptually, but I don't think it would handle the volume of many, many small changes and the concurrency well. The deltas would get huge over time. You'd need a periodic shift to a new repo, clearing out the history and putting the old one in archive for posterity / safety. But maybe I'm wrong. It's my IT hunch, though.

I like the idea of de-coupling the front and back end. Others have mentioned how rapidly you can have a static / semi-static webpage ready to go in RAM and how small that is. True... but it's even faster if you're just a REST API that replies with some compressed JSON on request from a front-end webapp written in React or (imo better) VueJS. I'm sure some will react negatively to that proposal and call it bloat. Maybe. This isn't a hard push from me, but it helps the server side, and modern browsers can handle a web-forum UI on the client side quite well.

It also helps with distribution, because you're going to need such an API for that anyway, so that different nodes that aren't trying to display something to a user (at least immediately) but just need up-to-date info can request, e.g., "/posts/forumId/threadId/>1668944483" or whatever and get back a nice JSON list (see the sketch below). It works for limiting data load on client requests too: clients hopefully have a local cache of posts and just need updates for reactions and new posts. Don't send a whole copy of the page for updates, just send a JSON array containing the new post objects since the latest timestamp they sent in their request.
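To illustrate the delta idea, sketched in Python/Flask for brevity rather than PHP; the route shape and field names are invented to mirror the example path:

```python
# Hedged sketch of the "just send the new posts" API described above.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the real store: dicts like
# {"id": 2, "thread_id": 7, "ts": 1668944490, "body": "..."}
POSTS = []

@app.route("/posts/<int:forum_id>/<int:thread_id>/<int:since_ts>")
def posts_since(forum_id: int, thread_id: int, since_ts: int):
    # Send only what the client is missing, not a whole rendered page.
    new = [p for p in POSTS
           if p["thread_id"] == thread_id and p["ts"] > since_ts]
    return jsonify(new)

if __name__ == "__main__":
    app.run()
```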

You could build something very light and very fast in PHP (yes, really; it's become a very good language since its early 5.x days) that would be a solid API for this.

For user accounts you may want a whole parallel layer for identity management. The software could be written to use this parallel system for user identity and for associating users with their posts. They could be Matrix identities or who knows. But it would be under the control of the user, rather than something managed by the forum as per the current KF implementation. That also helps make the system much harder to hack.

Finally, I would consider using RSS, an existing technology, as a base for this. It is already out there, it can serve as a means of providing notification of thread updates, and it also provides a nice gateway into using the forum from existing services and technology. Remember, this has to be not only good but also easy to get people on board.

We also need a legal viewpoint on all this to check it can't be taken away. E.g. through licence terms in the software it's built with, etc.

Man, I want to go away and write some of this right now. I could certainly write a prototype API or front end for this. Spec it up in OpenAPIv3 as well.
 
We also need a legal viewpoint on all this to check it can't be taken away. E.g. through licence terms in the software it's built with, etc.
It's not so much a legal question as a social question. Even if software is currently released under the MIT License, there's nothing stopping a project owner from switching all future releases to the "MIT Minus Kiwi" license (perhaps under duress). You're then faced with the choice of staying on the old version, which is yours forever; making your own fork, which you maintain in parallel without using any unlicensed commits from the main project; or going pirate.
Staying on an old version might be fine, or it might mean missing security patches or getting stuck on an old interface while the rest of the ecosystem moves on.

This means resilient design has to favor mature technologies that don't need security patches every 5 minutes, run by organizations too big, bureaucratic, and/or stubbornly libertarian to cave to pressure campaigns from a handful of Twitter weirdos.
 
So how do we make sure everything is updated and safe? Do we need the blockchain and kiwicoin for that?
No. All blockchain/cryptocurrency is a scam until proven otherwise.

>updated
You download the updated content yourself.
>safe
Online safety is for fags. There is no execution of untrusted code, so there are no security problems from that perspective.
There is too much expected functionality of a forum to have everything 100% decentralized. If a forum were merely lists of posts then you could have that fully distributed, and even optimise it by only pulling threads that were looked at, and other refinements. But there is too much utility in search functionality, ratings, etc.
Search can be done locally, SQLite supports it (see FTS5 module). Ratings can also be done locally, and are in fact a required component of the system.
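e.g., a minimal local search, assuming the bundled SQLite was built with FTS5 (CPython's usually is); table and column names are invented:

```python
# Sketch of local full-text search with SQLite's FTS5 module.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE post_index USING fts5(body)")
db.execute("INSERT INTO post_index(body) VALUES (?)",
           ("chris chan is out of prison",))

# MATCH runs against the local index; no server round-trip needed.
for (body,) in db.execute(
        "SELECT body FROM post_index WHERE post_index MATCH ?",
        ("chris AND prison",)):
    print(body)
```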
What maybe should be taken as an approach is to have the core content distributed - i.e. posts and their metadata, user lists - but have the supplementary functionality like search re-buildable from that. I.e. you still have "servers", but anybody can build them from the distributed content.
This is definitely a viable approach, but building reverse word indices for search isn't terribly expensive anyway. I guess it would be reasonable if people are running Bitcoin-style "pruned nodes" where you only keep the history of the last N days on disk?
E.g. "servers" sync to the distributed content (both directions) but build their own copies of search indexes, etc. Clients still exist and connect to one of those servers as if it were a traditional single server. This is more achievable than trying to make a distributed network of uniform nodes that are each their own full or even partial instance. Stick with Client-Server, it works. But have the Servers be the distribution layer, written so that anybody can pick up and become a server as needed.
You're still going to have to solve the semi-difficult problem of properly implementing WoT (or whatever else you need to sync between the servers), so the problem of "trying to make a distributed network of uniform nodes that are each their own full or even partial instance" still remains, unfortunately.

However, once that's done, the rest isn't terribly difficult. It would be possible to do search as you propose in a trustless manner (though I admit I don't fully see the point): if I want to query for, say, "chris chan prison", I can just query, say, 5 servers, then look at the posts they return and double-check that they contain the words I was looking for.

As long as at least one server is telling the truth, there's no possibility of censorship - think of Bitcoin's SPV security model.
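A sketch of that check, where fetch_results() is a hypothetical stub for querying one server and the tokenization is deliberately naive:

```python
# Sketch of the SPV-style check above: ask several servers, keep only
# posts that verifiably contain every term, and union the results.
def contains_all(body: str, terms: list[str]) -> bool:
    words = set(body.lower().split())  # naive tokenization for the sketch
    return all(t.lower() in words for t in terms)

def trustless_search(terms, servers, fetch_results):
    seen = {}
    for server in servers:
        for post in fetch_results(server, terms):  # hypothetical RPC
            if contains_all(post["body"], terms):
                seen[post["id"]] = post  # dedupe across servers
    # One honest server is enough for matching posts to show up;
    # dishonest ones can only add junk, which the check filters out.
    return list(seen.values())
```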
Regarding having all the content be a Git repository, I see the logic conceptually, but I don't think it would handle the volume of many, many small changes and the concurrency well. The deltas would get huge over time. You'd need a periodic shift to a new repo, clearing out the history and putting the old one in archive for posterity / safety. But maybe I'm wrong. It's my IT hunch, though.
Agreed. The main issue is synchronizing the changes, though.
I like the idea of de-coupling the front and back end. Others have mentioned how rapidly you can have a static / semi-static webpage ready to go in RAM and how small that is. True... but it's even faster if you're just a REST API that replies with some compressed JSON on request from a front-end webapp written in React or (imo better) VueJS. I'm sure some will react negatively to that proposal and call it bloat. Maybe. This isn't a hard push from me, but it helps the server side, and modern browsers can handle a web-forum UI on the client side quite well.
I don't think that using JavaScript to talk to an API gains you a lot. You would have to implement this as a browser extension for it to be trustless, at which point you might as well just install local software. And if running locally is not a design constraint, it would be much easier to just handle it like the Fediverse - each server is trusted to some degree and renders HTML pages locally.

But if for some reason running stuff as a browser extension (which doesn't even work on mobile or anything) is a desirable goal, it seems easier to just compile the Python interpreter for WebAssembly and then run the normal code.
For user accounts you may want a whole parallel layer for identity management. The software could be written to use this parallel system for user identity and for associating users with their posts. They could be Matrix identities or who knows. But it would be under the control of the user, rather than something managed by the forum as per the current KF implementation. That also helps make the system much harder to hack.
Why not just use WoT?

It would be trivial to do Keybase-style verification of KF/Matrix/etc identities if needed, though I don't see why it would be. See also a few posts up some thoughts I had on using other networks (e.g. KF) to bootstrap the web of trust.
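For the record, such a proof is a few lines with an Ed25519 key (PyNaCl here; the claim format and handle are made up):

```python
# Sketch of a Keybase-style proof: sign a claim with the key you post
# under in the P2P network, publish the signed blob on the other
# service, and anyone can link the two identities.
from nacl.signing import SigningKey

sk = SigningKey.generate()                # your P2P posting key
claim = b"I am also @example on kiwifarms.net"
proof = sk.sign(claim)                    # post this blob on KF

# Verifier: needs only the public key already seen on the P2P network.
sk.verify_key.verify(proof)               # raises BadSignatureError if forged
```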
Finally, I would consider using RSS, an existing technology, as a base for this. It is already out there, it can serve as a means of providing notification of thread updates, and it also provides a nice gateway into using the forum from existing services and technology. Remember, this has to be not only good but also easy to get people on board.
I don't really see the point of RSS. You wouldn't be able to reply to posts. I mean, I guess you could add it, but it would have to be a bolt-on.

I thought of using NNTP, but it's really geared towards threaded (HN/Usenet-style) discussion, and I'd rather have topic-based (KF/phpBB/4chan). Maybe it's still possible though?
We also need a legal viewpoint on all this to check it can't be taken away. E.g. through licence terms in the software it's built with, etc.

Man, I want to go away and write some of this right now. I could certainly write a prototype API or front end for this. Spec it up in OpenAPIv3 as well.
Even if software is currently released under the MIT License, there's nothing stopping a project owner from switching all future releases to the "MIT Minus Kiwi" license (perhaps under duress).
There is: stop using a cuck license.

If the project is licensed under the GPL, and you haven't signed a CLA, then the "project owner" literally can't do shit without obtaining the permission of all contributors. This is only possible with cuck licenses like the MIT license and so on. (For more info, see Drew DeVault's excellent posts on the subject.)
 
Kudos for the imagination but it seems like you've spent a lot of time building something that most people won't use because it's autistic as hell and has limited features. Personally I always wanted to see federated forum software based on ActivityPub and the Lemmy devs are working on exactly that. They've basically made a version called LemmyBB which utilizes phpBB as a frontend for their weird federated Reddit.

Federation has its own issues but at least most implementations have web based frontends so it's very accessible to normal people and about as censorship resistant as a forum like this needs to be.
 
Kudos for the imagination but it seems like you've spent a lot of time building something that most people won't use because it's autistic as hell and has limited features. Personally I always wanted to see federated forum software based on ActivityPub and the Lemmy devs are working on exactly that. They've basically made a version called LemmyBB which utilizes phpBB as a frontend for their weird federated Reddit.

Federation has its own issues but at least most implementations have web based frontends so it's very accessible to normal people and about as censorship resistant as a forum like this needs to be.
Thanks. I don't agree that the decentralized design is that limited, and the potential for web frontends is bigger than you think, but anyway.

There are a lot of promising developments within fediverse, yeah. There's also a 4chan clone, but it has some design flaws so I don't think it will make it far. I am generally hopeful on the fediverse, but it is not truly censorship-resistant, as shown by the failure of truly censored figures to use it.
 