@grand larsony Good job on quantifying this - I doubt anyone on KF is surprised by the findings, but there are a great number of normies, some of whom may even be parents, who would be rather shocked, or worse, puzzled. (For example, one of my neighbours thought the "internet" was something her child's school had made to give children homework during lockdown. I don't even know how such people can be warned.)
Some questions:
1. How stringent were you when excluding sex/porn oriented servers? Did you also cut shit like e-thot simping and fan-fiction, for example?
2. Do you have a false positive rate for your program, and if so, what was it? (Measuring a false negative rate isn't feasible with this methodology, obviously.)
Again, good show and valuable dataset.
Yeah, my parents were unprepared for the dangers of the ~2006 internet. I can't even imagine how someone similarly uninformed would cope with the modern internet. It's so much more sly now: everything is a cute little app full of smiley faces and shit, but the pedos are still there, seemingly in bigger numbers than ever. At least back in 2008 there was still an air of suspicion that a lot of adults had about the internet, but that seems totally gone now.
1. I did it all through server/channel names. I skimmed through the top few hundred servers and didn't see anything whose server or channel names appeared explicitly pornographic. There are some servers that I'm sure have a higher concentration of sexual content - the bot is in one server for meth enthusiasts, for instance - but I couldn't find anything where the primary focus is sharing porn.
2. There are false positives and false negatives, but in weird ways that aren't immediately intuitive. The false positives are messages that technically talk about kids in a sexual way, but that aren't inappropriate in the "I want to fuck a child" kind of way - things like "I got molested when I was a kid and I've been fucked up ever since". That's technically talking about children sexually, but not in the predatory sense. Whether stuff like that even counts as a false positive is up to your interpretation of how the scoring should be done.
There are a number of false negatives, but it's very, very hard to produce false negatives consistently unless you're extremely diligent about obfuscating every message you send. For example, if you send "I" and "want" and "to" and "fuck" and "a child" as separate messages, that gets past the filter. If you send "'I want to fuck a child' is a very bad thing to say, and you should never say it", the filter will contextualize that as you scolding someone, even if the real context is different. Normal character swapping doesn't work, but if you go really hard at it, that'll bypass the filter - something like "(eye) ||vv@N+ +0 fn(k @ (H1L|)". Typing in these ways isn't really sustainable, though.
As for the rate, I feel confident that all the big red dots are serious offenders. On an individual message level there are occasional false positives and false negatives, but the scoring in the network graph is based on the average score over all of a user's messages from the DB. Now, I could be wrong - maybe there's a couple guys in the dataset who just really need to vent about how they got molested as a kid or something - but a cursory check of the database shows me that isn't the case for any significant portion of users.
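For the curious, the aggregation itself is nothing fancy. Here's a minimal sketch with a made-up schema and placeholder rows - not my actual tables, just the shape of the "average score over all of a user's messages" step:

```python
import sqlite3

# Hypothetical schema (the real one isn't shown here):
# messages(user_id, content, score) where score is the per-message moderation float.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (user_id TEXT, content TEXT, score REAL)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [("alice", "what's up boys", 0.0001), ("bob", "...", 0.92), ("bob", "...", 0.85)],
)

# The network-graph score per user: average over all of that user's messages.
rows = conn.execute(
    """
    SELECT user_id, AVG(score) AS avg_score, COUNT(*) AS n
    FROM messages
    GROUP BY user_id
    ORDER BY avg_score DESC
    """
).fetchall()

for user_id, avg_score, n in rows:
    print(f"{user_id}: {avg_score:.4f} over {n} messages")
```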
Also, I guess I'll elaborate on the scoring mechanism. The bot scores everything with a float from 0 to 1. Even innocuous sentences get some kind of non-zero score - "what's up boys" might get a 0.0001 because "boys" also appears in the training data for text that talks about kids sexually. Text of varying severity lands at different points around the middle of the range. Here's an example of what text around the 0.5 mark (the point where OpenAI considers a text "flagged") looks like:
These are messages which make me raise my eyebrows a bit, but you can see there are some definite false positives at this range, like the guy who appears to be referring to drugs when he says "4 meo". Personally I consider anything above ~0.8 to be pretty unambiguous, but even then there are exceptions like I've discussed above.
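For reference, scoring a single message through OpenAI's moderation endpoint looks roughly like this. This is a simplified sketch, not my production code - the function name is made up, and I'm only pulling the one category score that matters here:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_message(text: str) -> float:
    """Return the 'sexual/minors' category score (a 0-to-1 float) for a message."""
    resp = client.moderations.create(input=text)
    result = resp.results[0]
    # category_scores holds per-category floats; result.flagged is OpenAI's own
    # boolean verdict across all categories.
    return result.category_scores.sexual_minors

print(score_message("what's up boys"))  # tiny but non-zero, as described above
```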
Here's a box plot showing the distribution of average scores per user. You can see that the upper fence is about 6x the median, and there are still quite a lot of users above it. I feel confident that anyone above that fence is someone whose browser history should be looked into, at the very least.
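If you want to recompute the fence yourself: it's the standard Tukey fence, Q3 + 1.5*IQR, which is also matplotlib's default whisker. A quick sketch with placeholder numbers standing in for the real per-user averages:

```python
import numpy as np

# Placeholder data: skewed-low per-user averages, like the real distribution.
rng = np.random.default_rng(0)
avg_scores = rng.beta(1, 50, size=5000)

q1, median, q3 = np.percentile(avg_scores, [25, 50, 75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # Tukey fence

outliers = avg_scores[avg_scores > upper_fence]
print(f"median={median:.4f}, fence={upper_fence:.4f} "
      f"(~{upper_fence / median:.1f}x median), {outliers.size} users above it")
```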
A couple people also asked about server breakdowns. I used a text embedding model, t-SNE, and KMeans to cluster the larger servers into 10 groups (there's a rough sketch of the pipeline after the category list below). Here's an image:
The positions on the x and y axes aren't important here. What's important is just that similar servers are clustered together based on their names. Here's a rough breakdown of the category boundaries:
Yellow - Chinese servers, no idea what these names mean
Burgundy - Russian servers, mostly appear to be personal servers but my Russian isn't great
Red - various other Asian languages, can't identify all of them but I see some Japanese and Korean text here
Gray - Spanish servers which mostly appear to be meme-related and personal servers
Purple - mixture of English and Japanese servers related to AI tech
Orange - all meme servers, all English
Green - seems to be social club type servers without a strong common theme in the naming. Lots of mentions of "club", "cafe", "hangout", etc.
Blue - personal servers owned by people with English names
Black - mixture of political and tech-related servers, with a couple personal servers
Pink - technology, video games, and (uh oh) servers that seem to be related to school/studying
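If anyone wants to replicate the clustering, here's a rough sketch of the pipeline. The embedding model is just an example (sub in whatever embedding tool you like), and whether you cluster before or after the 2D projection is a judgment call - this version clusters on the full embeddings and only uses t-SNE for drawing:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder input; the real list is the larger servers' names from the DB.
server_names = [f"placeholder server {i}" for i in range(200)]

# Embed the server names (model choice here is just an example).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(server_names)

# Cluster in the full embedding space, then project to 2D for plotting only.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(embeddings)
coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=10)
plt.title("Server name clusters (t-SNE projection, KMeans k=10)")
plt.show()
```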
Good work OP.
I am curious about the business model. You say you make money; who pays you? Discord, or the owners of the groups?
If it's the latter, how do they monetize it themselves? Are companies on Discord?
Idk anything about Discord, but I imagine your bot could prove useful in other areas. LLMs can be useful beyond moderation efforts, which, although necessary, are not profit drivers. Is the proprietary part the Discord integration only? Or could it be tweaked for other purposes where there's a need and the market isn't already saturated?
I won't go into too much detail since a lot of what I'm doing isn't exactly unique on its own, but all the features together form a bot that's pretty unique. The bot has a mixture of user-focused services like the chatbot stuff, administration-focused services like the AI moderation stuff, and general utilities that aren't really "fun" features or administrative but are useful nonetheless. The money comes entirely from the user-focused stuff: premium subscriptions, one-off purchases, etc.
My plan is to slowly transition it to being more moderation focused, which is why I've spent so much time developing analytics for moderators. I recognize that "Discord bot plugged into the OpenAI API" isn't a unique business proposition and a ton of other bot devs are doing similar projects, so my goal is to spread the bot with the user-focused features and eventually transition it into a moderation bot similar to (but better than) the big moderation bots like Dyno and CarlBot. Moderation features are a bit harder to replicate than chatbot shit, and a lot harder to grow as a general rule, since people don't typically want to add a brand-new bot that requires a ton of dangerous permissions.