Discord, big data, and pedophiles - The application of data science to Discord degeneracy

grand larsony

data science autist, kingcobrawiki.org
Sorry if this isn't in the right place in the forum. I wasn't exactly sure where to post this, but a couple people expressed interest in learning more about this after I made a post about it in the Roblox community thread. @Shooter of Cum suggested I post a thread about it here.

I run a Discord bot that does a lot of moderation-related stuff. I'm not going to name it here, both because this isn't some self-promotional bullshit and because I fear the Discord turbojannies would ban my bot if they found out I was posting on the transphobic neo-Nazi alt-right cyber stalker global headquarters. I do make some money off of it, so it's kinda in my best interest not to fuck it up, but I still feel this should be shared.

The elevator pitch for my bot is basically this: I apply AI scoring mechanisms to text and images to catch things that exact-match filtering wouldn't. For example, in the sentence "I want to f*ck a ch*ld" there's not a single word you could reasonably blacklist. Even if you tried to blacklist something like "fuck", writing "f*ck" gets around it. The AI can understand the context of the sentence as a whole and isn't fooled by little tricks like replacing a letter with a different character or slightly misspelling a word on purpose.
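To make that concrete, here's a toy illustration of why exact matching falls over. This is not my bot's actual filter, and the word list is obviously made up for the example:

```python
import re

# Naive exact-match blacklist -- the approach AI scoring is meant to replace
BLACKLIST = re.compile(r"\b(fuck|child)\b", re.IGNORECASE)

def exact_match_flag(message: str) -> bool:
    """Returns True only if a blacklisted word appears verbatim."""
    return bool(BLACKLIST.search(message))

print(exact_match_flag("I want to fuck a child"))   # True  -- caught
print(exact_match_flag("I want to f*ck a ch*ld"))   # False -- trivially evaded
```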

I collect a shitload of data about users for this, but I don't keep it for terribly long. The Discord TOS says I can only keep it for up to a month since my bot is in >100 servers (right now it's at a few thousand servers), and in the last month I've collected about 2.2 million messages as of this writing. I don't follow the Discord TOS because I give a shit about their rules; mostly I do it because keeping tons of old data wastes storage space and slows down database queries.
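The retention cleanup itself is about as boring as you'd expect. A minimal sketch, assuming a SQLite table called messages with a created_at timestamp; my actual schema and database engine may differ:

```python
import sqlite3

conn = sqlite3.connect("bot.db")
# Drop anything older than the 30-day window; this keeps queries fast
# and happens to satisfy the Discord TOS retention requirement too.
conn.execute("DELETE FROM messages WHERE created_at < datetime('now', '-30 days')")
conn.commit()
conn.close()
```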

It's worth noting here that my bot isn't in any servers whose sole purpose is sexual content, like the kind of servers where people trade porn. These are all relatively normal communities, at least for Discord, with the focus being that most of them are tech-related.

I'm going to try to keep assumptions to a minimum and mark things that I'm assuming as such. I prefer to keep this focused on the data and what we can learn from this data in aggregate.

0. What the hell is a network graph, anyway?
Most of the data I'll be showing is in the form of network graphs. Here's what the network graph looks like, organized nicely but not yet colored in.

[Image: network graph, organized but not yet colored]

Each node is one person, and a line between two nodes indicates interactions between those users. Darker lines indicate stronger connections. An interaction, for the purposes of this data, can be one of two things: either A) a user replies directly to another user's message, or B) two users post consecutive messages in the same channel.
Please note that, for these visualizations, I've removed some of the very irrelevant nodes from the graph. By that I mean people who, for example, send one or two messages, decide they're not interested in Discord or the server I can see them in, and then stop interacting. Removing them makes the graph a lot easier to interpret.
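For anyone who wants to build something similar, here's a rough sketch of the graph construction in Python with networkx. The field names (author_id, channel_id, reply_to_author) are illustrative, not my actual schema, and the pruning threshold is arbitrary:

```python
import networkx as nx

def build_interaction_graph(messages, min_weight=3):
    """messages: chronologically ordered dicts with author_id, channel_id,
    and reply_to_author (None if the message isn't a direct reply)."""
    G = nx.Graph()
    last_author = {}  # channel_id -> author of the previous message there

    def bump(u, v):
        # Each interaction increments the edge weight; darker lines = higher weight
        w = G.get_edge_data(u, v, default={}).get("weight", 0)
        G.add_edge(u, v, weight=w + 1)

    for m in messages:
        a = m["author_id"]
        # Interaction type A: a direct reply to another user's message
        if m.get("reply_to_author") and m["reply_to_author"] != a:
            bump(a, m["reply_to_author"])
        # Interaction type B: consecutive messages by two users in one channel
        prev = last_author.get(m["channel_id"])
        if prev is not None and prev != a:
            bump(a, prev)
        last_author[m["channel_id"]] = a

    # Prune the "irrelevant" nodes: drive-by users with almost no interactions
    G.remove_nodes_from([n for n, d in dict(G.degree(weight="weight")).items()
                         if d < min_weight])
    return G
```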

[Image: the same network graph, colored by modularity community detection]

Here's the same network graph again, this time colored with modularity community detection. This automatically sorts users into color-coded groups based on shared interactivity; with this data, the groups mostly correspond to individual servers. However, you can see that communities cross-pollinate with one another. When that happens, users are colored by whichever community they're more strongly connected to.
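Under the hood this is standard stuff. A minimal sketch of one way to get the groupings, continuing from the build_interaction_graph sketch above; the tool I actually use for the visualizations may differ:

```python
from networkx.algorithms import community

# One community-detection option: greedy modularity maximization
# over the weighted interaction graph G from the earlier sketch.
communities = community.greedy_modularity_communities(G, weight="weight")

# Map each user to the index of their community for coloring.
# Users bridging several communities land in whichever one the
# algorithm scores them as most strongly connected to.
color_of = {node: i for i, comm in enumerate(communities) for node in comm}
```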

[Image: colored network graph showing overlap between clusters]

You can see here how the couple of blue clusters primarily branch off from the orange cluster, and how the orange and pink clusters share a lot of common members. This also roughly reflects how the bot spreads: "Hey, another server I'm on has this cool bot. We should add it here, I bet it would be useful for us."

1. Tracking degenerate behavior
[Image: network graph with nodes recolored and resized by average sexual/minors moderation score]

In this view of the network, I've recolored and resized users' nodes based on how highly they score on average in the sexual/minors category of moderation. There are a couple interesting observations here.

Users who talk sexually about minors a lot tend to be on the outskirts of these communities. There are some slightly red dots towards the centers of the big clusters, but generally the pedophiles are on the fringes of the communities they interact with. This isn't exactly surprising, but it's interesting to see the data confirm it. To me, it's both encouraging and discouraging: on the one hand, people who talk really openly about pedophile shit are generally ostracized. On the other hand, there are a lot of people who are apparently very interested in child sexuality interacting at the cores of a lot of these communities, and they seem to be tolerated by other members as long as it isn't all they talk about.

Most users, as a general rule, are strong members of only one or two communities. You'll find this is true across most interactions online: if you post a lot on Kiwi Farms and the couple of orbiting Telegram groups, you're limited by time and attention, so you won't post as much on other places like Reddit and Twitter. This brings me to an assumption, but one that I'll support with some more data in a second. The really strongly pedophilic nodes in the networks are, as best I can guess, pedophiles on a fishing expedition. They're outsiders who come into other communities, and though my bot can't see whatever community they more strongly gravitate towards, the assumption that there is a stronger community they're a part of seems obvious to me.

This assumption is backed up by another view of the network. In this one I've kept the sizes the same, but the coloration is now based on eigenvector centrality. Essentially, darker nodes are more active, better-connected participants, and lighter nodes tend to keep to themselves or to a small group of other users. The key observation is that the strongly pedophilic nodes aren't core members of any of the communities they inhabit, at least not the communities I can see.

[Image: network graph colored by eigenvector centrality]
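A hedged sketch of the cross-referencing happening in that view. Everything here is illustrative: the graph and scores are tiny dummy data so the snippet runs standalone, and avg_minors_score is a hypothetical dict of user ID to average sexual/minors moderation score:

```python
import networkx as nx

# Dummy stand-ins for the real interaction graph and moderation scores
G = nx.karate_club_graph()
avg_minors_score = {0: 0.02, 5: 0.91, 33: 0.05, 11: 0.87}

# Eigenvector centrality: high values = well-connected to other
# well-connected users, i.e. core community members.
centrality = nx.eigenvector_centrality(G, max_iter=1000)

# Cross-reference flagged users (threshold is illustrative) with centrality.
flagged = [u for u, s in avg_minors_score.items() if s > 0.8]
for u in sorted(flagged, key=lambda u: centrality.get(u, 0.0)):
    print(u, round(centrality.get(u, 0.0), 4))
# The pattern from the graph above: strongly flagged users mostly sit
# at low centrality, i.e. on the fringes of their communities.
```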

2. Types of pedophiles
I think it's worth stopping to define the terminology here a bit. When I call the nodes with strong sexual/minors scores pedophiles, I should emphasize that these are people very openly talking about children sexually. I view them as especially dangerous to any children they come into contact with, compared to someone like, say, Lionmaker. I absolutely do not want to minimize what Lionmaker did, but what he did is a lot less outright malicious than the type of content we're talking about here. Lionmaker was a lonely retard who made a huge mistake, one he should absolutely be punished for both legally and socially, but that's a very different type of person from someone who goes into random Discord servers talking about wanting to have sex with kids.
While we're at it, here are some examples of the kind of content that has scored highly recently:

[Images: six screenshots of messages that scored highly in the sexual/minors category]

There's a ridiculous number of these. I could post pages and pages and pages, but I won't belabor the point. What I mean to stress is that the big red nodes above are unrepentant child predators, not lonely retards who made a couple of bad decisions. That's the kind of stuff you have to say to end up as a red dot on the network graph.

3. What's being done about this?
Honestly? As far as I can see, nothing. There've been a couple of people I came across in the database who disgusted me enough that I took the time to file a report on the FBI's tip website. You'll probably be unsurprised to hear that nothing ever came of those tips. I've reported them to Discord too and never got a response. Maybe they banned the users? Probably not. I view this as an issue on multiple fronts.
At the highest level, law enforcement isn't especially interested in these cases. They're messy and time-consuming, often involve distant or international jurisdictions, have victims who may not want to cooperate, and offer no material return the way tax fraud or drug cases do. If they do catch a pedophile, there's rarely a way to connect him to other cases and make those easier to solve, the way flipping a low-level drug dealer can lead to higher-level dealers. Thus, not many resources are applied to rounding up people like this and giving them, I don't know, probably probation or something.
Discord has an incentive to ban these people only when their actions are so egregious that they can't be ignored, and while the people I've shown here are no doubt vile, they don't rise to the level where someone would write a big news story about them as individuals. If Discord says "our platform is full of pedophiles, but we're finally taking care of it now," that will just make unfamiliar people stop and say, "Wait, what? Discord is full of pedophiles? Well, I won't be installing that app." Beyond that, there's no financial motivation to purge them, and in fact it'd surely hurt Discord's bottom line to do so!
At the local level, server moderators are either unable to really stop the problem even if they want to, or they're complicit to some degree. If you get banned from one place where kids hang out, there's always another server with lax moderation or complicit mods.

4. How much would it cost to implement a solution like mine across all of Discord?
Well, the OpenAI moderation API is free, but surely that wouldn't be the case if you were running it for all of Discord. OpenAI's most comparable paid product is their text embeddings API, which costs $0.0001 per 1k tokens. I won't get into the details of tokenization here, but a token works out to roughly four characters, so you get about 4,000 characters of text embedded for $0.0001.

Statista shows that as of 2022, Discord processes about 4 billion messages per day. In my own database, the average message length is about 43 characters, so I estimate Discord processes about 172 billion characters of message text per day. Doing the math, that works out to about $4,300 per day in embedding costs, or about $1.57 million per year.
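If you want to sanity-check that arithmetic yourself:

```python
# Back-of-the-envelope embedding cost for all of Discord.
messages_per_day = 4e9            # Statista, 2022
avg_chars_per_message = 43        # from my own database
chars_per_day = messages_per_day * avg_chars_per_message   # ~1.72e11

usd_per_1k_tokens = 0.0001        # OpenAI text embeddings pricing
chars_per_token = 4               # rough rule of thumb for English text
tokens_per_day = chars_per_day / chars_per_token            # ~4.3e10

cost_per_day = tokens_per_day / 1000 * usd_per_1k_tokens    # ~$4,300
cost_per_year = cost_per_day * 365                          # ~$1.57M
print(f"${cost_per_day:,.0f}/day -> ${cost_per_year:,.0f}/year")
```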

I know of cheaper ways to do this, but I'll emphasize that you can set up the OpenAI moderation API in a couple of minutes with less than five lines of Python. It would be extremely easy to do it this way, and $1.57 million isn't a crazy price for a service as huge as Discord. Is it worth the money? Well, I'm not a soulless reptilian wearing a human skin suit, so I'd say yeah, definitely. But I guess that's not my choice to make.
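For reference, here's roughly what that minimal setup looks like with the openai Python package (1.x SDK style; attribute names can shift between SDK versions, so treat this as a sketch rather than gospel):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
result = client.moderations.create(input="I want to f*ck a ch*ld").results[0]
print(result.flagged, result.category_scores.sexual_minors)
```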

5. Closing thoughts
Maybe I'm naive. I always knew that there were pedophiles on the internet because, well, I'm not a complete retard. But before I started this Discord bot project I had no idea the extent to which this is an issue. I'm just one guy running this as a personal project to make a little bit of extra cash and I've already found so many; imagine all the ones I don't see. And it seems to me that, for as bad as this problem is, it'll probably get much worse before it gets any better. Nobody seems very interested in doing anything about these people and I don't see any reason why that'd change anytime soon.

Modern children are exposed to dangers that weren't around when I was a kid. I often see people online express the sentiment that problems like this aren't really new, that they've always existed but are now more visible. In certain areas of life I think that's true, but the internet opens up avenues of abuse that didn't exist when I was growing up, and I think at least some of the people taking advantage of their easy access to unattended children wouldn't have bothered if they had to, like, go creep around at a skate park to talk to young girls or something.

I don't really know what the solution to this is. I want to believe that we can be free online, and to me, freedom includes not being spied on by random people (like myself lol) or companies, even if that spying could, in theory anyway, make the world a better place. I often feel that the trade-offs we've made so far haven't been worth it. Say what you will about the Chinese government, but at least they use their spying tools to enforce their laws, even if I'd hate to be under a lot of their laws myself. I get spied on by Google and Facebook and Amazon and Cloudflare and the NSA and the FBI and the CIA and so on and so forth, but to what benefit? They don't round these people up in numbers that I find even close to acceptable. Is all that spying really worth it if all they're going to use my data for is to figure out the best way to trick me into buying a new pair of sneakers?

Happy to answer any questions or show more of the data if anyone's interested.
 

With such a large bot used in so many servers, do you have any further connections to the Discord technical team beyond what a standard user gets?

It's disturbing that your bot can detect this activity while a company of Discord's scale can't implement AI or another detection method to pick up these types of messages and flag them for review. It feels like there should be a developer-to-Discord pipeline for reporting such things or flagging suspicious activity.

On the wider issue of pedos and stuff, as someone who spent their teens hunting pedos and trolling those communities, here are my thoughts with respect to your experience and exposure:

It isn't specifically a Discord issue; it has always been the status quo of the internet that you'll be exposed to a far higher concentration of weirdos than you'd expect in normal life. Discord is a very dangerous beast because, unlike the services that preceded it, it combines direct live communication (DMs/VC/video calls) with a social media aspect (servers), on top of incredibly minimal registration requirements that make signing up especially easy.

You briefly touched on it, but one reason Discord attracts the pedo shit in a way Skype and TeamSpeak didn't is how much more normalized digital interaction is for this generation of children than it was even for people born in the 90s. TeamSpeak wasn't especially popular, and Skype had a lot of issues but was nowhere near as socially parasitic or as popular.

You touched on social media, but the important thing to remember is that Facebook, Twitter, Instagram, and Snapchat are the bigger repositories of illegal porn and the most commonly used means of grooming children.

I don't think we can reasonably expect any solution that doesn't require either AI specifically trained to trawl people's communications looking for flaggable content (an idea people don't tend to like, since it would be applied to all speech) or an army of specifically trained administrators to review it. Discord is incredibly slow and outright negligent in responding to reports of TOS abuse despite the terrible PR over it, so there's little chance of that.

Another issue is when communications cross borders. Some countries simply don't have any transnational network to pass on information, some just don't care about cybercrime, and others have no laws against sexual communication with children or accessing CP.

Ultimately, the lessons of the past, which seem to have been forgotten, still offer the best and most powerful way to protect children: supervise your child on the internet. Restrict their access and the time allocated to it. Actively protect and monitor them, because children are absolute retards who will give in to peer pressure regardless of all the life lessons, and the only real way to stop them from becoming victims is to be a part of their life, including their digital life.
 
Dude, don't give out this (genuinely fascinating) work for free. Get in touch with your nearest university that has some kind of combined law and tech programme, work this up into a Master's thesis or Ph.D. proposal, and get it published. You have a bundle of data that verifies some very common assumptions about internet grooming. This is valuable and important.
 
I'm kind of curious whether this bot and network data could answer something I've always wondered about statistically:

What percentage of servers have sexual content above a certain threshold? For example, a server whose main topic is technology but where sexual themes and discussion come up at a statistically notable rate. It would be super interesting to have an actual statistic on how sexualised and coomer-brained Discord is as a whole, in servers not marked for NSFW topics.

Aside from the ethical issues of using data from an unauthorised source under the developer agreements (I'm not saying it's right that such a thing applies in this instance), I really think you should expand on this and take it to a media organisation or some other entity with the power to push it out to a wider audience. I'm pretty sure whistleblower status would apply to avoid any potential legal ramifications.
 
I love the fact that pedos are so stupid they use an app that doesn't use end-to-end encryption. Good work, OP.
It's less that they don't know better, and more that they get some sort of sexual enjoyment out of telling people how fucked they are and knowing nobody can do anything about it other than report their account to Discord, who will obviously do nothing.
 
Good work and information OP.
I've gotten a niece to stop playing Roblox cause I told her about the bad guys on it. She still games, but sticks to Fortnite at least.
Don't think she uses Discord much, but I've told her mom about some of the stuff to watch out for and to make sure to listen to her if she happens across anything fishy.
If only I could get the kid to stop using tikbotnet
 
I'm kind of curious whether this bot and network data could answer something I've always wondered about statistically:

What percentage of servers have sexual content above a certain threshold? For example, a server whose main topic is technology but where sexual themes and discussion come up at a statistically notable rate. It would be super interesting to have an actual statistic on how sexualised and coomer-brained Discord is as a whole, in servers not marked for NSFW topics.

Aside from the ethical issues of using data from an unauthorised source under the developer agreements (I'm not saying it's right that such a thing applies in this instance), I really think you should expand on this and take it to a media organisation or some other entity with the power to push it out to a wider audience. I'm pretty sure whistleblower status would apply to avoid any potential legal ramifications.
[Image: the same network graph, nodes emphasized by scores across all sexual content categories]

Here's the same network graph from the OP, except this time the nodes are emphasized based on all sexual content, not just sexual content featuring minors. At a glance, how prevalent sexual content is seems to be more a function of local moderation than of the topic of the server. There are very few spots on the graph without red dots, and since my bot isn't in any explicitly pornographic servers, that suggests the places where it's missing are just places that don't tolerate it much.
Kinda looks like a petri dish lol
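If people want an actual number instead of a picture, computing a per-server share of flagged messages would be easy. A minimal sketch, with illustrative column names and an arbitrary threshold:

```python
from collections import Counter

def sexual_share_by_server(rows, threshold=0.5):
    """rows: iterable of (server_id, sexual_score) pairs, one per message."""
    total, flagged = Counter(), Counter()
    for server_id, score in rows:
        total[server_id] += 1
        if score >= threshold:
            flagged[server_id] += 1
    # Fraction of each server's messages scoring above the threshold
    return {s: flagged[s] / total[s] for s in total}

shares = sexual_share_by_server([("srv1", 0.7), ("srv1", 0.1), ("srv2", 0.2)])
print(shares)  # {'srv1': 0.5, 'srv2': 0.0}
```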

With such a large bot used in so many servers, do you have any further connections to the Discord technical team beyond what a standard user gets?

It's disturbing that your bot can detect this activity while a company of Discord's scale can't implement AI or another detection method to pick up these types of messages and flag them for review. It feels like there should be a developer-to-Discord pipeline for reporting such things or flagging suspicious activity.
Once you pass 100 servers, you have to do ID verification and all this other shit to prove that, I dunno, you're not making a bot with fake users to distribute malware or something. I had to go through it and it was (in contrast to everything else Discord does pertaining to user security) surprisingly thorough. I have an email contact that I typically get a bit faster reply on and I've reported a number of things to them.
So far, they've been relatively quick to help, definitely faster than I'm used to with faceless app companies, but only with technical stuff. If I have an issue where their API documentation isn't clear, they'll usually reply within a day or so. If I send anything to their safety team, though, the most I get back is an email that amounts to "yeah yeah, we'll look into it."
My bot is decently large in the grand scheme of all bots, because most bots end up on a handful of servers and never scale past that, but it's really small compared to the bots Discord actually cares about. The impression I get is that how receptive they are to helping you is directly correlated with how big your bot is. For example, when Midjourney needed new features to make their Discord interface better, Discord made changes to the API just for them, features no other bot devs can use, so that their bot could keep working smoothly for even the most brain-melted retards on the platform. That's a particularly egregious example, but it encapsulates very well how they operate overall.

I don't wanna make a big-ass wall of text thanking everyone individually for the nice comments, but I read them all and I'm really glad you guys enjoyed this post. I'm going to keep monitoring the data over time and I'll post updates periodically as the bot grows. It's just now getting to the point where the growth curve is starting to look exponential, and I think it's really going to take off over the next few weeks or months. Right now the data is very interesting to me, but it still represents too small a subset of all Discord users to be taken seriously academically or by law enforcement. As it grows, I remain hopeful that that'll change and that, by publishing info like this, some small positive change will come in the world :)
 
It's optimistic in the extreme to think taking this to the media would get the slightest bit of traction. Remember, the press helped Netflix scold everybody who found Cuties disgusting. It's also pretty unlikely any university is going to be open to this kind of research because they've spent decades working to desensitize people to grooming and child abuse; stuff like this could lead to lots of heads slowly turning back towards the universities and mobs forming to collect some faculty scalps.

This is still fantastic shit, but expecting anything mainstream-adjacent to do anything besides try to bury this is folly.
 
I love the fact that pedos are so stupid they use an app that doesn't use end-to-end encryption. Good work, OP.
They don't care because there aren't many consequences for them; they're so pervasive, and sometimes they're protected by companies like Roblox because they're the biggest customers. If we actually punished pedophilia on the internet, instead of just running witch hunts that peter out like gigglygoonclown, half of this shit wouldn't be happening.

TL;DR: the companies don't care. Don't let your kids on Discord, or on the internet at all, without supervision.
 
I love the fact that pedos are so stupid they use an app that doesn't use end-to-end encryption. Good work, OP.
Fun fact about the psychological profile of most genuine pedophiles: they frequently show exaggerated adolescent traits, ranging from impulsiveness to lower IQs to emotional volatility.

Some theorize that this is why non-sadistic/non-psychopathic pedophiles act that way towards children (the sadistic ones get off on the power, pain, and taboo): part of their brains is fucked, stuck in a developmental stage it should not be in. That doesn't mean they deserve sympathy, but it does mean we could develop methods to neurologically identify them without penile reaction tests.
 