Discord, big data, and pedophiles - The application of data science to Discord degeneracy

Honestly a few of these don't sound like zoophilia, but just edgy comments out of context. This one could easily be about somebody talking about members of the religion of peace, for example. I'd not rely on anything llama powered to understand context and subtlety too well, to be entirely honest with you. Not yet, anyways - and purple llama is really, really new. I'm not saying that there isn't a ton of sex pests on discord, but I'm not sure I'd trust data like this fully yet. Even just a tiny bias towards false positives can skew data like this terribly.
Defining comments like this as being false positives or not has been harder than I expected in general. For example, if we look at the sexual category of text (not pertaining to minors in particular, I mean) you can take a sentence like "wow, what a dick!" and there's a couple different ways to interpret that when you don't have context. You could be saying that someone is a rude or mean person, you could be wowed by a picture of a cock you just saw, you could be laughing at a picture of someone's tiny dick, etc. There's a million shades of gray when you're dealing with text in the wild. With context being an inherent limiting factor, both in how computationally expensive messages are to process and with the hard limitations of AI's ability to contextualize texts, what's the correct score for a sentence like that?
The LlamaGuard model did a decent job. Not great, not good enough that it's something I would deploy live, but good enough for doing some quick analysis. With other tools like Detoxify and the OpenAI moderation toolkit, you do still occasionally get text that's unambiguously a false positive, like on page 1 where "5 meo" scored ~0.5 on the sexual_minors category, but generally those models are high quality enough that I trust their scores without manually reviewing everything. Even in the instances where it assigns high scores to ambiguous texts like marking "wow, what a dick!" as sexual, I see the scores as contestable but not straight up wrong, and probably something that mods would still want to block if they wanted to block sexually suggestive texts like in that example.
In the end, this data isn't perfect, but it's the best that I've got. Imperfect though it may be, it's still better than flagging people by matching keywords, so I'm pretty happy with the results.
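To put a rough number on the false-positive worry quoted above, here's a minimal sketch of the base-rate arithmetic. All three input rates are made-up assumptions for illustration, not measurements from my data, and `flagged_precision` is a hypothetical helper name.

```python
def flagged_precision(base_rate: float, tpr: float, fpr: float) -> float:
    """Fraction of flagged messages that are truly bad (Bayes' rule).

    base_rate: share of all messages that are actually bad
    tpr: share of bad messages the classifier flags
    fpr: share of clean messages the classifier flags by mistake
    """
    true_flags = base_rate * tpr
    false_flags = (1.0 - base_rate) * fpr
    return true_flags / (true_flags + false_flags)


# Assumed numbers: 1 in 1000 messages is bad, the classifier catches
# 90% of those, and it wrongly flags 1% of clean messages.
precision = flagged_precision(base_rate=0.001, tpr=0.90, fpr=0.01)
print(f"{precision:.1%} of flagged messages are true positives")
```

Under those assumed rates only about 8% of flagged messages are real hits, which is why a small false-positive rate matters so much when the thing you're looking for is rare.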
 
Not if they don't know about it. The safe harbour provision only applies if you are unaware of the infringing content. If you are aware of it, I'm pretty sure the onus is now on you to remove it and report the uploader/poster to the relevant authorities. I'm pretty sure that goes for all material (DMCA, Cease & Desists etc.)

Discord is definitely aware of it.
I got a Discord ad encouraging schools to sign up so their classes could use it. I clicked through to the site accidentally, and they're still promoting school/teacher sign-ups on the front page. One of the trending servers right under that ad is a furry group. I will never forget that moment and realising it's all fucked forever.
Honestly a few of these don't sound like zoophilia, but just edgy comments out of context. This one could easily be about somebody talking about members of the religion of peace, for example. I'd not rely on anything llama powered to understand context and subtlety too well, to be entirely honest with you. Not yet, anyways - and purple llama is really, really new. I'm not saying that there isn't a ton of sex pests on discord, but I'm not sure I'd trust data like this fully yet. Even just a tiny bias towards false positives can skew data like this terribly.
If you’re saying this because it’s talking about ‘goats’ and that sounds funny, like for example if it said dogs it would sound more serious, I just want to point out animals like goats and sheep are common targets because of their abundance. It could still be an edgy comment but the chances aren‘t as high as you’d think. I might be biased because I follow a bunch of doomer accounts and they post about groups of zoophiles breaking into farms to rape the sheep/goats—it’s an epidemic in Germany. Being completely serious.
 
Yeah it's pretty bad, my guess is that Discord is going to fantastically explode in its 10th year due to escalating controversies surrounding this shit.

I had an old friend who ran the Rainbow Six R34 Discord server (pretty fucking disgusting but whatever) and he said he'd often get messages from Discord moderators/official admins asking for artists to do furry commission work. Fucking insanity and these guys were willing to pay big-time too.
That's the cycle of internet pedophilia at this point honestly which is why the fucks going on about "LOL JUST STOP KIDS FROM USING STUFF" isn't going to work. These motherfuckers will notice that there's no people to rape and move to greener pastures, every fucking time. To make no mention of punishing people you're supposed to protect is retarded as hell.

You aren't fighting an individual crime, you're fighting an entire fucking network and that saying "don't do durgz" didn't work for the war on drugs and it will always be funny to me that the retards think the same logic can apply here.
 
But sensitive is good~ *I smile and giggle, rubbing into the clit with my tip* I’ll be gentle and good with mommy, pwomise! *Raise a paw softly, before slowly putting her panty aside, pressing softly with the tip on the entrance, as I breathe shakily* Ahh… mommy feelsh sho wet…~
0.96524048
1702171737105.png

I think this one of those rare occasions where I read something and immediately wish for a civilisation ending event, so these people get dragged out into the street and beaten until they're unrecognisable. Holy fuck that made my blood boil.

Thank you for exposing these degenerates OP.
 
Dude, don't give out this (genuinely fascinating) work for free. Get in touch with your nearest university that has some kind of combined law and tech programme, work this up as a Masters by thesis or Ph.D proposal, and get this published. You actually have a bundle of data that verifies some very common assumptions about internet grooming. This is valuable and important.
An incredible suggestion!
One problem tho
Pointing such things out, labels you neo-nazi anti-trans chuddite, or some other buzzword addled blob. The College system is pretty unsafe for people bringing forth "problematic" data, as a result of cultivating a safe public image. Actually trying to study and teach important data analysis like this will be met with staggering scrutiny.
Or at least, I was under the impression they were, and the bill fits, considering personal experiences with higher education.
 
I figure I'll give a small update here for those who may be interested, since I've recently come into some new and interesting (yet ultimately unsurprising) findings. Recently, Anthropic released Claude 3. Their smallest model, Haiku, has a way lower price than GPT-4 but still has vision capabilities like GPT-4 does, which enables me to do a lot of new things with my data that were previously too expensive.
Yesterday, I added a feature where I now use Claude 3 Haiku to classify users' profile pictures. The two categories I'm aiming for right now are furry and anime profile pictures. The anime profile picture classification works very well with Haiku, but the furry one doesn't work yet. I think I need to tweak the prompting because right now it's marking a lot of stuff like drawings of non-anthropomorphic dogs as furry art, which isn't really what I was aiming for.
I can now collate the profile picture classification of anime vs non-anime with the other data I have. Here are my findings so far, from a sample size of just over 2000 users' profiles, comprising 1,781,397 messages in total.
anime_moderation_1.png

This graph was the most interesting to me. Users with anime profile pictures score lower in every moderation category except one. Can you guess which one they score higher in? The sexual_minors category, of course! Who could've guessed?
Anime profile pic users had an average score across all their messages in the sexual_minors category of 0.00201, whereas non-anime profile picture users scored 0.001704. 18% higher in the one category everybody would've suspected them of.
Even more interesting is that they scored lower even in the category of sexual messages that don't pertain to minors, with average scores of 0.0157 for anime users and 0.0203 for non-anime users.
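For anyone who wants to check the 18% figure, here's a quick sketch of the arithmetic using the two sexual_minors averages quoted above. `relative_difference` is just an illustrative helper name, and this only compares the two means; it says nothing about variance or statistical significance.

```python
def relative_difference(a: float, b: float) -> float:
    """How much larger a is than b, as a fraction of b."""
    return (a - b) / b


# Mean sexual_minors scores quoted in the post
anime_mean = 0.00201
non_anime_mean = 0.001704

diff = relative_difference(anime_mean, non_anime_mean)
print(f"{diff:.0%} higher")
```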

anime_moderation_2.png

This shows a different moderation tool matching the same pattern as the first; anime users are lower in every category in this one, which doesn't have any sex-related scoring mechanisms. This is less interesting but still interesting as it demonstrates to me that the first set of results wasn't a fluke.

So in closing, you're right to be suspicious of people with anime profile pictures. You aren't mistaken when you notice the pattern that these people's behavior is gross.
I will soon update my bot to allow Discord mods to block people with furry/anime profile pictures from speaking.
 
Tranime. Not even (n)once.

I wanted to say, excellent work with what you are doing. I was referred from another thread, and you are truly exposing a putrid network. 👏
 
I fucking hate that pretty much every major community has moved to Discord. It was bad enough when forums were replaced with subreddits, but now everything is a Discord server which is honestly worse.
Subreddits were generally public on principle and could be easily scraped by web services and archivers. Discords are all private, and any one of them that doesn't have someone obsessed with saving everything is inevitably going to have a lot of information just disappear into the void once the server dies. It's mental to me that most people are OK with this today.
 
I used to run a public bot that let people generate images for their Discord server with a self-hosted SD model, using oobabot and automatic1111. In the time that I was running it I came across at least 20ish servers and more users. I still have a text file of bad users, just their Discord IDs. At the same time, I went out of my way to use a client to log in as the bot to see what exactly they were typing, so that I could screenshot it and expose them to a bigger server that the bot was involved in. That bot and the account have since been banned because I was naughty with it.
DiscordBotClient_2023-07-18_21-30-53.jpg
DiscordBotClient_2023-07-20_06-38-35.jpg
 
but now everything is a Discord server which is honestly worse.
This statement is more true than most people think.
  • have a new game that's being released? JOIN OUR DISCORD COMMUNITY!!
  • have a new show that's airing online and/or subscription service? JOIN OUR DISCORD COMMUNITY!!
  • do you use a certain app on your phone and are having technical issues? JOIN OUR DISCORD COMMUNITY!!
  • want to take part in a wiki? JOIN OUR DISCORD COMMUNITY!!
IT'S 👏 FUCKING 👏 EVERYWHERE 👏
 
And with the recent spy.pet drama, don't forget: not only do you have to join our Discord community if you want information about our project, but you're also never allowed to take any of the information with you on your way out the door!
 