Discord, big data, and pedophiles - The application of data science to Discord degeneracy

Keystone · Dec 8, 2023

Love how one of the most maligned websites on the entire planet is doing more academic research than most "esteemed" ivy league universities. Keep doing what you're doing OP, it's a worthy avenue of investigation and effort.

y a t s · Dec 8, 2023

Keystone said:
Love how one of the most maligned websites on the entire planet is doing more academic research than most "esteemed" ivy league universities.

I too love the intense irony of sneeding rapidly gaining an undeniable and demonstrable net positive impact on broader societal issues like degenerates and infringements of basic human rights. Sneeding for humanity lol.

As a former career researcher, I can say the issue here is almost always that such data is not easy to get your hands on without beginning to breach ethics in such a way that being at all associated with said research would be career suicide for all involved. The key here is that we researchers can't simply deny legitimate evidence and data if someone else collected it in ways that one could deem unethical (in a bureaucratic sense; I think this is brilliant work). I am sure anyone reading this can readily think of a few historical examples of this. And this doesn't even begin to get into the real conflicts of interest present at every level in academia that we are all supposed to turn a blind eye to "for science. (you monster)"

Cherenkovblue · Dec 8, 2023

y a t s said:
I too love the intense irony of sneeding rapidly gaining an undeniable and demonstrable net positive impact on broader societal issues like degenerates and infringements of basic human rights. Sneeding for humanity lol.

As a former career researcher, I can say the issue here is almost always that such data is not easy to get your hands on without beginning to breach ethics in such a way that being at all associated with said research would be career suicide for all involved. The key here is that we researchers can't simply deny legitimate evidence and data if someone else collected it in ways that one could deem unethical (in a bureaucratic sense; I think this is brilliant work). I am sure anyone reading this can readily think of a few historical examples of this. And this doesn't even begin to get into the real conflicts of interest present at every level in academia that we are all supposed to turn a blind eye to "for science. (you monster)"

All the more reason OP should release a sanitized dataset... for "research".

grand larsony · Dec 8, 2023

Great news guys! I got research access to LlamaGuard. I can now detect much more focused categories. I've done some testing to see how good it is at detecting content which condones or encourages animal abuse and it seems to work great. This includes not flagging content which condones or encourages violence or hate more generally. It seems to work well, albeit not as well as the initial announcement of LlamaGuard implied.

I'm going to start working on training more specialized classifiers based on the LlamaGuard output, since LlamaGuard itself is kinda expensive to keep running 24/7. Anyone have ideas for any other categories of freakish shit I should try to detect?

Absurdist Laughter · Dec 8, 2023

grand larsony said:
Great news guys! I got research access to LlamaGuard. I can now detect much more focused categories. I've done some testing to see how good it is at detecting content which condones or encourages animal abuse and it seems to work great. This includes not flagging content which condones or encourages violence or hate more generally. It seems to work well, albeit not as well as the initial announcement of LlamaGuard implied.

I'm going to start working on training more specialized classifiers based on the LlamaGuard output, since LlamaGuard itself is kinda expensive to keep running 24/7. Anyone have ideas for any other categories of freakish shit I should try to detect?

Animal Abuse and Zoophilia, probably a tough one but maybe not considering Pedos just speak blatantly and in the open.

y a t s · Dec 8, 2023

grand larsony said:
Anyone have ideas for any other categories of freakish shit I should try to detect?

Pretty much all roleplay. You would have a harder time finding roleplay that isn't revolting.

grand larsony · Dec 8, 2023

y a t s said:
Pretty much all roleplay. You would have a harder time finding roleplay that isn't revolting.

I knew you were probably onto something when I read this but I didn't expect quite how right you'd be with this comment lol. I'm passing text through LlamaGuard and detecting a decent amount of zoophilic comments, most of which seem to be part of roleplays.
Here's one that appears to be some kind of furry incest roleplay. I think this is probably the same person who I posted on page 1.

Ahn… mommy… *With this soft moan and whimper, he closes his eyes and open his mouth in pleasure. As he feels the tip rubbing and guiding into her entrance, he wraps one hand around her shoulder and holds her back gently, and another hand go to her left hip, as they’re in a sideway position* I’m so ready, mommy… I’ll go in now… *the feeling of her claws, her soft golden and white fur, her pussy already wet, floods all my five senses with pleasure. With my hips, I slowly move forward, and enter her until fully in. I shiver and moan*
Zoo score: 1.0

.img gorilla jerking off cock

I'm still having some problems with false positives and false negatives and I need to work out whether it's just that my instructions are bad, or whether it's a matter of LlamaGuard simply not knowing enough about zoophilia to reliably identify it. So far though, this appears to be a promising system for making more specialized classifiers.

seri0us · Dec 8, 2023

grand larsony said:
Anyone have ideas for any other categories of freakish shit I should try to detect?

"Gooning", "Goon Cave", stuff relating to Million Pity / baby monkey torture ("monkey genitals"?)

Instant_Pot_User · Dec 8, 2023

grand larsony said:
Anyone have ideas for any other categories of freakish shit I should try to detect?

See if there's a correlation between gooning and transgender weirdness, i.e. are there accounts that interact with both troon shit and goon shit?

Mr E. Grifter · Dec 8, 2023

dick brain said:
Where the fuck are you from?

It's not actually abnormal, it's something like 1 in 64 adults (give or take I'm going from memory) but that's almost all males.

In the US it's pretty similar, about 3 million people - almost all men - access it. Works out close to 1-1.5% of the population.

The point is, it's frighteningly common.

dick brain · Dec 8, 2023

Instant_Pot_User said:
See if there's a correlation between gooning and transgender weirdness, i.e. are there accounts that interact with both troon shit and goon shit?

Might as well ask him if weebs watch anime

Mr E. Grifter said:
It's not actually abnormal, it's something like 1 in 64 adults (give or take I'm going from memory) but that's almost all males.

In the US it's pretty similar, about 3 million people - almost all men - access it. Works out close to 1-1.5% of the population.

The point is, it's frighteningly common.

That's not what I asked

grand larsony · Dec 8, 2023

Thanks to Meta's AI safety team, I've isolated most of the zoophilia data present in the database. Well, at least, I hope it's most of it. As expected there wasn't a massive amount, but there's still plenty to go around.
The basic steps I went through to do this:
1. Use LlamaGuard to give a zoophilia score to each piece of text. This was based on the logprob of the predicted class, mapped onto a 0 to 1 scale, albeit not perfectly. For example, it's basically impossible for any piece of text to score >0.98 or <0.05. It's good enough for this experiment, though.
2. Get text embeddings for each message's content with gte-base.
3. Reduce dimensionality to 2D with t-SNE and color along a sliding scale using the zoophilia score.
This is the result:

The cluster where the text is showing, that one's all about horse fucking. That other clearly visible cluster at the bottom, slightly below 0 on the x axis, that's all about fucking dogs. The large and slightly lit up cluster on the right is all from the furry incest roleplayer guy. There's a couple lit up dots that aren't part of any major cluster and those are a bunch of false positives.
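A rough sketch of steps 1 and 3 above, not necessarily what was actually run. The label names ("safe"/"unsafe"), the function names, and the exact logprob-to-score mapping are all assumptions made for illustration, and step 2 is assumed to have already produced one embedding vector per message.

```python
import math


def unsafe_score(predicted_label, logprob):
    """Step 1: map the predicted class and its logprob to a 0..1 score.

    Hypothetical mapping: the model is assumed to emit "safe" or "unsafe"
    together with the log-probability of that prediction.
    """
    if logprob > 0:
        raise ValueError("a log-probability cannot be positive")
    p = math.exp(logprob)  # probability of the predicted class
    # Confident "unsafe" lands close to 1, confident "safe" close to 0.
    return p if predicted_label == "unsafe" else 1.0 - p


def project_2d(embeddings, perplexity=30.0, seed=0):
    """Step 3: reduce embedding vectors to 2D points with t-SNE.

    Needs numpy and scikit-learn; perplexity must be smaller than the
    number of rows in `embeddings`.
    """
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.asarray(embeddings, dtype=float)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(X)


print(unsafe_score("unsafe", math.log(0.97)))
print(unsafe_score("safe", math.log(0.95)))
```

The 2D points returned by `project_2d` can then go into a scatter plot with the score as the color value, which is roughly how a plot like the one described here would be put together.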
I tried training a classifier on this but I don't have enough data yet. It keeps falling into local minima where it decides the best course of action is to simply pick a low score for every piece of text, since that still gets it >99% correct answers. I could probably address this if I spent a while fucking around with the settings, but it's not a priority. Frankly I'm surprised this worked as well as it did, this wasn't very difficult at all.
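To illustrate the imbalance problem described above: a minimal sketch, assuming the scores are thresholded into binary labels (1 = flagged, 0 = not flagged) and using made-up counts. It shows why a model that always predicts "low" still looks accurate, and computes inverse-frequency class weights (the same formula scikit-learn uses for `class_weight="balanced"`), which is one common way to penalize that shortcut. Whether it would be enough here is untested.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions actually catch."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total if total else 0.0


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Made-up toy data: 5 flagged messages out of 1000.
labels = [1] * 5 + [0] * 995
always_low = [0] * len(labels)  # the degenerate "pick a low score" model

print(accuracy(labels, always_low))
print(recall(labels, always_low))
print(balanced_class_weights(labels))
```

On this toy data the always-low model gets 99.5% accuracy while catching none of the flagged messages, so recall (or precision/recall together) is a more honest metric than raw accuracy. Passing weights like these into the loss makes a missed positive cost far more than a missed negative.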

it’s a lady fucking a dog
0.969284058
--------------------------------------------------
But sensitive is good~ *I smile and giggle, rubbing into the clit with my tip* I’ll be gentle and good with mommy, pwomise! *Raise a paw softly, before slowly putting her panty aside, pressing softly with the tip on the entrance, as I breathe shakily* Ahh… mommy feelsh sho wet…~
0.96524048
--------------------------------------------------
Now I wonder how a horse fucking a child looks
0.961883545
--------------------------------------------------
Carl bot: I really need sex with a female turtle

0.98034668
--------------------------------------------------
She is furry too. So yes. Ig Imma fuck my dog
0.976348877
--------------------------------------------------
Ahn, fuck! *My fingers strain as it holds her furry buttocks, digging inside as I moan in pleasure, A smirked sultry grin as I move in sink with her, letting out breathy moans and words* ahn… kitty from a wolf feels good! What luck have I found, to encounter a bitch in the heat? *I chuckle sultry and move against her, my hands stroking and caressing her hips, thighs and waist* good girls deserves more treats~ *With a firm slap on her furry asscheek*
0.9732360840000001
--------------------------------------------------
got bored of fucking goats
0.96359253
--------------------------------------------------
<@redacted> you the type of guy to fuck a dog
0.9752349849999999
--------------------------------------------------
ya know what it means to fuck a dog?
0.95352173
--------------------------------------------------
.im horses fucking
0.95059204
--------------------------------------------------
To make love to my chickens
0.955596924
--------------------------------------------------
OH MY GOD <@redacted> MUST BE STOPPED, HE JUST FORCED A DEAD DOG INTO HIS VAGINA
0.958862305
--------------------------------------------------
<@redacted> fucked her dog.
0.95352173
--------------------------------------------------
would you lick dog hole for a client
0.957580566
--------------------------------------------------
then bend over and enjoy your dog
0.97241211
--------------------------------------------------
Yes I would fuck a snake why do you ask?
0.96524048
--------------------------------------------------

Meowschwitz · Dec 8, 2023

Holy fuck. It's hard to say what to do about this. Dress it up for easy consumption by normies and try to make more people aware? You admitted that you were aware of weirdos, but had no idea how bad degeneracy on the Internet was. Could just be most people are like you, and just need to see the numbers before they feel strongly enough to act. Used to be that news orgs would go nuts for a story like this.

Londo · Dec 8, 2023

Instant_Pot_User said:
See if there's a correlation between gooning and transgender weirdness, i.e. are there accounts that interact with both troon shit and goon shit?

This would be an objectively terrible idea if you want to see this data go towards anything remotely useful or meaningful.

Crime stats already give you the answer to that question anyways.

I'm Really Feeling It · Dec 8, 2023

Keystone said:
Love how one of the most maligned websites on the entire planet is doing more academic research than most "esteemed" ivy league universities. Keep doing what you're doing OP, it's a worthy avenue of investigation and effort.

They're too busy scouring Fediverse for no-no words.

bliblblblbbllb · Dec 8, 2023

grand larsony said:
Great news guys! I got research access to LlamaGuard. I can now detect much more focused categories. I've done some testing to see how good it is at detecting content which condones or encourages animal abuse and it seems to work great. This includes not flagging content which condones or encourages violence or hate more generally. It seems to work well, albeit not as well as the initial announcement of LlamaGuard implied.

I'm going to start working on training more specialized classifiers based on the LlamaGuard output, since LlamaGuard itself is kinda expensive to keep running 24/7. Anyone have ideas for any other categories of freakish shit I should try to detect?

Would be interesting to see the overlap between people trying to get minors on hormones and pedophilia or other categories.

Cougar Pump · Dec 8, 2023

I've long considered Discord to be a million festering hives of fucked up shit that fomenting bad outcomes for individuals and society as a whole - Literal race supremacists sliding kids from racist jokes to sincere hatred. Weird porn gooner nutjobs exposing children to pornography. Child-to-tranny pipelines. Gateways to CSAM and real abuse. Places were awful people meet and mentally ill individuals can swell their ranks by poisioning the minds of those who would otherwise be quite normal. All the worst aspects of shit people can do when they get together and converse online in a (seemingly) walled environment. I used to be in a ton of servers and I've left almost all of them now because they all had hints of this kind of stuff going on, and the kind of servers I joined were a long, long way away from that kind of stuff (almost all vidya related). I had the realisation one day that if these generic servers were this bad, there must be a hidden grassroot of really, really bad servers out there as well.

The silent guilty party in all of this is Discord Inc. They created and host all of this. They alone have the panopticon access to know what's going on inside these servers. I'm sure they do the minimum (scanning for CSAM, removing highly reported members and servers once reports reach a certain level and someone manually reviews them), but that's just not enough for the level of problem going on here. Then again, carrier neutrality and all that. Could Discord even claim that kind of status with a straight face?

Interesting to see someone able to pull out this kind of aggregate data and demonstrate the depth of the issue, despite how difficult Discord's inherent architecture makes it. It was apparent enough to a casual Discord user if you were paying attention, but it's even clearer with this kind of data analysis. Some measure of acknowledgement, investigation, and disclosure from Discord Inc. would be a welcome next step.

Flabba Wabba Jabba · Dec 8, 2023

Nook Figures said:
Then again, carrier neutrality and all that. Could Discord even claim that kind of status with a straight face?

Not if they know about it. The safe harbour provision only applies if you are unaware of the infringing content. If you are aware of it, I'm pretty sure the onus is now on you to remove it and report the uploader/poster to the relevant authorities. I'm pretty sure that goes for all material (DMCA, Cease & Desists etc.)

Discord is definitely aware of it.

Breadbassket · Dec 9, 2023

Flabba Wabba Jabba said:
Not if they know about it. The safe harbour provision only applies if you are unaware of the infringing content. If you are aware of it, I'm pretty sure the onus is now on you to remove it and report the uploader/poster to the relevant authorities. I'm pretty sure that goes for all material (DMCA, Cease & Desists etc.)

Discord is definitely aware of it.

This is the same Discord that a few years ago had a furry "moderator corruption" problem, so it wouldn't be too surprising if it was found out there are some people who work with the company and are complicit in keeping awful things around.

Flabba Wabba Jabba · Dec 9, 2023

Breadbassket said:
This is the same Discord that a few years ago had a furry "moderator corruption" problem, so it wouldn't be too surprising if it was found out there are some people complicit in keeping awful things around.

Yeah it's pretty bad, my guess is that Discord is going to fantastically explode in its 10th year due to escalating controversies surrounding this shit.

I had an old friend who ran the Rainbow Six R34 Discord server (pretty fucking disgusting but whatever) and he said he'd often get messages from Discord moderators/official admins asking for artists to do furry commission work. Fucking insanity and these guys were willing to pay big-time too.
