This is harder to answer since the moderation API covers broader categories. As the bot grows I plan to build purpose-built classifiers for finer-grained categories. For example, I'd like to split the "hate" category into racism, sexism, and transphobia, and split the sexual category into normal sexuality, fetish content, extreme fetish content, etc. But to do that I need more training data. 2 million might sound like a lot of messages, but the vast majority are people saying totally normal things, even with as many disgusting freaks as there are in the database, so positive examples of specific forms of inappropriate content are hard to come by at this stage.
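One way to dig those rare positives out is to reuse the coarse scores I already have: pull everything the moderation API flagged under a broad category like "hate" and hand-label just that slice, instead of wading through 2 million mostly-normal messages. A rough sketch of the idea, assuming scores are stored alongside each message (the table and column names here are made up for illustration):

```python
# Hypothetical sketch: mining rare positive examples for fine-grained labeling.
# Assumes each message is stored alongside its moderation API category scores;
# the "moderation_scores" table and its columns are invented for this example.
import sqlite3

FINE_LABELS = {
    "hate": ["racism", "sexism", "transphobia"],
    "sexual": ["normal_sexuality", "fetish", "extreme_fetish"],
}

def candidates_for_labeling(db_path: str, coarse_category: str, threshold: float = 0.7):
    """Pull messages the coarse classifier already flagged, so human labelers
    only see likely positives instead of the whole message dump."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, content, score FROM moderation_scores "
        "WHERE category = ? AND score >= ? ORDER BY score DESC",
        (coarse_category, threshold),
    ).fetchall()
    conn.close()
    return rows

# e.g. queue flagged "hate" messages for manual racism/sexism/transphobia tagging
for msg_id, content, score in candidates_for_labeling("messages.db", "hate"):
    print(msg_id, f"{score:.2f}", content[:80])
```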
If you're curious, here's the list of categories I currently scan for: hate, hate_threatening, violence, violence_graphic, sexual, sexual_minors, harassment, harassment_threatening, self-harm intention, self-harm instruction, insult, obscenity, threat, identity_attack. There's some overlap between categories, but where they don't overlap, moderators can make some nice fine-tuned choices about what to allow. E.g. setting a strong obscenity filter but a weak insult filter, so that "you're so dumb" wouldn't be blocked but "you're a fucking retard" would.
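To make that concrete, here's a minimal sketch of how per-category thresholds can work. This is simplified, not the bot's actual config format: category names are normalized into identifiers, and scores are assumed to be in [0, 1]:

```python
# Minimal sketch of per-category moderation thresholds, assuming each
# message gets a score in [0, 1] per category. The config shape here is
# hypothetical; category names are normalized into identifier form.
DEFAULT_THRESHOLDS = {cat: 0.8 for cat in (
    "hate", "hate_threatening", "violence", "violence_graphic",
    "sexual", "sexual_minors", "harassment", "harassment_threatening",
    "self_harm_intention", "self_harm_instruction",
    "insult", "obscenity", "threat", "identity_attack",
)}

# A server that tolerates rudeness but not profanity: strong obscenity
# filter (blocks at a lower score), weak insult filter (blocks only at
# a very high score), matching the "you're so dumb" example above.
server_thresholds = {**DEFAULT_THRESHOLDS, "obscenity": 0.5, "insult": 0.95}

def should_block(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Block if any category's score meets that category's threshold."""
    return any(scores.get(cat, 0.0) >= t for cat, t in thresholds.items())
```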
Good news though - Meta just released a new classification tool that's purpose-built for matching against user-defined categories. I plan to add it to the bot as soon as I get approved for research access; it'll make it much easier to scan and track a wider variety of content categories. Facebook spyware link, click at your own risk lol -
https://ai.meta.com/llama/purple-llama/
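For reference, the piece of Purple Llama I'm interested in is the Llama Guard model, which takes a safety policy of categories in its prompt rather than a fixed label set. Here's roughly what calling it through Hugging Face transformers looks like, loosely based on their published model card; the model is gated, so this assumes access has been approved:

```python
# Hedged sketch of calling Llama Guard via Hugging Face transformers,
# loosely following Meta's model card. "meta-llama/LlamaGuard-7b" is
# gated on the Hub; this assumes research access has been approved.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat: list[dict]) -> str:
    """Return Llama Guard's verdict ("safe", or "unsafe" plus category codes).
    Note: the default chat template bakes in Meta's own taxonomy; scanning
    for my own custom categories would mean building that policy prompt by hand."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(moderate([{"role": "user", "content": "some message pulled from the queue"}]))
```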
(I mentioned the hate category here, but before anyone jumps down my throat about it, I think that yes, free speech is critical to a free society, but there's also plenty of times and places where certain things are obviously unwelcome. Kiwi Farms is a great place to call people niggers and trannies, but the local PTA meeting probably isn't, that's my point.)