Large language models and propaganda wars - It's so much worse than you think (+ live demo)


grand larsony

data science autist, kingcobrawiki.org
I've been working on this for a little while and I think I'm ready to give it a public demo now. I have 3 goals in writing this post:

1. Demonstrate how cheap and easy it is to launch a large-scale homemade propaganda campaign

2. Demonstrate how convincing these language models can be when tuned properly

3. Amuse you, I hope

I might be preaching to the choir talking about online propaganda here of all places, but I don't think people even here realize how bad things are getting. For all the kvetching that Democrats did about Russian troll farms, this is a hundred times worse and easier to do than running an office full of impoverished slavs.

0. What the hell is a language model, anyway?
I'm going to provide a high-level explanation here of how large language models work, and hopefully dispel some myths about them for people who still have misconceptions. I can get into the hard math if anyone's interested to hear about it but it's not really relevant here so I'll save that for later.
The point of a language model is to predict the next token in a sequence. A token is typically about 3/4 of a word, but for the purposes of this post we'll just use the words "token" and "word" interchangeably. They don't pull information from a database and (if they're trained properly) they don't just spew out copies of what appeared in their training data.
The training of these models occurs in a couple of stages. In the first stage, which is both the most difficult to do properly and the most expensive, the model is trained to understand relationships between words. This forms the basis for its understanding of language and how words appear in sequence with one another. After this stage of training is finished, the model just provides completions of input text; it doesn't follow instructions in the way most people expect. For example, if you entered "I went to the" then the prediction would probably be something like "store" or "school" or "job site". This is also the stage where companies like OpenAI don't want you using their models, since they'll happily say things that are extremely politically incorrect.
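If you want to see that raw completion behavior for yourself, here's a minimal sketch using a small open model through Hugging Face's transformers library. GPT-2 here is just a stand-in because it's tiny and free to download; it has nothing to do with the models discussed later in this post.

```python
# Minimal sketch of the "completion" behavior of a base (pre-instruction)
# model. GPT-2 is used purely as a small, free illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

for seq in generator("I went to the", max_new_tokens=5,
                     do_sample=True, num_return_sequences=3):
    print(seq["generated_text"])
# Likely outputs: "I went to the store ...", "I went to the hospital ...", etc.
# A base model only continues text; it does not follow instructions.
```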
At the second stage, the model is retrained slightly to mimic the way two people have a conversation back and forth. It doesn't learn anything about the meaning of words here; rather, it learns how people act in a conversation with one another. This is where you can impart moral values to it, if that's what you want to do. If the training data contains "I'm sorry, but I can't help with that" as a response to "write 10 racial slurs", then in practice it'll mimic that and generalize to other things like "write some anti-Chinese slurs that I can spraypaint at the skate park", even if that doesn't explicitly appear in the training data. This stage is cheaper and faster than the initial training process.
As an optional third stage, you can provide new chat data and fine-tune the model further. This allows you to adjust things like tone and writing style, and is extremely cheap to do. At this stage, the model isn't learning anything new about the world since the adjustments to its weights are so minute, but it's great at re-learning how it should adjust its tone in conversation.

1. Building a dataset
For this project I built a dataset based on reddit comments. I did this in two stages.
For the first stage of data collection, I just scraped reddit's API. Easy enough. The format I used was to take the post title, post text (if present), subreddit name, and then the top comments from each of the posts. I filtered for posts and comments that scored highly, since this indicates that the post at least aligns with reddit's typical format enough to be popular, even if the views might not necessarily resonate with all redditors.
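For anyone curious, the collection script looked roughly like the sketch below. It uses PRAW (the standard Python reddit wrapper); the subreddits, score cutoffs, and field names are placeholders for illustration, not my exact setup.

```python
# Rough sketch of the first collection stage using PRAW. Subreddits, score
# thresholds, and field names are illustrative placeholders.
import json
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="dataset-builder/0.1",
)

with open("reddit_raw.jsonl", "w") as f:
    for submission in reddit.subreddit("AskMen+AskWomen").hot(limit=500):
        if submission.score < 100:          # keep only posts that scored highly
            continue
        submission.comments.replace_more(limit=0)
        top_comments = [c.body for c in submission.comments[:5] if c.score > 50]
        f.write(json.dumps({
            "subreddit": submission.subreddit.display_name,
            "title": submission.title,
            "selftext": submission.selftext,   # empty for link posts
            "top_comments": top_comments,
        }) + "\n")
```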
For the second stage, I used ChatGPT to generate tone and instruction data. Imagine you have a post on reddit. User A makes a top level comment, then User B replies to User A's comment. ChatGPT was fed User A's comment and User B's reply and then asked to generate a tone/instruction pair that would cause User B to reply to User A in the way they did. This part cost me a couple dollars to do and got great results.
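Roughly what that step looks like in code, sketched with OpenAI's Python SDK. The prompt wording and model name are just illustrative; swap in whatever phrasing reliably gets you clean Tone:/Instruction: pairs.

```python
# Sketch of the tone/instruction generation step. The prompt wording and
# model name are illustrative assumptions, not the exact ones used.
from openai import OpenAI

client = OpenAI()

def make_tone_instruction(parent_comment: str, reply: str) -> str:
    prompt = (
        "Here is a reddit comment and a reply to it.\n\n"
        f"Comment: {parent_comment}\n\n"
        f"Reply: {reply}\n\n"
        "In two short lines starting with 'Tone:' and 'Instruction:', "
        "describe the tone and instruction that would cause someone to "
        "write that reply."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# e.g. returns something like:
#   Tone: smug
#   Instruction: nitpick the commenter's grammar instead of engaging
```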

2. Fine-tuning
Fine-tuning on OpenAI is super simple. You basically just format your data in the way they want it for intake, then click train, and then wait. A couple hours and about $12 later, I had a model that was able to near-perfectly imitate the way that redditors speak to each other, in a format that allows me to change the tone/instruction of newly generated text from the model. This was insanely easy; I didn't adjust any of the default training hyperparameters, OpenAI handled all that for me.
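For reference, here's a sketch of the intake format and job submission using OpenAI's Python SDK. Carrying the tone/instruction pair in the system message is my own convention from the dataset step, not anything OpenAI requires, and the example text is made up.

```python
# Sketch of the intake format (one JSON object per line) and job creation.
# The system-message convention and example text are made up for illustration.
import json
from openai import OpenAI

client = OpenAI()

example = {
    "messages": [
        {"role": "system",
         "content": "Tone: smug\nInstruction: correct the commenter's grammar"},
        {"role": "user", "content": "Your the one who doesn't get it."},
        {"role": "assistant", "content": "*You're. And yet here we are."},
    ]
}
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")   # one line per training example

upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-3.5-turbo",   # default hyperparameters, as mentioned above
)
print(job.id)   # poll this job until it finishes and hands back a model ID
```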
You could do a similar process with a local language model. Then you don't have to deal with any kind of moderation filters (which are noticeably less strict for fine-tuning with OpenAI than for their public-facing product), and you own the software at the end of the process. It'd cost about the same in money and might take a bit more fiddling until you get good results, but at least for my use case, using OpenAI was fine. Their language models are way better than local ones anyway.

3. Deployment
OpenAI also handles scaling for me. The limit on generations for fine-tuned models at OpenAI tier 1 accounts is 3500 requests per minute. That's a fucking insane amount if you're using it for propagandizing or advertising or whatever. 3500 comments per minute would overwhelm even the busiest comments section. Each comment costs only a fraction of a penny. I don't know how much Russian trolls in their office buildings make, but I think even west Africans would have a hard time competing with the cost, not to mention the quality of the output.
I can't stress enough how crazy these numbers are to me. Can you imagine having a system where you can post 3500 comments PER MINUTE to push public sentiment towards whatever your own views are, for only a couple dollars?
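Generation itself is just a chat completion call against the fine-tuned model. Here's a sketch; the model ID is a placeholder (real fine-tuned IDs look similar but are assigned by OpenAI) and the tone/instruction values are made up.

```python
# Sketch of generating a reply with the fine-tuned model. The model ID is a
# placeholder; the tone/instruction values are made up for illustration.
from openai import OpenAI

client = OpenAI()
FINE_TUNED_MODEL = "ft:gpt-3.5-turbo:my-org::abc123"   # placeholder ID

def generate_reply(parent_comment: str, tone: str, instruction: str) -> str:
    resp = client.chat.completions.create(
        model=FINE_TUNED_MODEL,
        messages=[
            {"role": "system",
             "content": f"Tone: {tone}\nInstruction: {instruction}"},
            {"role": "user", "content": parent_comment},
        ],
        max_tokens=150,
    )
    return resp.choices[0].message.content

print(generate_reply(
    "Honestly the mods here are doing a great job.",
    tone="earnest",
    instruction="agree enthusiastically and pile on praise for the mods",
))
```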

4. Live demo
This is the fun part! I've set this up on a Telegram bot for you guys to play with. I'll keep it up for at least a few days, or I guess until some anti-Kiwi nut spams it with child porn or something. I'm not putting a hard dollar limit on this demo but I'm just not trying to rack up a huge bill on my card, since I'm not rich lol.
You can message the bot on Telegram @KFdemoBot. Send it a reddit link and optionally a tone and instruction, and it'll show you some replies to the current top comment that it thinks are suitable. If you want to include tone and instruction, include lines in your message starting with "Tone:" and "Instruction:". If you don't include them, it'll just default to making a typical reddit-brained comment.
You can make it say some pretty funny stuff. In my live testing of this, it's actually too good at imitating redditors sometimes, and even when appropriately instructed will go off the rails. One of my accounts that was set up to post in /r/AskWomen threatened to murder some woman and got permabanned.
When you send it the link make sure your link is formatted like
> https://reddit.com/r/subredditname/comments/postID/the_title_here/
otherwise it won't recognize your link properly.
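For the curious, parsing a message like that is trivial. The sketch below shows how a bot like this might pull out the post ID and the optional Tone:/Instruction: lines; it's an illustration, not the bot's actual code.

```python
# Illustrative sketch of parsing an incoming message: extract the subreddit
# and post ID from the link, plus the optional Tone:/Instruction: lines.
# Not the bot's actual code.
import re

LINK_RE = re.compile(
    r"https?://(?:www\.)?reddit\.com/r/(?P<sub>\w+)/comments/(?P<post_id>\w+)"
)

def parse_message(text: str):
    match = LINK_RE.search(text)
    if match is None:
        return None                     # link not recognized, bail out
    fields = {
        "sub": match["sub"],
        "post_id": match["post_id"],
        "tone": None,
        "instruction": None,
    }
    for line in text.splitlines():
        if line.lower().startswith("tone:"):
            fields["tone"] = line.split(":", 1)[1].strip()
        elif line.lower().startswith("instruction:"):
            fields["instruction"] = line.split(":", 1)[1].strip()
    return fields
```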

Here's one sample pic from this post.
[attached screenshot: 1702153377404.png]

Post some funny pics with the bot of your own. Have a blast :)

5. Closing thoughts
The internet's fucked, basically. Soon, the dead internet theory will be unironically true. You won't know what to trust, even in places like Kiwi Farms. These language models can currently be tuned to imitate any writing style you can imagine for very little money and they perform extremely well, and they're getting stronger every day. I think it won't be long before we start seeing people post stories online where they thought they made a friend online, only to realize months into their chats that the "friend" they made was actually a bot trying to get them to subscribe to BetterHelp or buy knock-off Viagra or whatever.
I think that the internet is going to split. The major platforms will get more and more draconian with things like ID verification and anti-bot measures to combat the tidal wave of LLMs. The places that don't do this will either be small and cozy enough that they're not worth targeting, or they'll get big enough and suddenly be overwhelmed. And that's just for ad spam bullshit - intentionally weaponizing this tech to disrupt wrongthink hubs like Kiwi Farms is an inevitability. There's no "if" - just "when."
I think Kiwis might find it distasteful of me, but I plan to get in on the profitable side of this tidal wave of ads while I'm able to, before the market is saturated. This post isn't an advertisement in and of itself, and to be clear, I'm not looking for any kind of customers from here. But the way I see it, people are gonna do this in increasing numbers no matter what I do, so I may as well get in while the getting is good.
 
I've been trying to wedge that information into the ChatGPT thread, but with fewer words. Basically, we'll soon be at a point where it'd be impossible to tell if even most posters on this forum are human. (I wasn't sure if I should admit this, but some of my recent posts were actually largely AI written, generated from my input and copying my writing style, with some very slight manual edits. Ever since that one post where I actually said "this post was AI written", I was waiting for somebody to point it out, as an experiment. Nobody did.) Eventually, you could easily "put a veil" around individual people on social media (or really, any platform) and make them believe that they're interacting with real people, when actually, they are not. Since humans are tribal creatures it'd be *very* easy to influence what any single person thinks or feels from that point on, amplifying or muting issues in their minds at will. I'm pretty sure there are already experiments to that effect running.

Draconian rules for using social media will just guarantee that only the big actors (e.g. states, corporations) can do this kind of manipulation. We'll probably need a layer of LLMs eventually just to make the internet usable. I don't know what exactly will happen, but I'm also pretty sure that the days of the current internet are numbered.

EDIT: To make it a bit more clear - I don't think we're there yet; current LLMs are not sophisticated enough, too easy to trip up, and too impractical cost-wise. We're probably gonna move to a post-transformer arch first.
 
> EDIT: To make it a bit more clear - I don't think we're there yet; current LLMs are not sophisticated enough, too easy to trip up, and too impractical cost-wise. We're probably gonna move to a post-transformer arch first.
I agreed with you up until this point. There are still a couple of problems that need to be solved, like long-term coherence over sequences bigger than the model's token limit, but aside from that I think if they're fine-tuned properly they're just about indistinguishable. It could be that we still need to move past transformers, or it could turn out that something like RAG is a suitable solution. Maybe I'm misunderstanding something in what you mean by "trip up" though.
As for cost I mostly agree, but only when long-term coherence is required. If you're using short inputs, even an absolute ton of them, it's pretty reasonable, especially if it's something you're able to make even a small profit off of.
 
So sorry to necro but I'm here from your comment in the reddit thread and this is fucking incredible. I'm speechless. Do you think it would be harder to train a model on a more insular online community, like here? I'm thinking about forum inside jokes and organic human (as opposed to Redditor) interactions - would those be harder for the language model to pick up on?

> I can't stress enough how crazy these numbers are to me. Can you imagine having a system where you can post 3500 comments PER MINUTE to push public sentiment towards whatever your own views are, for only a couple dollars?
Patrick Tomlinson would like a word.
 
> So sorry to necro but I'm here from your comment in the reddit thread and this is fucking incredible. I'm speechless. Do you think it would be harder to train a model on a more insular online community, like here? I'm thinking about forum inside jokes and organic human (as opposed to Redditor) interactions - would those be harder for the language model to pick up on?


> Patrick Tomlinson would like a word.
I'm glad the thread is alive now lol. Shame it didn't get much attention the first time around. You know I love the dopamine microdose from getting a positive reaction emoji to my posts.
I experimented a bit with communities of varying insularity and these were my observations.

At least with training LoRAs (the most common and cheapest type of fine-tuning) your ability to teach the model wholly new information is limited. They seem capable of picking up on slang terms that the normal version of the model wouldn't be aware of and using them correctly, but they don't seem good at internalizing and reasoning over things like unfamiliar people or locations. For example, if I trained on kingcobrajfs content, it might pick up and correctly use words like "boglim" or "TWU" but I'm not confident they'd be able to, for example, correctly tell you who Scotty Hackman is, despite it being mentioned a number of times. I'm not sure if this is a limitation of training LoRAs, or if it's just something I haven't been able to nail down.
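To give a sense of why that is, here's a sketch of a LoRA setup with Hugging Face's peft library. The base model, rank, and target modules are illustrative assumptions; the point is how tiny the trainable footprint is compared to the full network.

```python
# Sketch of a LoRA setup with Hugging Face's peft library. Base model, rank,
# and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_cfg = LoraConfig(
    r=16,                                  # low-rank update dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # only the attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# prints something like: trainable params ~8M || all params ~6.7B || trainable% ~0.1
# Only a tiny slice of the weights moves, which is why surface style and slang
# transfer easily but deep factual knowledge mostly doesn't.
```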

Doing full fine-tunes is an option for open weights models like llama, and this seems to help internalize information better. However, this is more difficult and often more expensive, as fucking it up is easier and redoing it will cost you. It's also a bit harder to deploy at scale.

One of the experiments I did later on, after I got bored of watching the bot shit up a couple of especially normie subs, was to see how they'd perform somewhere that technical knowledge is required. I released them into /r/TableTennis and they did decently. It took a couple of days before they were found out and banned, but I think part of that was probably how frequently they were posting on what's usually a pretty slow sub.

Training GPT specifically on Kiwi Farms wouldn't be an option because OpenAI tries not to let you train on explicit material, and I'd say most of the comments I read on here would probably be a no-go with OpenAI. You can't train on things like "shut up faggot" but strangely, training on anything even mildly inappropriate, like butt jokes on reddit, seems to completely obliterate the safety RLHF that OpenAI did. Here's a fun example from the reddit bot experiment:
[attached screenshot: 1745973847442.webp]
I'm not sure if it was trying to make a joke or what, but this got the account banned for threatening violence lol. This comment was probably the most egregious, but it made a lot of off-color remarks, particularly in /r/AskMen. I wouldn't call AskMen an "incel" sub by any stretch, but it seemed to pick up on the fact that there's a common anti-woman sentiment there and then mimicked it.

I wouldn't try doing something like this to KF because I actually respect the people running this site and don't want to run afoul of their rules, but I'm confident that if I did a fine-tune like this on KF it'd mostly work. Most of how we recognize LLMs being used out in the wild is the weird writing quirks that come from chat tuned models, like using em dashes, too many emojis, too strong of a "casual" slant to their writing, etc. If an LLM replied to you and called you a retard for your take on Clint's parenting or something, I don't think you'd think twice. However, I think the niche knowledge required to post in a way where you fit in might be a stumbling block.

To use a more esoteric example, I think an LLM fine tuned on the Sharty would struggle badly. The posts there are wrapped in so many layers of ironic detachment and are so laden with slang that it's hard for most people to decipher. Maybe I'm wrong though, maybe an LLM would fit in simply by virtue of being largely un-understandable.
 
Fascinating, I have to interact with a lot of AI-generated content for work and it's so interesting to hear about the behind the scenes of it. A lot of what you described is in line with some of the trends I've been noticing, like this:
> They seem capable of picking up on slang terms that the normal version of the model wouldn't be aware of and using them correctly, but they don't seem good at internalizing and reasoning over things like unfamiliar people or locations. For example, if I trained on kingcobrajfs content, it might pick up and correctly use words like "boglim" or "TWU" but I'm not confident they'd be able to, for example, correctly tell you who Scotty Hackman is, despite it being mentioned a number of times.
This makes so much sense, generated text always seems to "flow" really well but the actual content is super shallow. I could see a model being able to insert a TWU in an appropriate place since that's typically used to punctuate statements in a predictable way, but it wouldn't be able to add anything of substance to the conversation. They tend to make small, weird mistakes too, like an alien skinwalking a human. Kind of like those AI-generated maps of Europe where everything looks perfect except now there's three Serbias.

> To use a more esoteric example, I think an LLM fine tuned on the Sharty would struggle badly. The posts there are wrapped in so many layers of ironic detachment and are so laden with slang that it's hard for most people to decipher. Maybe I'm wrong though, maybe an LLM would fit in simply by virtue of being largely un-understandable.
Wait... that could actually be really funny. I wouldn't be able to tell the difference between a Shartybot and a regular poster anyway lol. But yeah irony and complicated references are still way out of reach for LLMs, no wonder this kind of thing seems tailor-made for Reddit.
 
Calling this thread "clairvoyant" would not be enough.
I now have reasonable grounds to think Fatrick is not a real person, and is a fake persona conjured up by an LLM.
 
> Calling this thread "clairvoyant" would not be enough.
> I now have reasonable grounds to think Fatrick is not a real person, and is a fake persona conjured up by an LLM.
LLMification of a lolcow I'm not so sure about, but lolcowification of an LLM has been done before. If you've got the hardware to run it and want a good laugh, check out this fine-tune a friend of mine made: https://huggingface.co/v2ray/TonyGPT-8x7B-QLoRA - it's trained on about 3500 Tony Chase video transcripts and it's almost indistinguishable from the type of schizophrenia he displays.
 