I've been working on this for a little while and I think I'm ready to give it a public demo now. I have 3 goals in writing this post:
1. Demonstrate how cheap and easy it is to launch a large-scale homemade propaganda campaign
2. Demonstrate how convincing these language models can be when tuned properly
3. Amuse you, I hope
I might be preaching to the choir talking about online propaganda here of all places, but I don't think people even here realize how bad things are getting. For all the kvetching that Democrats did about Russian troll farms, this is a hundred times worse and easier to do than running an office full of impoverished Slavs.
0. What the hell is a language model, anyway?
I'm going to provide a high-level explanation here of how large language models work, and hopefully dispel some myths about them for people who still have misconceptions. I can get into the hard math if anyone's interested to hear about it but it's not really relevant here so I'll save that for later.
The point of a language model is to predict the next token in a sequence. A token is typically about 3/4 of a word, but for the purposes of this post we'll use the words "token" and "word" interchangeably. These models don't pull information from a database, and (if they're trained properly) they don't just spew out copies of what appeared in their training data.
The training of these models occurs in a couple of stages. The first stage, which is also the most difficult and most expensive to do properly, trains the model to understand relationships between words. This forms the basis for its understanding of language and of how words appear in sequence with one another. After this stage, the model only provides completions of input text and doesn't follow instructions in the way people naturally expect. For example, if you entered "I went to the", the prediction would probably be something like "store" or "school" or "job site". This is also the stage where companies like OpenAI don't want you using their models, since they'll happily say things that are extremely politically incorrect.
At the second stage, the model is retrained slightly to mimic the way that two people have a conversation back and forth. It doesn't learn anything about the meaning of words here, but rather it learns how people act in a conversation with one another. This is where you can impart moral values on it, if that's what you want to do. If the training data contains "I'm sorry, but I can't help with that" as a response to "write 10 racial slurs" then in practice, it'll mimic that and generalize to other things like "write some anti-Chinese slurs that I can spraypaint at the skate park" even if that doesn't explicitly appear in the training data. This stage is cheaper and faster to do than the initial training process.
As an optional third stage, you can provide new chat data and fine-tune the model further. This allows you to adjust things like tone and writing style, and is extremely cheap to do. At this stage the model isn't learning anything new about the world, since the adjustments to its weights are so minute, but it's very effective at learning how to adjust its tone in conversation.
1. Building a dataset
For this project I built a dataset based on reddit comments. I did this in two stages.
For the first stage of data collection, I just scraped reddit's API. Easy enough. The format I used was to take the post title, post text (if present), subreddit name, and then the top comments from each of the posts. I filtered for posts and comments that scored highly, since this indicates that the post at least aligns with reddit's typical format enough to be popular, even if the views might not necessarily resonate with all redditors.
For the second stage, I used ChatGPT to generate tone and instruction data. Imagine you have a post on reddit. User A makes a top level comment, then User B replies to User A's comment. ChatGPT was fed User A's comment and User B's reply and then asked to generate a tone/instruction pair that would cause User B to reply to User A in the way they did. This part cost me a couple dollars to do and got great results.
2. Fine-tuning
Fine-tuning on OpenAI is super simple. You basically just format your data the way they want it for intake, click train, and wait. A couple hours and about $12 later, I had a model that could near-perfectly imitate the way redditors speak to each other, in a format that lets me change the tone/instruction of newly generated text from the model. This was insanely easy; I didn't adjust any of the default training hyperparameters, and OpenAI handled all of that for me.
You could do a similar process with a local language model and then you don't have to deal with any kind of moderation filters (which are noticeably less strict on fine-tuning with OpenAI than with their public-facing product) and you own the software at the end of the process. It'd cost about the same in money and might take a bit longer to fiddle with it until you get good results, but at least for my use case, using OpenAI was fine. Their language models are way better than local ones anyway.
3. Deployment
OpenAI also handles scaling for me. The rate limit for fine-tuned models on OpenAI tier 1 accounts is 3,500 requests per minute. That's a fucking insane amount if you're using it for propagandizing or advertising or whatever. 3,500 comments per minute would overwhelm even the busiest comments section, and each comment costs only a fraction of a penny. I don't know how much Russian trolls in their office buildings make, but I think even West Africans would have a hard time competing with the cost, not to mention the quality of the output.
I can't stress enough how crazy these numbers are to me. Can you imagine having a system where you can post 3500 comments PER MINUTE to push public sentiment towards whatever your own views are, for only a couple dollars?
4. Live demo
This is the fun part! I've set this up on a Telegram bot for you guys to play with. I'll keep it up for at least a few days, or I guess until some anti-Kiwi nut spams it with child porn or something. I'm not putting a hard dollar limit on this demo but I'm just not trying to rack up a huge bill on my card, since I'm not rich lol.
You can message the bot on Telegram @KFdemoBot. Send it a reddit link, and optionally a tone and instruction, and it'll show you some replies to the current top comment that it thinks are suitable. If you want to include a tone and instruction, add lines to your message starting with "Tone:" and "Instruction:". If you don't include them, it'll just default to making a typical reddit-brained comment.
You can make it say some pretty funny stuff. In my live testing of this, it's actually too good at imitating redditors sometimes, and even when appropriately instructed will go off the rails. One of my accounts that was set up to post in /r/AskWomen threatened to murder some woman and got permabanned.
When you send it the link make sure your link is formatted like
> https://reddit.com/r/subredditname/comments/postID/the_title_here/
otherwise it won't recognize your link properly.
Here's one sample pic from this post.
Post some funny pics of your own with the bot. Have a blast.
5. Closing thoughts
The internet's fucked, basically. Soon, the dead internet theory will be unironically true. You won't know what to trust, even in places like Kiwi Farms. These language models can currently be tuned to imitate any writing style you can imagine for very little money and they perform extremely well, and they're getting stronger every day. I think it won't be long before we start seeing people post stories online where they thought they made a friend online, only to realize months into their chats that the "friend" they made was actually a bot trying to get them to subscribe to BetterHelp or buy knock-off Viagra or whatever.
I think that the internet is going to split. The major platforms will get more and more draconian with things like ID verification and anti-bot measures to combat the tidal wave of LLMs. The places that don't do this will either be small and cozy enough that they're not worth targeting, or they'll get big enough and suddenly be overwhelmed. And that's just for ad spam bullshit - intentionally weaponizing this tech to disrupt wrongthink hubs like Kiwi Farms is an inevitability. There's no "if" - just "when."
I think Kiwis might find it distasteful of me, but I plan to get in on the profitable side of the tidal wave of ads while I'm able to, before this market is saturated. This post isn't an advertisement in and of itself, and to be clear, I don't want any customers from here. But the way I see it, people are gonna do this in increasing numbers no matter what I do, so I may as well get in while the getting is good.