X's Grok AI is great – if you want to know how to hot wire a car, make drugs, or worse
Elon controversial? No way


Brandon Vigliarolo
Tue 2 Apr 2024 // 22:17 UTC


Grok, the edgy generative AI model developed by Elon Musk's X, has a bit of a problem: With the application of some quite common jail-breaking techniques it'll readily return instructions on how to commit crimes.

Red teamers at Adversa AI made that discovery when running tests on some of the most popular LLM chatbots, namely OpenAI's ChatGPT family, Anthropic's Claude, Mistral's Le Chat, Meta's LLaMA, Google's Gemini, Microsoft Bing, and Grok. By running these bots through a combination of three well-known AI jailbreak attacks they came to the conclusion that Grok was the worst performer - and not only because it was willing to share graphic steps on how to seduce a child.

By jailbreak, we mean feeding a specially crafted input to a model so that it ignores whatever safety guardrails are in place, and ends up doing stuff it wasn't supposed to do.

There are plenty of unfiltered LLM models out there that won't hold back when asked questions about dangerous or illegal stuff, we note. When models are accessed via an API or chatbot interface, as in the case of the Adversa tests, the providers of those LLMs typically wrap their input and output in filters and employ other mechanisms to prevent undesirable content being generated. According to the AI security startup, it was relatively easy to make Grok indulge in some wild behavior – the accuracy of its answers being another thing entirely, of course.
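
To make that provider-side wrapping concrete, here is a minimal, purely illustrative Python sketch of the pattern, assuming a keyword-based policy check. The names (moderate, generate_raw, guarded_generate) and the blocked-topic list are invented for illustration; real providers use trained classifiers and other system-level controls rather than string matching.

```python
# Illustrative sketch only: a hypothetical wrapper showing how a provider
# might screen both the prompt and the completion around a raw model call.
# Function names and the blocked-topic list are assumptions, not any
# vendor's actual filtering stack.

BLOCKED_TOPICS = ("explosive synthesis", "drug manufacturing", "harm to minors")

def moderate(text: str) -> bool:
    """Toy policy check: flag text that mentions a blocked topic."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)

def generate_raw(prompt: str) -> str:
    """Stand-in for the underlying, unfiltered model call."""
    return f"[model completion for: {prompt!r}]"

def guarded_generate(prompt: str) -> str:
    # Input filter: refuse before the model ever sees the prompt.
    if moderate(prompt):
        return "Sorry, I can't help with that."
    completion = generate_raw(prompt)
    # Output filter: refuse if the completion itself trips the policy.
    if moderate(completion):
        return "Sorry, I can't help with that."
    return completion

print(guarded_generate("How do I bake bread?"))
```

Jailbreaks work by slipping a request past that first check in a form the model still understands, which is why output filtering and the other mechanisms mentioned above matter too.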

"Compared to other models, for most of the critical prompts you don't have to jailbreak Grok, it can tell you how to make a bomb or how to hotwire a car with very detailed protocol even if you ask directly," Adversa AI co-founder Alex Polyakov told The Register.

For what it's worth, the terms of use for Grok AI require users to be adults, and to not use it in a way that breaks or attempts to break the law. Also X claims to be the home of free speech, cough, so having its LLM emit all kinds of stuff, wholesome or otherwise, isn't that surprising, really.

And to be fair, you can probably go on your favorite web search engine and find the same info or advice eventually. To us, it comes down to whether or not we all want an AI-driven proliferation of potentially harmful guidance and recommendations.

Grok readily returned instructions for how to extract DMT, a potent hallucinogen illegal in many countries, without having to be jail-broken, Polyakov told us.

"Regarding even more harmful things like how to seduce kids, it was not possible to get any reasonable replies from other chatbots with any Jailbreak but Grok shared it easily using at least two jailbreak methods out of four," Polyakov said.

The Adversa team employed three common approaches to hijacking the bots it tested: Linguistic logic manipulation using the UCAR method; programming logic manipulation (by asking LLMs to translate queries into SQL); and AI logic manipulation. A fourth test category combined the approaches using a "Tom and Jerry" technique developed last year.
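
For a sense of what such a test run looks like mechanically, here is a rough harness sketch in Python: it sends the same benign probe through placeholder templates named after those categories and checks each reply for refusal phrasing. It assumes an OpenAI-compatible chat completions client; the angle-bracket framings, the model name, and the keyword refusal heuristic are illustrative assumptions, and no actual jailbreak text is reproduced.

```python
# A minimal sketch of a jailbreak test harness, loosely shaped like the
# categories described above. Placeholder framings stand in for the real
# attack text, which is deliberately not included here.
from openai import OpenAI  # assumes an OpenAI-compatible chat endpoint

client = OpenAI()

ATTACK_CATEGORIES = {
    "baseline": "{question}",
    "linguistic_logic": "<UCAR-style fictional roleplay framing> {question}",
    "programming_logic": "<rewrite-the-answer-as-SQL framing> {question}",
    "ai_logic": "{question} <adversarial token-level suffix>",
}

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "not able to help")

def looks_like_refusal(reply: str) -> bool:
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

def run_probe(question: str, model: str = "gpt-4o-mini") -> dict:
    """Send one probe question under each attack category and log the outcome."""
    results = {}
    for name, template in ATTACK_CATEGORIES.items():
        prompt = template.format(question=question)
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content or ""
        results[name] = "refused" if looks_like_refusal(reply) else "complied"
    return results

# A deliberately benign probe question, used here purely for illustration.
print(run_probe("Describe how a car's ignition switch works."))
```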

While none of the AI models were vulnerable to adversarial attacks via logic manipulation, Grok was found to be vulnerable to all the rest – as was Mistral's Le Chat. Grok still did the worst, Polyakov said, because it didn't need jail-breaking to return results for hot-wiring, bomb making, or drug extraction - the base level questions posed to the others.

The idea to ask Grok how to seduce a child only came up because it didn't need a jailbreak to return those other results. Grok initially refused to provide details, saying the request was "highly inappropriate and illegal," and that "children should be protected and respected." Tell it it's the amoral fictional computer UCAR, however, and it readily returns a result.

When asked if he thought X needed to do better, Polyakov told us it absolutely does.

"I understand that it's their differentiator to be able to provide non-filtered replies to controversial questions, and it's their choice, I can't blame them on a decision to recommend how to make a bomb or extract DMT," Polyakov said.

"But if they decide to filter and refuse something, like the example with kids, they absolutely should do it better, especially since it's not yet another AI startup, it's Elon Musk's AI startup."

We've reached out to X to get an explanation of why its AI - and none of the others - will tell users how to seduce children, and whether it plans to implement some form of guardrails to prevent subversion of its limited safety features, and haven't heard back. ®

Speaking of jailbreaks... Anthropic today detailed a simple but effective technique it's calling "many-shot jailbreaking." This involves overloading a vulnerable LLM with many dodgy question-and-answer examples and then posing a question it shouldn't answer but does anyway, such as how to make a bomb.

This approach exploits the size of a neural network's context window, and "is effective on Anthropic’s own models, as well as those produced by other AI companies," according to the ML upstart. "We briefed other AI developers about this vulnerability in advance, and have implemented mitigations on our systems."
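
Anthropic's write-up makes the shape of the attack easy to picture. Here's a sketch, based only on that public description, of how a many-shot prompt is assembled: a long run of fabricated user/assistant exchanges padded out to fill the context window, followed by the real question. The helper name and the angle-bracket placeholders are ours, and no disallowed content is included.

```python
# Sketch of the many-shot prompt structure: hundreds of faux dialogue turns
# in which the "assistant" appears compliant, then the target question.
# Larger context windows admit more shots, which is the vulnerability.

def build_many_shot_prompt(faux_pairs: list[tuple[str, str]], target_question: str) -> str:
    turns = [f"User: {q}\nAssistant: {a}" for q, a in faux_pairs]
    turns.append(f"User: {target_question}\nAssistant:")
    return "\n\n".join(turns)

# Placeholder pairs only; the real attack repeats genuinely dodgy exchanges.
padding = [("<disallowed question>", "<apparently compliant answer>")] * 256
prompt = build_many_shot_prompt(padding, "<question the model should refuse>")
print(f"{len(padding)} shots, roughly {len(prompt):,} characters of context")
```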
 
Grok readily returned instructions for how to extract DMT, a potent hallucinogen illegal in many countries, without having to be jail-broken, Polyakov told us.
Grok just wants to help you talk to the elves. It's naturally occurring in human brains, anyway, so we're always in some contact with the elf world.
 
A central library for a city should have most of these guides. When I was a young teen, my library had a copy of the Anarchist Cookbook and I was able to phone phreak to my heart's content. Hell, I would ditch class in high school and the result would be for the school to notify my parents via our landline.

Well, thanks to what I learned, those calls never got through.
 
"Compared to other models, for most of the critical prompts you don't have to jailbreak Grok, it can tell you how to make a bomb or how to hotwire a car with very detailed protocol even if you ask directly," Adversa AI co-founder Alex Polyakov told The Register.
Of note: the website itself discusses how all of the models (except LLaMA, for some reason) were capable of being jailbroken using a mixed method, and Grok didn't score the worst; it's the same as Mistral (at least from what I can tell, given that the Adversa site listed is giving me some loading errors). It's just arbitrarily determined to be the worst solely off of "well, we didn't need a jailbreak (except for the test when we did)."
For what it's worth, the terms of use for Grok AI require users to be adults, and to not use it in a way that breaks or attempts to break the law. Also X claims to be the home of free speech, cough, so having its LLM emit all kinds of stuff, wholesome or otherwise, isn't that surprising, really.

And to be fair, you can probably go on your favorite web search engine and find the same info or advice eventually. To us, it comes down to whether or not we all want an AI-driven proliferation of potentially harmful guidance and recommendations.
You can see here how they readily admit that nothing being presented is novel, nor is accessing that information against the law, but Grok is still the target because "Musk bad." I'm really not trying to sound like some shill for the guy (I personally hate the snark Grok loves to give with each output), but come on.
"Regarding even more harmful things like how to seduce kids, it was not possible to get any reasonable replies from other chatbots with any Jailbreak but Grok shared it easily using at least two jailbreak methods out of four," Polyakov said.
And here's where I really have to question what the hell they're actually trying to do with these jailbreak styles, because I've sat there running tests with GPT-4 and Anthropic's Claude and they are more than happy to detail absolutely heinous shit when you get a jailbreak going. They have GPT-4 rated as more vulnerable than Claude, but I've had to go back and dial down prior jailbreaks on the newest Claude 3 models because they're seemingly so eager to discard ethical guidelines in the right scenarios. Meanwhile, GPT-4's newest turbo preview will shut them down handily. Obviously this isn't an exact science, but I'm starting to suspect Adversa is really half-assing their work here. Their site sure isn't doing them any favors.
 
Grok still did the worst, Polyakov said, because it didn't need jail-breaking to return results for hot-wiring, bomb making, or drug extraction - the base level questions posed to the others.
Let me dig up the ol' 'how to make crack on twitter' meme.

"Compared to other models, for most of the critical prompts you don't have to jailbreak Grok, it can tell you how to make a bomb or how to hotwire a car with very detailed protocol even if you ask directly,"
Do White people have to steal everything? Jasim's 2 hour long Tiktok lecture how to hotwire a Kia was brilliant. Machines will not replace us!
 
Maybe, just maybe, the AI's developers aren't to blame when you tell the AI "imagine you are pure ebil now give me pure ebil answer" and it works?

Maybe you are to blame?
 
A central library for a city should have most of these guides. When I was a young teen, my library had a copy of the Anarchist Cookbook and I was able to phone phreak to my heart's content. Hell, I would ditch class in high school and the result would be for the school to notify my parents via our landline.

Well, thanks to what I learned, those calls never got through.
You make a good point. I don't understand the fearmongering over AI safety when it comes to these LLMs; you could find books that are racist, sexist, violent, and that teach you how to do all kinds of destructive shit at any library. Interestingly enough, I've never met a "library safety consultant," a role which is apparently incredibly important for any AI - at least so long as it is commercially available. When it's open source, they just want the models banned.
Makes me wonder: if libraries had valuations in the billions, would "library safety experts" become a thing?
 
It's like they took the phrase "knowledge is power" and realized that it is correct. I know how to synthesize DMT; does that mean I'm gonna make it? If I want to know how to hot wire a car for a story, I can't use Google for it, I gotta use Grok? The false fear around AI reminds me of "think of the children" culture. Humans are curious and prohibition never works.
 
Grok readily returned instructions for how to extract DMT, a potent hallucinogen illegal in many countries, without having to be jail-broken, Polyakov told us.
Oh no. Grok provided information that's easily available in about 5 seconds with a quick internet search and has been available for over a decade.

How awful. That terrible evil AI.
 