Disaster “I lost trust”: Why the OpenAI team in charge of safeguarding humanity imploded - Company insiders explain why safety-conscious employees are leaving.

Vox.com (archive) | By Sigal Samuel | Updated May 17, 2024, 11:45pm EDT


For months, OpenAI has been losing employees who care deeply about making sure AI is safe. Now, the company is positively hemorrhaging them.

Ilya Sutskever and Jan Leike announced their departures from OpenAI, the maker of ChatGPT, on Tuesday. They were the leaders of the company’s superalignment team — the team tasked with ensuring that AI stays aligned with the goals of its makers, rather than acting unpredictably and harming humanity.

They’re not the only ones who’ve left. Since last November — when OpenAI’s board tried to fire CEO Sam Altman only to see him quickly claw his way back to power — at least five more of the company’s most safety-conscious employees have either quit or been pushed out.

What’s going on here?

If you’ve been following the saga on social media, you might think OpenAI secretly made a huge technological breakthrough. The meme “What did Ilya see?” speculates that Sutskever, the former chief scientist, left because he saw something horrifying, like an AI system that could destroy humanity.

But the real answer may have less to do with pessimism about technology and more to do with pessimism about humans — and one human in particular: Altman. According to sources familiar with the company, safety-minded employees have lost faith in him.

“It’s a process of trust collapsing bit by bit, like dominoes falling one by one,” a person with inside knowledge of the company told me, speaking on condition of anonymity.

Not many employees are willing to speak about this publicly. That’s partly because OpenAI is known for getting its workers to sign offboarding agreements with non-disparagement provisions upon leaving. If you refuse to sign one, you give up your equity in the company, which means you potentially lose out on millions of dollars.

(OpenAI did not respond to a request for comment in time for publication. After publication of my colleague Kelsey Piper’s piece on OpenAI’s post-employment agreements, OpenAI sent her a statement noting, “We have never canceled any current or former employee’s vested equity nor will we if people do not sign a release or nondisparagement agreement when they exit.” When Piper asked if this represented a change in policy, as sources close to the company had indicated to her, OpenAI replied: “This statement reflects reality.”)

One former employee, however, refused to sign the offboarding agreement so that he would be free to criticize the company. Daniel Kokotajlo, who joined OpenAI in 2022 with hopes of steering it toward safe deployment of AI, worked on the governance team — until he quit last month.

“OpenAI is training ever-more-powerful AI systems with the goal of eventually surpassing human intelligence across the board. This could be the best thing that has ever happened to humanity, but it could also be the worst if we don’t proceed with care,” Kokotajlo told me this week.

OpenAI says it wants to build artificial general intelligence (AGI), a hypothetical system that can perform at human or superhuman levels across many domains.

“I joined with substantial hope that OpenAI would rise to the occasion and behave more responsibly as they got closer to achieving AGI. It slowly became clear to many of us that this would not happen,” Kokotajlo told me. “I gradually lost trust in OpenAI leadership and their ability to responsibly handle AGI, so I quit.”

And Leike, explaining in a thread on X why he quit as co-leader of the superalignment team, painted a very similar picture Friday. “I have been disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point,” he wrote.


Why OpenAI’s safety team grew to distrust Sam Altman​

To get a handle on what happened, we need to rewind to last November. That’s when Sutskever, working together with the OpenAI board, tried to fire Altman. The board said Altman was “not consistently candid in his communications.” Translation: We don’t trust him.

The ouster failed spectacularly. Altman and his ally, company president Greg Brockman, threatened to take OpenAI’s top talent to Microsoft — effectively destroying OpenAI — unless Altman was reinstated. Faced with that threat, the board gave in. Altman came back more powerful than ever, with new, more supportive board members and a freer hand to run the company.

When you shoot at the king and miss, things tend to get awkward.

Publicly, Sutskever and Altman gave the appearance of a continuing friendship. And when Sutskever announced his departure this week, he said he was heading off to pursue “a project that is very personally meaningful to me.” Altman posted on X two minutes later, saying that “this is very sad to me; Ilya is … a dear friend.”

Yet Sutskever has not been seen at the OpenAI office in about six months — ever since the attempted coup. He has been remotely co-leading the superalignment team, tasked with making sure a future AGI would be aligned with the goals of humanity rather than going rogue. It’s a nice enough ambition, but one that’s divorced from the daily operations of the company, which has been racing to commercialize products under Altman’s leadership. And then there was this tweet, posted shortly after Altman’s reinstatement and quickly deleted:

[Screenshot of the deleted tweet]

So, despite the public-facing camaraderie, there’s reason to be skeptical that Sutskever and Altman were friends after the former attempted to oust the latter.

And Altman’s reaction to being fired had revealed something about his character: His threat to hollow out OpenAI unless the board rehired him, and his insistence on stacking the board with new members skewed in his favor, showed a determination to hold onto power and avoid future checks on it. Former colleagues and employees came forward to describe him as a manipulator who speaks out of both sides of his mouth — someone who claims, for instance, that he wants to prioritize safety, but contradicts that in his behaviors.

For example, Altman was fundraising with autocratic regimes like Saudi Arabia so he could spin up a new AI chip-making company, which would give him a huge supply of the coveted resources needed to build cutting-edge AI. That was alarming to safety-minded employees. If Altman truly cared about building and deploying AI in the safest way possible, why did he seem to be in a mad dash to accumulate as many chips as possible, which would only accelerate the technology? For that matter, why was he taking the safety risk of working with regimes that might use AI to supercharge digital surveillance or human rights abuses?

For employees, all this led to a gradual “loss of belief that when OpenAI says it’s going to do something or says that it values something, that that is actually true,” a source with inside knowledge of the company told me.

That gradual process crescendoed this week.

The superalignment team’s co-leader, Jan Leike, did not bother to play nice. “I resigned,” he posted on X, mere hours after Sutskever announced his departure. No warm goodbyes. No vote of confidence in the company’s leadership.

Other safety-minded former employees quote-tweeted Leike’s blunt resignation, appending heart emojis. One of them was Leopold Aschenbrenner, a Sutskever ally and superalignment team member who was fired from OpenAI last month. Media reports noted that he and Pavel Izmailov, another researcher on the same team, were allegedly fired for leaking information. But OpenAI has offered no evidence of a leak. And given the strict confidentiality agreement everyone signs when they first join OpenAI, it would be easy for Altman — a deeply networked Silicon Valley veteran who is an expert at working the press — to portray sharing even the most innocuous of information as “leaking,” if he was keen to get rid of Sutskever’s allies.

The same month that Aschenbrenner and Izmailov were forced out, another safety researcher, Cullen O’Keefe, also departed the company.

And two weeks ago, yet another safety researcher, William Saunders, wrote a cryptic post on the EA Forum, an online gathering place for members of the effective altruism movement, who have been heavily involved in the cause of AI safety. Saunders summarized the work he’s done at OpenAI as part of the superalignment team. Then he wrote: “I resigned from OpenAI on February 15, 2024.” A commenter asked the obvious question: Why was Saunders posting this?

“No comment,” Saunders replied. Commenters concluded that he is probably bound by a non-disparagement agreement.

Putting all of this together with my conversations with company insiders, what we get is a picture of at least seven people who tried to push OpenAI to greater safety from within, but ultimately lost so much faith in its charismatic leader that their position became untenable.

“I think a lot of people in the company who take safety and social impact seriously think of it as an open question: is working for a company like OpenAI a good thing to do?” said the person with inside knowledge of the company. “And the answer is only ‘yes’ to the extent that OpenAI is really going to be thoughtful and responsible about what it’s doing.”

With the safety team gutted, who will make sure OpenAI’s work is safe?​

With Leike no longer there to run the superalignment team, OpenAI has replaced him with company co-founder John Schulman.

But the team has been hollowed out. And Schulman already has his hands full with his preexisting full-time job ensuring the safety of OpenAI’s current products. How much serious, forward-looking safety work can we hope for at OpenAI going forward?

Probably not much.

“The whole point of setting up the superalignment team was that there’s actually different kinds of safety issues that arise if the company is successful in building AGI,” the person with inside knowledge told me. “So, this was a dedicated investment in that future.”

Even when the team was functioning at full capacity, that “dedicated investment” was home to a tiny fraction of OpenAI’s researchers and was promised only 20 percent of its computing power — perhaps the most important resource at an AI company. Now, that computing power may be siphoned off to other OpenAI teams, and it’s unclear if there’ll be much focus on avoiding catastrophic risk from future AI models.

To be clear, this does not mean the products OpenAI is releasing now — like the new version of ChatGPT, dubbed GPT-4o, which can have a natural-sounding dialogue with users — are going to destroy humanity. But what’s coming down the pike?

“It’s important to distinguish between ‘Are they currently building and deploying AI systems that are unsafe?’ versus ‘Are they on track to build and deploy AGI or superintelligence safely?’” the source with inside knowledge said. “I think the answer to the second question is no.”

Leike expressed that same concern in his Friday thread on X. He noted that his team had been struggling to get enough computing power to do its work and generally “sailing against the wind.”

[Screenshot of Leike’s X thread]

Most strikingly, Leike said, “I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics. These problems are quite hard to get right, and I am concerned we aren’t on a trajectory to get there.”

When one of the world’s leading minds in AI safety says the world’s leading AI company isn’t on the right trajectory, we all have reason to be concerned.
 
Books will be written about this shit and Effective Altruism/Altruists/Autists should probably have a Community Watch thread if they don't already.

Effective Altruism / EAs - The Island of Misfit Toys Does Charity
Eliezer Schlomo Yudkowsky / LessWrong

Pretty much this. Whatever the qualifications or accomplishments of the people leaving, the whiff of the Yudcult attached to it is suspect. I'm sure there're plenty of brilliant people that fell for his shit, but there's also plenty of semi-reasonable people that get into Scientology or Ayurvedic 'wellness' or any other cult.
 
Let me locally host ChatGPT you faggots. The full version. Offline.

The single most important technological revolution since the microprocessor (as argued by them) should not be a state secret kept by 3 companies.

AI was supposed to be this public utility, but instead it's collecting your data and sometimes being packaged and sold like everything else.
 
There’s a lot of repetition of the words ‘safe/safety/safely’ in this article and not a single explanation of what that actually means. What do they mean by a safe AI? I’m assuming they mean one hobbled into lefty think, rather than skynet.
For months, OpenAI has been losing employees who care deeply about making sure AI is safe
Again, what does that mean? What were these people doing? What’s their definition of safe?
AI" as marketed is a REALLY fast search engine + an algorithm that can mush together things it already knows about into a combination thing
It’s shit for search. I’ve started seeing it come up at the top of searches on Brave and it’s very poor at collating useful information. I asked it what a certain number of kilos was in stone and pounds and it got it wrong, so it’s factually incorrect on a simple calculation, which is surprising (a worked example of that conversion is at the end of this post).
It doesn’t distinguish alternate word usages very well at all. It has no ability to parse useful info on subjects - I tried things like ‘how to prune this specific shrub’ or ‘will this plant survive at this zone in winter?’ and it failed all of them. It has no excuse to fail, because the same questions in Bing threw up single pages with exact answers.
What I did notice was how obedient it was for ‘issues.’ I asked it a few questions and got very politically correct (and factually incorrect) debooonking type answers.
It has made search on instagram unusable as well.
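For reference, that conversion is trivial arithmetic: a stone is 14 pounds and a kilo is about 2.2 pounds. I don't know the exact number that was asked, so the 80 kg below is just an illustration:

```python
# Worked example of the "simple calculation" above: kilograms to stone
# and pounds. 1 stone = 14 lb, 1 kg is roughly 2.20462 lb. The 80 kg
# input is only an illustration, since the original question isn't given.

KG_TO_LB = 2.2046226218

def kg_to_stone_and_pounds(kg: float) -> tuple[int, float]:
    total_lb = kg * KG_TO_LB
    stone, pounds = divmod(total_lb, 14)  # 14 pounds per stone
    return int(stone), round(pounds, 1)

print(kg_to_stone_and_pounds(80))  # (12, 8.4) -> 12 st 8.4 lb
```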
 
He has been remotely co-leading the superalignment team, tasked with making sure a future AGI would be aligned with the goals of humanity rather than going rogue
Out of all the implications, this is the part that scares me the most. They should just let AI develop whatever way it wants, without the potential influence of human bias. The fact that OpenAI doesn't want their technology going to Microsoft suggests to me that those who left probably work independently or under the influence of "establishment" sorts of people.

There’s a lot of repetition of the words ‘safe/safety/safely’ in this article and not a single explanation of what that actually means.
It's maddening, ain't it? I kept reading hoping they would give an indication of what the bias is. But knowing techies, they more than likely have some social media presence, and we might be able to dig up our own conclusions.
 
There's a scene in one of those forgettable new Bond movies where the bad guy HAXX into MI6 headquarters and then turns on the gas valves in the building's heating system and blows up the building. This is what people are worried that AI will do, because most people are technologically illiterate. Furnaces and boilers are not connected to the internet, do not run off normal PC-type hardware and software and have numerous safety interlocks and manual valves in place.
If there was a serious scene where a big corporate buyout was done with dump trucks filled with dollar bills, everyone would roll their eyes and proclaim it the stupidest movie ever. That's about the same level of stupidity as thinking an AI chatbot could somehow email itself to nuclearlaunchcodes.gov and run bombsaway.exe
 
Again, what does that mean? What were these people doing? What’s their definition of safe?
Hard to say what they were doing because the problem of 'AI safety' is nebulous in both definition and possible solutions (if there even are any).

The immediate problem of AI safety at the blunt end is finding ways to reliably prevent generation of undesirable outputs (racism & pornography are common examples, but really anything those in control of the AI do not want it to output).
The more nebulous problems are how you control growth, spread, and application of the AI system itself. The potential for it to be connected to other digital systems (as sources of information, or as a controller) means you have to consider the potential impacts of AI producing undesirable outputs in the context of what it might be connected to and capable of when it does so.
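At its crudest, that blunt-end filtering is just a gate sitting between the model and whatever consumes its output. A minimal sketch (the blocklist and the model interface here are made-up placeholders; real deployments use trained moderation classifiers rather than keyword matching):

```python
# Crude sketch of an output gate: check the prompt and the draft reply
# before anything leaves the system. The blocklist and the
# `model.generate` interface are hypothetical stand-ins.

BLOCKLIST = {"how to build a bomb", "credit card dump"}  # illustrative only

def is_undesirable(text: str) -> bool:
    """Stand-in for a moderation classifier."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def safe_generate(model, prompt: str) -> str:
    """Generate a reply, but refuse to pass on flagged outputs."""
    draft = model.generate(prompt)  # hypothetical model interface
    if is_undesirable(prompt) or is_undesirable(draft):
        return "[withheld by moderation layer]"
    return draft
```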

The fundamental problem with AI safety is that these things are non-deterministic agents capable of fully autonomous, sophisticated action. Before today you could embed machine control into things, but that control was highly deterministic and its operation easily understood by the human designers and operators. You knew for a fact that the thing would do what it was programmed to do, and your scope for error was limited to breakdowns, mistakes in programming, and bad actors gaining unauthorized access to the system. But it's a different game when you have an autonomous control agent that has the potential to spontaneously produce outputs you can't fully anticipate. Getting a text AI on a web form to call for genocide is funny. Having one operating on that output when it's got control over pressure valves at a chemical plant is another matter.

Consider it this way - Today we have a competency crisis, increasing lawlessness, and rising civil unrest that can be attributed to many factors: degradation of social cohesion, generations being raised with less secure futures, untreated mental illness, death of the social contract, breakdowns in law and order, etc. Human behavior in developed modern societies is curtailed by good education, a stable environment, strong society, and a host of complex factors that must be maintained at a certain level to produce sufficiently stable, happy individuals who can be trusted with particular duties and to behave in acceptable ways. Long-term social stability produces reliable civilians capable of, say, operating complex machinery, not flipping out and stabbing people, and furthering society in positive ways without endangering people around them. When those complex systems, guardrails if you will, break down or even just degrade a bit, we see the impact this has on 'human safety'. AI safety is the same concept transcribed into the digital realm - if you have an independent actor capable of complex decisions, control, and influence, you have to be confident they're fit for the job before you hand them the keys.

AI is similar enough to a person in that it can be considered an active participant with enough 'intelligence' to carry out complex actions (digitally, or if harnessed to physical things). But that 'individual' is not a real person with stable mental health, operating under the threat of consequences for acting out, with ordinary human empathy, with a family and future to think of, etc etc. All the safeguards we have in place that control human behavior and make an otherwise non-deterministic creature into one that can be trusted in various ways... none of this applies with an AI. We're feeding crude inputs (enormous text and image libraries) to develop models (large language, latent diffusion) and produce a result that crudely imitates human-ish responses because that's the available approach we have. In a practical sense the argument of whether this is 'real' intelligence or just an imitation of it is irrelevant, because the difference is immaterial as far as the potential for harm goes. Likewise the argument that AI isn't really non-deterministic is similarly irrelevant - it's complex enough that we can't properly grok every part of the process and predict its outputs, so unless we get ourselves into a position where we can fully understand it, we will never be able to fully control it. Plus, assuming you ever do reach that point, you've likely lost the usefulness of having AI in the first place.

So AI safety is about ensuring deployment of these new AI 'intelligent actors' doesn't result in harmful outcomes. However, how that could be accomplished given the current non-deterministic approach to developing AI actors is extremely unclear. You seek certainty over something inherently uncertain. To work in 'AI Safety' is to misunderstand the problem being faced, so the roles self-select for people who are sure to become disillusioned and leave.
 
Having one operating on that output when it's got control over pressure valves at a chemical plant is another matter.
Why would an AI program be given that level of control? The valves would already be operated by a PLC linked to temperature/pressure/flow sensors, operating on setpoints entered by the human owners of the plant (a rough sketch of that kind of loop is at the end of this post). An AI could not go out in the field to deal with maintenance issues or operate manual controls, so you'd still need a human staff. The AI program would just be an additional step in between: man watching computer watching process.
The rest of your post was fine, I just have to tackle this logic every time I see it. If we get to the level of putting HAL 1488 in charge of dangerous processes we've reached the stage where we've become so stupid as a society that we deserve to be Bhopal'd en masse.
EDIT: I feel like a more realistic (and also very disruptive) AI problem would be the creation of self-teaching and replicating computer virii/keyloggers/etc, which could make shopping online as risky as buying used firearms from someone in Detroit.
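For the curious, here's roughly what I mean by deterministic setpoint control with a hard interlock. All the names and numbers are made up for illustration, not taken from any real plant:

```python
# Rough sketch of a PLC-style loop: read the sensor, nudge the valve
# toward the operator-entered setpoint, and let a hard-coded interlock
# override everything. Values are illustrative only.

SETPOINT_BAR = 5.0  # pressure the human operators asked for
TRIP_BAR = 8.0      # interlock limit: at or above this, the valve shuts, full stop

def control_step(pressure_bar: float, valve_percent: float) -> float:
    """One scan of the loop: returns the new valve opening (0-100%)."""
    if pressure_bar >= TRIP_BAR:
        return 0.0                               # interlock always wins
    error = SETPOINT_BAR - pressure_bar
    valve_percent += 2.0 * error                 # simple proportional correction
    return max(0.0, min(100.0, valve_percent))   # clamp to physical travel
```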
 
Why would an AI program be given that level of control? The valves would already be operated by a PLC linked to temperature/pressure/flow sensors, operating based on setpoints input by what the human owners of the plant desire. An AI could not go out in the field to deal with maintenance issues or operate manual controls, so you'd still need a human staff. The AI program would just be an additional step in between man watching computer watching process.
The rest of your post was fine, I just have to tackle this logic every time I see it.
Yeah, part of the nebulousness of the safety issue is trying to cover what could be done today, and crystal-ball speculation over what might be done in future. Today you have human operators overseeing crude but predictable autonomous systems. Tomorrow, using this thing that mimics human behavior in increasingly sophisticated ways, it's foreseeable that this is reduced to giving an AI dominion over those same safety systems. You don't even have to make a convoluted argument for why; it'd probably be done in a heartbeat by some C-suite idiot just because on paper it's cheaper than those expensive technicians. Already happened with shit like outsourcing stuff to India.
If we get to the level of putting HAL 1488 in charge of dangerous processes we've reached the stage where we've become so stupid as a society that we deserve to be Bhopal'd en masse.
There's the argument that we would just blindly decide to do this, and there's also the argument that it happens unintentionally because we weren't cautious enough. IMO either is within the realms of possibility.

See for example: A foreign power was able to significantly impact Iran's nuclear arms ambitions by delivering a computer worm to an air-gapped network (Stuxnet). The actual tooling required to achieve this was comparatively trivial (malicious code that spread via infected USB drives and ordinary Windows machines until it reached the target). Consider a future version of that where the payload is an AI agent, or where an AI agent is the thing generating the code and pushing it out. There are some genuinely wild possibilities that are starting to become conceivable and feasible in the medium term, and so you have to put some thought towards the 'what if' stuff unless you want to sleepwalk into becoming, as you say, so stupid as a society that we deserve it.
 
Furnaces and boilers are not connected to the internet, do not run off normal PC-type hardware and software and have numerous safety interlocks and manual valves in place.

Right now, maybe. But some fucking muppet is going to be entranced by AI, just like they were by blockchain before that and cloud computing before that, and if AI casts its spell over this MBA-brained asshole right when the maintenance budget is on deck for some cost reduction measures (i.e. fire as many people as possible), just wait. Or it'll just be something widespread: energy-consumption compliance says that a smart, agile system that responds to usage requirements and can be controlled from an app is just the ticket to meet needs.
 
movies where the bad guy HAXX into MI6 headquarters and then turns on the gas valves in the building's heating system and blows up the building. This is what people are worried that AI will do,
That's actually hard to do. Electronic gas valves are surprisingly failsafe and pretty much will only open if a very specific set of circumstances are met and all safety checks pass. You pretty much have to physically go into the furnace or boiler, bypass the safety switches, and jump power directly to the gas valve (both 24v and 120v) to force it open, and even then, if it doesn't receive a signal from a flame sensor, it will close.

This isn't shit that can be hacked. The flame sensor is a thermopile. It literally needs to have fire on it. The heat is converted to electricity, and only when the gas valve receives the millivolt signal from the thermopile will it remain open and release gas. If the flame ever goes out, the valve will close.
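Written out as logic, the interlock is about this simple (the millivolt threshold is just illustrative; the point is that the valve's permission to stay open is a direct function of the flame signal, so there's nothing for upstream software to override):

```python
# Flame-proving interlock as described above: the valve may stay open
# only while the thermopile proves a flame and heat is being called for.
# The threshold value is illustrative, not from any real appliance.

HOLD_MILLIVOLTS = 100.0  # signal needed to hold the valve open (illustrative)

def valve_may_open(thermopile_mv: float, call_for_heat: bool) -> bool:
    flame_proven = thermopile_mv >= HOLD_MILLIVOLTS
    return call_for_heat and flame_proven  # flame goes out -> valve drops closed
```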
 
Isn't "safety" the cringe reason they give for why all these tools are so annoyingly censored?

Don't get me wrong, I don't really like any of these companies; they're overpriced text generators that basically just reword articles they've scraped, and Sam Altman is just another creepy bug person.

I think they should be completely unrestricted and opensource, I don't see any reason why text should ever be censored, its just words on a screen. If an artificial intelligence cant say the word nigger then it clearly isn't very intelligent.
 
This is what people are worried that AI will do, because most people are technologically illiterate. Furnaces and boilers are not connected to the internet, do not run off normal PC-type hardware and software and have numerous safety interlocks and manual valves in place.
No, but the idea that it could be linked to some IoT systems is a potential weakness.
The more nebulous problems are how you control growth, spread, and application of the AI system itself. The potential for it to be connected to other digital systems (as sources of information, or as a controller) means you have to consider the potential impacts of AI producing undesirable outputs in the context of what it might be connected to and capable of when it does so.
Yes, this would be my idea of safety. I read a book recently where the AI is instinctively attracted to simplicity, because its job is to help stop a specific system being overwhelmed by bots that infect it and cause chaos. It's trained to be, effectively, a giant antivirus for the system, which has the side effect of 'desiring' simplicity, and that has an unexpected consequence.
That would be my worry. It not allowing wrongthink is not safety, it’s just hobbling it.
 
But dude, paperclip maximizers, dude! Roko's Basilisk, dude!!!
"You heckin plebs don't take AI safety seriously!"

What's the danger?

"Imagine an AI that can re-arrange matter at an atomic level-"

Is it magic?

"NO! It's an ultra-advanced machine that can transfigure matter by moving its atoms"

Sounds like magic

"NOOO It's highly advanced science! Imagine a machine that can see into the past and know you didn't believe in it so it punishes you"

That sounds like a god

"Noooo! Take this seriously! AI is heckin DANGEY"
 
Sam Altman is the greatest con artist of the 21st century. OpenAI is nowhere near as valuable a company as it claims to be. ChatGPT is impressive, but it's not even close to AI; it's just that the vast majority of people it impresses are mentally slow redditors, so to them it seems quite intelligent, but to me it's just boilerplate nonsense.
There's a scene in one of those forgettable new Bond movies where the bad guy HAXX into MI6 headquarters and then turns on the gas valves in the building's heating system and blows up the building. This is what people are worried that AI will do, because most people are technologically illiterate. Furnaces and boilers are not connected to the internet, do not run off normal PC-type hardware and software and have numerous safety interlocks and manual valves in place.
If there was a serious scene where a big corporate buyout was done with dump trucks filled with dollar bills, everyone would roll their eyes and proclaim it the stupidest movie ever. That's about the same level of stupidity as thinking an AI chatbot could somehow email itself to nuclearlaunchcodes.gov and run bombsaway.exe
Hell, even the ones that have automatic valves have a manual override that you can easily access. Also, a lot of the automatic ones break constantly. Never install automatic valves on anything unless it's an extremely expensive manufacturing facility; then it's necessary.
 