If you want a picture of the future, imagine humans checking AI didn't make a mistake – forever

CEOs will chase illusory profits as workers are left to pick defective items from an agentic production line


Agentic AI will create jobs – but many of them will involve picking its failures off automated conveyor belts.

Barely two-and-a-half years into the modern era of AI, we're stuck in a hype cycle that promises all our productivity Christmases will soon come at once. People want to believe the magic is real.

Surprisingly little progress has been made on the harder problems in artificial intelligence – the problems that involve actual intelligence, such as the reflective capacity to understand the intent behind an action and thereby stay on task.

In retrospect, the first of the modern agents, AutoGPT, arrived before its time: March 2023, when the API for GPT-4 became publicly available.

I used it. Lots of people used it. We gave AutoGPT a goal, then watched it methodically work its way toward that goal. Sometimes. Other times, it failed in ways both small and spectacular.

AutoGPT remains a tantalizing demo – but far from useful tech. You'd never deploy it in production. I came to think of it more like a malfunctioning magic wand: When grasped, you never quite knew whether it would run its course to a mind-blowing transformation – or just sputter out after a few weak sparks.

Two-and-a-bit years hasn't changed that outlook very much. A recent essay by Oxford AI Governance Initiative Senior Researcher Toby Ord lays out the math in terms of "half-lives" – how likely is it that an agent that takes X minutes to complete Y tasks will successfully reach its goal?

Individual task completion has become measurably better since 2022 – but remains far from perfect, and for that reason, far from production. When tasks are chained together, the output of one feeding the input of the next – as all agents do – success becomes a game of probabilities: if the first task completes 90 percent of the time, the second 75 percent, and the third 95 percent, the chance of all three succeeding is only about 64 percent (0.90 × 0.75 × 0.95 ≈ 0.64) – and that's just the first three tasks of a business process likely decomposed into hundreds of individual tasks.
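The arithmetic is simple enough to sketch in a few lines of Python – assuming, purely for illustration, that tasks are sequential and succeed independently, and using the example rates above rather than any measured figures:

    from math import prod

    def chain_success(rates):
        # Probability an agent completes every task in a chain,
        # assuming each task succeeds independently at the given rate.
        return prod(rates)

    # The three illustrative per-task completion rates from the example above
    print(chain_success([0.90, 0.75, 0.95]))  # ~0.64 – and that's only three tasks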

The likelihood that a moderately sophisticated agent will successfully progress from goal setting to outcome appears to be roughly the same as the probability you'll see a unicorn.

Quelle surprise: The only possible remediation remains a healthy input of human intelligence. Rather than performing tasks directly, humans need to monitor agents' outputs for accuracy, instruct them to repeat a task when they fail, or direct them to tackle the next task when the output meets spec.

This is the AI-era version of human quality control at the end of a production line on which most things are automated, leaving humans to pick out defective products.

That's dull work – the sort of work organizations will demand in ever-greater quantities as we go all-in on agents.

Depending on the level of expertise required to assess agent outputs, that work will either fall into the class of exploitation wages – the $2 an hour paid to workers in developing economies to perform our onerous digital tasks – or it'll command the full rate that highly qualified professionals receive for being highly qualified professionals.

The amount of highly qualified labor needed to maintain the quality of agentic AI determines its first-order cost to the business. In many cases, it looks as though it will remain cheaper to let the professional do the work than to have them check agentic outputs. Any forecast savings from AI automation get consumed by a new class of highly intense and highly paid professional labor.

Still, even the idea of a magic wand inspires an unending hope. "It'll be better in six months," we hear. "And in six years - who knows?"

Sure, the successful task completion rate for agents will increase, as it has for the last two years. Simultaneously, over the next months and years, business process automation will continue decomposing office work into long chains of agentic tasks. Even at 95 percent completion rates – which could well mark the point of diminishing returns on investment – the probabilities argue against any but fairly simple agents being practical.
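Extending the earlier sketch – and assuming, again purely for illustration, a flat 95 percent completion rate on every task in the chain – the odds fall away quickly as the chain grows:

    print(round(0.95 ** 10, 3))   # ~0.599 – ten tasks
    print(round(0.95 ** 50, 3))   # ~0.077 – fifty tasks
    print(round(0.95 ** 100, 3))  # ~0.006 – a hundred tasks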

Do these calculations mean we'll give up on agents? So long as CEOs can daydream of a future as leaders of "one-man bands" or "tiny teams" that orchestrate profits into their own (and shareholders') pockets, the push will continue – unlikely to ever reach production, at least not without a lot of expensive human oversight.
 
Inb4 the inevitable:

[attached image: realorindians.webp]
 
AI and general stuff like it (I dunno what to call it – smart contracts maybe? A payment is triggered when proof a thing was done is provided) could make the industry I'm in huge savings. Yet AI is pushed for really weird and illogical things, and not that at all. I think the problem is multifold:
1. Senior management have no real idea what the coal-face jobs actually DO (because nobody is promoted up any more; MBAs and bean counters get on a rotating system of wrecking various companies), so they aren't able to see the points where humans are needed and the ones where AI and generally better tools would be appropriate.
2. Senior management don't understand what AI is, and also what it is not, so they fall for the car-salesman-type presentations, spend a fortune on it, and then want their money's worth.
3. Senior management think you can get rid of swathes of people and replace them with AI (and this is what the car salesmen told them), rather than giving people good AI tools to increase productivity – which means the people who are left end up overworked and still with no good tools.
So we end up not using the things that we could be using, and using tools that are crap, with fewer people and too much work – and, all told, what are we doing and why aren't we enthusing about AI?
 