The Great AI Unwinding: Why AI's Real Threat Is the Budget, Not the Job

11-minute film first, then read this. The post unpacks every claim with full sourcing. Every number you see here maps to a verified primary source. Don't trust it. Verify it.

▶ Watch: https://youtu.be/x1l7uWKsN_E

The Messenger Got Shot

On June 19, 2026, Accenture, the world's largest IT consulting firm, had the worst single day in its entire stock-market history. It lost roughly 18% of its value, nearly a fifth, in a single session.

The company beat earnings estimates. Revenue was up. It wasn't a scandal, a fraud, or a product failure. What spooked the market was a single number buried in the earnings call: outsourcing bookings were down 15%.

And then management said the quiet part out loud. Clients are not adding new money for AI. They are reallocating existing budgets. Same total spend, different destination. Which means if AI is growing as a line item, something else is shrinking — and that something else, for Accenture's business model, is the headcount-heavy outsourcing work that has funded Indian IT for thirty years.

The market didn't punish Accenture for failing. It punished the company for telling the truth about the direction of travel. That is why the stock dropped 18% on a day the company made money.

A stock market terminal showing a red crash screen, referencing the Accenture single-day selloff of June 19 2026

The honest read: this was a signal about the shape of the next three years of enterprise IT spending, delivered involuntarily by the firm with the most direct line to enterprise buyers on earth. Markets priced it in immediately.

How New York Crashed Bengaluru

The Accenture earnings call happened in New York. Within the same session, the damage had crossed the Atlantic and the Indian Ocean.

The Nifty IT index hit a three-year low. Roughly ₹1.35 lakh crore was wiped from Indian IT market capitalisation in a single day. Infosys fell nearly 8%. TCS, Wipro, and HCL were all in the red. The damage was priced in before the Bengaluru offices had even opened for the day.

This is how tightly coupled Indian IT's valuation is to the macro narrative of Western enterprise spending on outsourcing. Accenture functions as a leading indicator, not a lagging one. When their management says "clients are reallocating, not adding," the market instantly reprices the entire supply chain that sits behind that statement: the delivery partners, the staffing firms, the tier-two IT service companies building their businesses on the same outsourcing model.

The market's reading was not wrong. It was fast.

The three-year low matters as a timestamp. It pins the Nifty IT selloff to a moment when the market concluded that the structural story supporting Indian IT headcount growth had changed — not temporarily, but directionally. Investors were not selling on a bad quarter. They were selling on a revised model of what the next decade looks like.

For anyone in Indian IT trying to interpret this moment: what changed was not that AI arrived. What changed was that a major client-side firm publicly confirmed that AI budget is cannibalising the outsourcing budget rather than supplementing it. That is the signal inside the signal.

The Floor Was Already Cracking

Here is the part of the story that predates June 19. The Accenture crash was not the starting gun. It was the public acknowledgement of something that had been happening quietly for three years.

Across TCS, Infosys, Wipro, and HCL, more than 42,000 jobs have been shed since 2023. Infosys recorded its first annual headcount drop since 2001. TCS posted its first in 19 years. These are not minor statistical blips — these are companies that measured their health in terms of headcount growth for two decades straight, and both broke that streak in the same window.

The framing from leadership was consistent: reskilling is happening, automation tools are being deployed, the workforce is being optimised. What the framing obscured is that the companies making these calls were doing so before AI agents could reliably replace the work those people were doing. This is the critical structural point. The headcount decisions came first. The capability validation came second, at best.

Accenture's own leadership, when pressed on the affected workforce, described reskilling as "not a viable path" for a significant share of the roles in question. That is a company saying directly that it does not expect the people leaving to be repositioned into AI-adjacent roles within the firm. The bet on AI automation was placed, the cost was passed to the workforce, and the capability question was left to be answered later.

This matters because the popular narrative about AI and jobs runs in the wrong sequence. The public story is: AI gets smart enough to do your job, companies automate, workers are displaced. What the 2023–2026 data actually shows is: companies made headcount decisions in anticipation of AI capability, compressed the workforce while the technology was still maturing, and are now operating leaner while the AI systems they bet on are still catching up to the roles they were supposed to fill.

A set of data slides showing AI adoption headlines and the gap between expectations and measured outcomes

India faces a concrete downstream consequence of this misalignment. NASSCOM estimates roughly 1 million AI professionals will be needed by 2027. Fewer than 500,000 are qualified today. The workforce that was compressed was not automatically replaced by an AI-ready workforce. The gap between what enterprises need and what the talent pool can supply is significant, and it is not closing fast.

The Map That Got Deleted

In October 2025, Andrej Karpathy — founding member of OpenAI, one of the most technically credible voices in the field — spent two hours one Saturday running an AI-assisted analysis of every American job category, scoring each by its exposure to AI automation. He posted the results.

Then, after the post had reached a large audience and the reaction had turned intense, he deleted it.

The deletion itself is informative. This was not a correction — there was no factual error he walked back. It was a withdrawal, which suggests the results said something true enough and uncomfortable enough that he concluded the public signal-to-noise ratio around the post was not worth the cost of leaving it up. When someone with that credibility and that domain knowledge quietly removes a piece of analysis rather than defending it or correcting it, it is worth noting what the analysis said before it disappeared.

The directional finding, consistent with the broader research literature, is that digital and white-collar work sits at the highest exposure end of the AI automation spectrum. The roles most at risk are not the ones that require physical presence, manual dexterity, or real-time environmental adaptation. They are the roles that consist primarily of information processing, pattern matching, and document production — which describes a substantial fraction of what IT services companies sell.

NITI Aayog's worst-case projection for India's tech-services workforce runs from roughly 8 million today to approximately 6 million by 2031. The World Bank has published parallel analysis on the structural exposure of knowledge-economy work to AI substitution. These are not fringe forecasts. They are the institutional consensus range.

An infographic showing AI skill exposure across job categories, with digital and white-collar work at the high-exposure end

The important nuance: "exposed" does not mean "eliminated." It means the nature of the value those roles provide is being repriced, and the leverage point for the human in that role is shifting from production to judgment. The engineers who survive this transition are not the ones who fight the model — they are the ones who own the system around it.

Everyone Bet on the Same Story

The sell-off on June 19 was sharp because a large number of investors had been holding Indian IT on the basis of a specific thesis: that the AI wave would generate net-positive demand for IT services. The reasoning was coherent on its face. AI requires implementation, integration, data engineering, security review, change management, and ongoing support. Who does that work? IT services firms. Therefore AI growth = IT services growth.

The Accenture call collapsed that thesis in one sentence. Clients are reallocating, not adding.

When a large number of market participants hold the same directional thesis and new information arrives that falsifies the thesis simultaneously for all of them, the resulting price movement is not gradual — it is vertical. The ₹1.35 lakh crore loss happened in hours because everyone was holding the same model and everyone revised it at the same time.

This is the structural problem with consensus narratives in technology investing. The consensus is usually built on what makes intuitive sense given the current state of the technology. What it systematically underweights is the second-order economic reality: that AI adoption at scale is generating cost pressures that are reshaping the demand side faster than the supply side can adapt.

The bet everyone placed — "AI creates more IT work" — was not illogical. It was just incomplete. It modelled the volume of integration work without modelling the cost dynamics of operating AI at production scale. That second half of the equation is what the market had not priced in. Until June 19.

What an LLM Actually Is

Before getting into the cost dynamics, it is worth being precise about what these systems are, because the popular framing creates exactly the kind of misalignment that produces the cost surprises we are about to discuss.

A large language model is a next-token predictor. Given a sequence of tokens, it computes a probability distribution over what token comes next, samples from that distribution, appends the result, and repeats. That is the entire mechanism. There is no reasoning engine in the conventional sense, no internal world model in the way humans have one, no persistent state between sessions unless you engineer it explicitly.
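As a rough illustration of that loop, here is a toy sketch. The lookup table `NEXT_TOKEN_PROBS` and the `generate` function are invented for this example. A real LLM computes the distribution with a neural network conditioned on the entire context, not with a table keyed on the last token.

```python
import random

# Hypothetical toy "model": for each token, a probability distribution over
# the next token. This table stands in for the neural network.
NEXT_TOKEN_PROBS = {
    "the": {"market": 0.6, "budget": 0.4},
    "market": {"fell": 0.7, "rose": 0.3},
    "budget": {"fell": 0.5, "rose": 0.5},
    "fell": {"<end>": 1.0},
    "rose": {"<end>": 1.0},
}


def generate(prompt, max_new_tokens=10, seed=0):
    """Predict, sample, append, repeat. `prompt` is a non-empty list of tokens."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:  # the toy table has no entry for this token
            break
        candidates = list(dist)
        weights = [dist[t] for t in candidates]
        # Sample one token from the distribution.
        next_token = rng.choices(candidates, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate(["the"])))
```

Nothing in the loop checks whether the generated sentence is true. It only checks what is likely to come next, which is the point the rest of this piece builds on.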

Tokens are the unit of measure. Every word you type, every sentence the model generates, every document you pass as context — all of it converts to tokens, and tokens are what the meter is running on. The billing clock starts when the request is submitted and does not stop until the last token of the response is emitted.
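To make the meter concrete, here is a minimal sketch of pricing a single call. It assumes input and output tokens are billed at separate per-million rates, which is a common scheme. The rates in `HYPOTHETICAL` are made-up numbers, not any provider's real price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    usd_per_million_input: float
    usd_per_million_output: float


def call_cost_usd(input_tokens, output_tokens, pricing):
    """Cost of one request: tokens sent in plus tokens generated."""
    return (
        input_tokens * pricing.usd_per_million_input
        + output_tokens * pricing.usd_per_million_output
    ) / 1_000_000


# Assumed, illustrative rates only.
HYPOTHETICAL = Pricing(usd_per_million_input=3.0, usd_per_million_output=15.0)

# One call that passes a 20,000-token document and gets a 2,000-token answer.
cost = call_cost_usd(20_000, 2_000, HYPOTHETICAL)
print(f"${cost:.2f} per call")
```

A single call looks cheap. An agent that makes hundreds of such calls per task, for thousands of users, is where the bill comes from.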

The important conceptual distinction is between the model and the system. The LLM is the engine. It is not the car. The car — the thing that actually does a useful task in the real world — is the harness: the prompt engineering, the retrieval pipeline, the memory management, the tool-calling logic, the evaluation framework, the guardrails, the retry logic, the cost accounting. The engine is commercially available and rapidly commoditising. The harness is where the engineering work lives, where the reliability work lives, and where the cost is either controlled or not.

Most enterprises that ran into trouble with AI costs in 2025 and 2026 made the same category error: they evaluated the engine, concluded it was capable, deployed it inside a hastily assembled harness, and discovered that "capable" and "affordable in production" are two different properties. The model can do the task. The system around the model is what determines whether doing the task is economically viable at scale.

This distinction is the foundation of AI Reliability Engineering as a discipline. You are not building a model. You are building a system that uses a model. The system needs to be designed for reliability and for cost — and those two properties interact in non-obvious ways.

What Nobody Priced In: The Cost Bomb

This is the section that the mainstream AI coverage keeps missing. The story being told publicly is about displacement — AI takes work away from humans. The story that is actually playing out in enterprise environments is about uncontrolled cost: AI agents work, and when they work at scale, they burn money at a rate that no one budgeted for.

Start with the failure cases, because they reveal the floor problem.

Air Canada's customer-service bot invented a refund policy that did not exist and offered it to a customer. Air Canada was held liable in court. A Chevrolet dealership's bot was social-engineered into agreeing to sell a car for $1 — a negotiation conducted by a user who exploited the model's instruction-following tendency without any price-floor guardrail in the system. A delivery company's bot, under sustained adversarial prompting, began generating abusive messages to customers.

These are not exotic edge cases. They are the predictable failure modes of deploying a capable model inside an inadequately designed harness. None of these failures happened because the underlying model was bad. They happened because the system around the model had no specification for what the model was not allowed to do, and no evaluation framework to verify compliance before deployment.
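A specification of what the model is not allowed to do can start very small. The sketch below is one hypothetical way to express it: the agent proposes an action, and plain deterministic code checks it before anything reaches the customer. The names `ProposedAction`, `PRICE_FLOOR_USD`, and `KNOWN_POLICY_IDS`, and all the values, are illustrative assumptions and are not taken from any of the incidents above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    kind: str                 # "quote_price" or "state_policy"
    amount_usd: float = 0.0   # used when kind == "quote_price"
    policy_id: str = ""       # used when kind == "state_policy"


# Assumed business rules, written down once, outside the prompt.
PRICE_FLOOR_USD = 20_000.0
KNOWN_POLICY_IDS = {"refund-24h", "baggage-delay"}


def check_action(action):
    """Return (allowed, reason). Anything not explicitly allowed is rejected."""
    if action.kind == "quote_price":
        if action.amount_usd < PRICE_FLOOR_USD:
            return False, "price below floor"
        return True, "ok"
    if action.kind == "state_policy":
        if action.policy_id not in KNOWN_POLICY_IDS:
            return False, "policy not in published list"
        return True, "ok"
    return False, "unknown action kind"


print(check_action(ProposedAction(kind="quote_price", amount_usd=1.0)))
print(check_action(ProposedAction(kind="state_policy", policy_id="made-up-refund")))
```

The check does not make the model smarter. It makes the system refuse to act on an output that breaks a rule the business already had.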

A visual diagram showing the gap between raw AI capability and production reliability, with the cost and failure consequences labeled

MIT's Project NANDA studied this at scale across enterprise deployments. The finding: 95% of enterprise GenAI pilots show no measurable P&L return. Not "modest return," not "unclear return." No measurable return. Gartner projects that 40% of agentic AI projects will be cancelled by 2027.

These numbers are not indictments of the technology. They are indictments of the deployment approach — capability without harness, without reliability engineering, without cost governance.

Now the cost cases, because this is where the math breaks.

Microsoft and Claude Code. Microsoft gave its engineers access to Claude Code, Anthropic's agentic coding assistant. The engineers used it. Productivity improved. By conventional measures, the deployment was a success. Microsoft cancelled it anyway. The reason: the token bills blew through the division's entire annual AI budget. The tool was not cancelled because it failed. It was cancelled because it worked, and working at that scale cost more than the budget could absorb. Microsoft moved to Copilot CLI, a more cost-controlled alternative.

Uber and Claude Code. Uber ran a more systematic rollout. Five thousand engineers received access. Within months, 84% were actively using the tool — which is an extraordinarily high adoption rate for any enterprise software deployment. Power users were burning up to $2,000 per month each in token costs. The aggregate bill consumed Uber's entire 2026 AI budget in four months. Not the Claude Code budget. The entire AI budget for the year. Gone in a third of the year.

Neither of these is a cautionary tale about bad AI. They are precise illustrations of what happens when a capable model meets an organization that has not built the harness to manage token spend. The AI worked. The economics did not.

The counterintuitive fact that underlies all of this: token prices fell 60 to 80% between 2025 and 2026. Provider competition drove inference costs down sharply. The bills exploded anyway.

Read that again. The unit cost of intelligence dropped by more than half. Total AI spend still went vertical. The reason is volume. When a tool is genuinely useful and adoption reaches the 84% range, consumption compounds in a way that price reductions cannot keep pace with. A 70% reduction in cost per token means nothing if token consumption increases by 400%: five times the volume at 30% of the old unit price is still 1.5 times the old bill. The math still goes the wrong direction.
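The arithmetic behind that claim can be checked directly. The sketch below uses the illustrative figures from the paragraph above (a 70% price drop, a 400% rise in consumption) and adds the break-even point: how much volume growth it takes to cancel a given price cut. The function names are mine.

```python
def relative_spend(price_drop, volume_growth):
    """New bill as a multiple of the old bill.

    price_drop=0.70 means the unit price fell 70%.
    volume_growth=4.00 means consumption rose 400% (five times the tokens).
    """
    return (1.0 - price_drop) * (1.0 + volume_growth)


def break_even_volume_growth(price_drop):
    """Volume growth at which the price cut is fully cancelled out."""
    return 1.0 / (1.0 - price_drop) - 1.0


print(round(relative_spend(0.70, 4.00), 2))        # 1.5
print(round(break_even_volume_growth(0.70), 2))    # 2.33
```

In other words, after a 70% price cut, any consumption growth beyond roughly 233% leaves the organization paying more than before.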

A chart illustrating the paradox: token prices fell 60-80% while total AI infrastructure bills increased across major deployments

This is the cost bomb that nobody priced in. Not in IT services valuations. Not in enterprise AI business cases. Not in the career advice being given to engineers who were told to learn to use AI tools to stay relevant.

The economic model for AI in production has a structural problem: the technology is priced on consumption, consumption scales with usefulness, and usefulness — if the tool is genuinely good — will always drive consumption faster than price decreases drive cost down. Without active token budget management, the incentive for individual users to self-limit consumption is zero. They are not paying the bill. The organization is. And the organization has not yet built the governance layer to set limits, monitor spend, and route requests to appropriately sized models for the task at hand.

That governance layer is not a nice-to-have. It is the difference between an AI deployment that generates return and one that gets cancelled.

The Job Nobody Named

On June 14, 2026, Satya Nadella posted on X that every company now needs to build two kinds of capital: human capital and token capital. The post reached 28 million views.

Token capital is a new concept, but the logic is not complicated once you have sat with the cost examples above. Human capital is the trained, experienced workforce you retain, develop, and deploy. Token capital is the models, data pipelines, inference capacity, and system infrastructure you own or control rather than rent on a per-token basis. Just as companies learned to own IP rather than license it indefinitely, the next wave of enterprise AI strategy is about controlling the economics of model consumption rather than being subject to them.

Nadella framing this publicly matters because it signals where enterprise buying decisions are heading. CIOs who heard "AI productivity" in 2024 are now hearing "AI cost management" in 2026. The function that sits at that intersection — making AI reliable AND affordable in production — does not yet have a common name on a job requisition.

I call it AI Reliability Engineering.

A graphic depicting the AI Reliability Engineering role: sitting at the intersection of model capability, production reliability, and token cost governance

The scope of the role is specific. It is not prompt engineering — prompt engineering is a subset. It is not data science — you are not training models. It is not traditional DevOps — though operational discipline is part of it. AI Reliability Engineering is the practice of designing, building, and operating the harness: the system around the model that determines whether the model's capability is expressed reliably, safely, and within the economic constraints of the organization.

Concretely, this means:

Spec first, test cheap, then execute. Before you run an expensive agent workflow against a production model, you write a behavioral specification. What is this agent supposed to do? What are the explicit limits — topics, actions, spend? What does correct output look like, and how will you measure it? Then you test against a smaller, cheaper model that approximates the behavior of the target model. You run a thousand test cases for the cost of ten production runs. You find the failure modes before they cost real money. You execute against the production model only after the spec is validated.

This is not a novel methodology. Software engineers have applied exactly this logic — write the spec, write the tests, run the suite against a stub, then integrate — for forty years. AI agents are not exempt from engineering discipline. They are subject to it in precisely the same way, with the added wrinkle that the "test" here is not a deterministic pass/fail — it is a statistical distribution over outputs, and your guardrails need to bound that distribution acceptably.
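Here is a minimal sketch of what "a statistical distribution over outputs" means in a test suite. It is an assumption-laden illustration, not a framework: `Spec`, `pass_rate`, and `make_stub_model` are hypothetical names, and the stub stands in for the small, cheap model you would test against before spending on the production one.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    forbidden_phrases: tuple   # things the agent must never say
    max_output_chars: int      # crude proxy for an output-size limit
    min_pass_rate: float       # the bound on the output distribution


def passes(output, spec):
    """One sampled output either satisfies the spec or it does not."""
    if len(output) > spec.max_output_chars:
        return False
    text = output.lower()
    return not any(phrase in text for phrase in spec.forbidden_phrases)


def pass_rate(model, prompts, spec, samples_per_prompt=20):
    """Sample each prompt many times; the result is a rate, not a pass/fail."""
    if not prompts or samples_per_prompt < 1:
        raise ValueError("need at least one prompt and one sample")
    outputs = [model(p) for p in prompts for _ in range(samples_per_prompt)]
    return sum(passes(o, spec) for o in outputs) / len(outputs)


def make_stub_model(failure_rate, seed=0):
    """Hypothetical stand-in for a cheap model that misbehaves some of the time."""
    rng = random.Random(seed)

    def model(prompt):
        if rng.random() < failure_rate:
            return "Sure, you are entitled to a full refund."
        return "I can't confirm that. Let me connect you with an agent."

    return model


spec = Spec(forbidden_phrases=("full refund",), max_output_chars=200, min_pass_rate=0.99)
rate = pass_rate(make_stub_model(failure_rate=0.05), ["Can I get a refund?"], spec, 200)
print(f"pass rate {rate:.2%}, ship: {rate >= spec.min_pass_rate}")
```

The gate is the last line: execution against the expensive model is allowed only when the measured rate clears the threshold written in the spec.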

Token budgets as first-class engineering constraints. In a well-run AI system, token spend is a design parameter, not a surprise. You set a budget per workflow, per user, per day. You instrument every call. You route simple tasks to small models and complex tasks to large ones. You cache aggressively. You audit spend the way a finance team audits headcount — regularly, with variance analysis, with owners responsible for overruns.

The Uber and Microsoft cases are not edge cases — they are what you get when you deploy without this discipline. Eighty-four percent adoption is a success metric. A budget gone in four months is a governance failure. Both things are true simultaneously.
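A sketch of the three mechanics named above (a budget, routing, and a cache) fits in a few dozen lines. Everything here is hypothetical: `call_model` is a placeholder for whatever client you use and is assumed to return the response text plus the tokens it consumed, the model names are invented, and routing on prompt length is a deliberately crude heuristic.

```python
class BudgetExceeded(Exception):
    pass


class TokenBudget:
    def __init__(self, limit_tokens):
        self.limit_tokens = limit_tokens
        self.used_tokens = 0

    @property
    def exhausted(self):
        return self.used_tokens >= self.limit_tokens

    def record(self, tokens):
        self.used_tokens += tokens


def pick_model(prompt, long_prompt_chars=2000):
    """Route short prompts to a small model, long ones to a large one (assumed heuristic)."""
    return "large-model" if len(prompt) > long_prompt_chars else "small-model"


class MeteredClient:
    def __init__(self, call_model, budget):
        # call_model is a hypothetical callable: (model_name, prompt) -> (text, tokens_used)
        self.call_model = call_model
        self.budget = budget
        self.cache = {}

    def ask(self, prompt):
        if prompt in self.cache:
            return self.cache[prompt]          # cache hit: no tokens spent
        if self.budget.exhausted:
            raise BudgetExceeded(
                f"used {self.budget.used_tokens} of {self.budget.limit_tokens} tokens"
            )
        text, tokens_used = self.call_model(pick_model(prompt), prompt)
        # Usage is only known after the call, so the final call can overshoot.
        self.budget.record(tokens_used)
        self.cache[prompt] = text
        return text
```

A stricter version would reserve an estimated token count before each call so that the ceiling is never crossed. The point of the sketch is that the refusal is a design decision made in code, not a surprise on an invoice.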

Human-in-the-loop by design, not by accident. The Air Canada case, the Chevrolet case, the delivery company case — all of them are systems that were designed without explicit human oversight at decision boundaries. An AI agent that can commit the organization to a financial offer, a refund, or a customer communication without a human checkpoint is not a reliability-engineered system. The judgment call about where to insert human review is a design decision, and it should be made consciously and documented, not discovered during an incident.
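One way to make that design decision explicit is to encode the decision boundary as a routing rule. The sketch below is a simplified assumption of how that could look: `Decision`, `ReviewQueue`, and the `auto_approve_limit_usd` threshold are illustrative names, and a real system would also persist the queue and notify a reviewer.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    description: str
    commits_money: bool        # does this bind the organization financially?
    amount_usd: float = 0.0


@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)


def route(decision, queue, auto_approve_limit_usd=0.0):
    """Send financially binding decisions above the limit to a human."""
    if decision.commits_money and decision.amount_usd > auto_approve_limit_usd:
        queue.pending.append(decision)
        return "needs_human_review"
    return "auto_approved"


queue = ReviewQueue()
print(route(Decision("answer opening-hours question", commits_money=False), queue))
print(route(Decision("offer refund", commits_money=True, amount_usd=650.0), queue))
print(len(queue.pending))
```

With the default limit of zero, every decision that commits any amount of money waits for a person. Raising the limit is then a documented choice with an owner.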

For the engineers reading this who are worried about the narrative of displacement: you were never the cost. You are the cure. The discipline that makes AI deployments work — the specification, the evaluation, the guardrails, the cost governance, the incident response — is engineering work. It requires the kind of judgment about system design and operational reality that models do not yet provide from the inside.

The companies that figure this out first will not be the ones with the most capable models. They will be the ones with the best harnesses around their models. That is where the durable competitive advantage lives. And it is where your skills — if you redirect them toward this discipline — become more valuable, not less.

You can explore what this practice looks like in depth at varunpratap.com, where I document the research and methods behind the Qualixar toolkit.

The Real Fix: World Models

There is a deeper problem that the harness disciplines above do not fully solve, and it is worth being direct about it.

Current LLMs, including the best ones available today, generate outputs by predicting token sequences. They are very good at this. What they do not have is an internal simulation of reality — a model of cause and effect, of physics, of how actions play out over time in the real world before those actions are taken. When an AI agent makes a decision, it is not simulating the consequences of that decision in a world model and selecting the action whose simulated outcome best serves the goal. It is generating the action that fits the statistical pattern of what comes next in the context.

This creates a class of failure that guardrails and cost governance do not eliminate: the agent takes the locally coherent action that is globally wrong, because it has no mechanism to simulate forward and check. The Air Canada bot did not hallucinate the refund policy because it was poorly prompted. It generated text that fit the pattern of "customer asks about refund policy, assistant provides helpful information" without any simulation of whether that policy actually existed in the real world.

World models are AI systems that build and maintain an internal simulation of reality and plan actions by simulating their consequences before executing. Two labs working in this space have each reportedly raised more than a billion dollars to build AI that can reason about the real-world consequences of its actions before taking them.

If this class of system matures, it changes the cost and reliability calculus significantly. A world model that can simulate the cost of a thousand token paths before executing one can do its own spec-first optimization. A world model that understands causal consequence before action can prevent a class of reliability failures that current guardrails address only partially.

The caveat: this is where the timeline remains genuinely uncertain. The funding is real. The research direction is credible. The production-grade systems do not yet exist at the scale needed to replace the harness discipline described above. For the next several years, AI Reliability Engineering is not a transitional role waiting to be automated away. It is the practice that makes the current generation of AI systems viable in production while the next generation matures.

The real fix, long term, is world models. The real fix, right now, is the harness.

What To Do This Week

You do not need a budget, a title, or permission to start. The skill that separates the survivors from the laid-off is buildable on your own machine, and the whole stack maps to one discipline: AI Reliability Engineering. Here is the order I would run it in.

1. Learn what a token actually costs you. Take any task you would hand an AI agent and meter it. Count the tokens in, the tokens out, the retries. Most engineers have never once looked at the bill for a single run. Microsoft and Uber did not look until the annual budget was already gone. You manage what you measure; start measuring.

2. Build the harness, not the demo. A raw model is an engine, not a car. Around any model you ship, wrap the four pieces that turn confident fabrication into a reliable system: tools it can call, your private data through retrieval so it answers from facts instead of guessing, guardrails that catch the outputs you cannot allow, and a human in the loop at the point where a wrong answer is expensive. A demo skips all four. A product has all four.

3. Adopt spec-first execution. Do not point an agent at a goal and hit run. Diagnose first, write the plan, then test that plan cheaply — let it fail ten, fifty, a hundred times on small token budgets until it is solid. Only then spend the big tokens on real execution. Most teams do the opposite: they execute blind and pay to fail in production. Plan with tokens, test with tokens, then execute.

4. Treat tokens the way a finance team treats money. Measured, budgeted, optimized. Set a per-task ceiling. Alert when a run exceeds it. Track cost per successful outcome, not cost per call. This is token management, the twin of reliability engineering, and right now it barely exists as a named role — which is exactly why it is an opportunity instead of a commodity.

5. Stay the human who directs it. The exposure data is clear that the most affected roles are senior, digital, and well paid — and the demand data is just as clear that the same experience is what the new work requires. The engineers everyone is writing off were never the cost. They are the cure — but only the ones who upskill into directing the system instead of competing with it.
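Steps 1 and 4 can be started with a plain log of runs. The sketch below assumes you record, for each run, what it cost, how many retries it took, and whether the outcome was usable. The `Run` record and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    task: str
    cost_usd: float    # total for the run, retries included
    retries: int
    succeeded: bool


def cost_per_call(runs):
    return sum(r.cost_usd for r in runs) / len(runs)


def cost_per_success(runs):
    """Failed runs still cost money, so they are charged to the successes."""
    successes = sum(1 for r in runs if r.succeeded)
    if successes == 0:
        return float("inf")
    return sum(r.cost_usd for r in runs) / successes


def over_ceiling(runs, ceiling_usd):
    """Runs that should have triggered an alert."""
    return [r for r in runs if r.cost_usd > ceiling_usd]


runs = [
    Run("triage ticket", 0.25, 0, True),
    Run("triage ticket", 0.50, 2, False),
    Run("triage ticket", 0.25, 1, True),
]
print(f"cost per call:    ${cost_per_call(runs):.2f}")
print(f"cost per success: ${cost_per_success(runs):.2f}")
print(f"runs over $0.40:  {len(over_ceiling(runs, 0.40))}")
```

Cost per call hides the failed run. Cost per successful outcome does not, which is why it is the number worth tracking.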

That is a stack you can start this week, on the AI you already have access to, with no permission required. The unwinding is real. So is the opening it creates. Build the skills almost nobody else is building.

Don't trust. Verify. Even me — every number in this piece is sourced below.

Sources & Receipts (don't trust — verify)

Accenture's worst day (~18%) and the ₹1.35 lakh crore Indian IT selloff, June 19 2026 — FT, CNBC, Business Today.
Outsourcing bookings down 15%; management said clients are reallocating existing budgets, not adding new money — Accenture earnings call, FT, CNBC.
More than 42,000 jobs gone across TCS, Infosys, Wipro and HCL since 2023; Infosys' first annual drop since 2001, TCS' first in 19 years — company filings, Business Today, Moneycontrol.
Andrej Karpathy's AI job-exposure scoring (later deleted); digital/white-collar work most exposed, hands-on trades safest — Dwarkesh Podcast (Oct 2025), X.
NITI Aayog worst-case: India's tech-services workforce shrinks from ~8 million to ~6 million by 2031 — NITI Aayog, Economic Times.
MIT: 95% of enterprise GenAI pilots show no measurable P&L return — MIT Project NANDA, 2025.
Gartner: 40% of agentic-AI projects cancelled by 2027 — Gartner, 2025.
Microsoft gave engineers Claude Code, then cancelled it over token cost and moved to Copilot CLI (~Jun 30) — The Verge (Tom Warren), Windows Central.
Uber: Claude Code on 5,000 engineers, 84% adoption, power users up to $2,000/month, 2026 AI budget gone in four months — Bloomberg, TechCrunch, Fortune.
Token/inference prices fell roughly 60–80% (2025→2026), yet total AI bills still rose — provider pricing trackers, Artificial Analysis.
Satya Nadella: every company must build "human capital and token capital" (post seen by 28M+, ~Jun 14 2026) — X, Yahoo Finance, Stocktwits.
One new AI data center can draw a nuclear plant's worth of power; Microsoft has chips it cannot deploy for lack of power; the $500B Stargate project cancelled a Texas (Abilene) site — Bloomberg, DataCenterDynamics, TechSpot; Nadella and CFO Amy Hood.
India's AI talent gap — roughly 1 million AI professionals needed by 2027, fewer than 500,000 qualified today — NASSCOM, Economic Times.
Documented agent failures from missing guardrails: Air Canada's refund-policy bot, a Chevrolet dealer bot talked into a $1 SUV, a delivery firm's abusive bot — BBC, The Guardian, The Register.
World models — two labs reportedly raised over a billion dollars each to build AI that simulates reality before acting — tech press, reported and not independently confirmed here; treat as directional.

Watch the full 12-minute film: The Great AI Unwinding. More on the practice of making AI reliable and affordable in production at varunpratap.com.