What is Tokenmaxxing?
Tokenmaxxing means treating token consumption as a measure of AI productivity — the more tokens burned, the better AI adoption. In early 2026, Meta, Amazon, and Microsoft set internal token usage targets. Spending increased, but results did not.
It's the nightmare scenario: a company that thought Claude would boost its productivity was hit with a half-billion-dollar bill for a single month of usage.
The industry has been talking about tokenmaxxing for a couple of months now. Teams are scaling AI usage faster and faster, pushed to burn as many tokens as possible to see what new models can do.
At Meta, where "AI-driven impact" is now a core expectation in performance reviews, an employee-built leaderboard ranked all 85,000 employees by token consumption.
But the reckoning has begun:
- Uber's CTO burned through the company's entire 2026 AI budget in four months.
- Microsoft canceled most of its Claude Code licenses six months after rolling them out.
- Amazon shut down its own leaderboard, with an SVP telling staff, "Please don't use AI just for the sake of using AI."
The tokenmaxxing era is ending. As leaders, we need to understand that burning tokens does not equal successful engineering. To stay ahead of this AI budget crisis, we must build operational discipline into processes if we want to see any return on what we're spending.
TL;DR
- 98% of FinOps practitioners now manage AI spend, up from 31% just two years ago (2026 State of FinOps)
- KPMG projects average enterprise AI spend will nearly double to $200M next year; Goldman Sachs found large companies are already overrunning budgets by orders of magnitude
- Uber's CTO burned through the company's entire 2026 AI budget in four months
- Token volume is the wrong metric — the metric that matters is efficacy per call: what did each model call cost, and what did it return?
Why Teams Started Optimizing for Tokens
Teams optimized for token consumption because it was the easiest and fastest AI metric to track. Nvidia's Jensen Huang put it plainly in March at GTC: "If that $500,000 engineer did not consume at least $250,000 worth of tokens, I'm going to be deeply alarmed."
For a company like Salesforce, that looks like spending $300 million on Anthropic tokens this year.
When engineering productivity is measured in tokens, there’s no incentive to monitor usage. While teams were shipping on the most hyped frontier model, token burn didn’t matter, because it was a race to see who could build what the fastest.
But this is when the cracks started to show, and teams began to notice the costs piling up. A model that was the obvious choice at build time could now cost twice as much as a newer alternative and score lower on accuracy.
Tokenmaxxing suddenly became tokenomics.
Why AI Budgets Are Exploding
Enterprise AI budgets are exploding because adoption outpaced governance. Goldman Sachs found large companies are already overrunning their AI budgets by orders of magnitude, and KPMG expects average enterprise AI spend to nearly double to $200 million next year.
A Gartner economist said it plainly: unconstrained deployment of agents is not a viable strategy. According to the 2026 State of FinOps, 98% of practitioners said they now manage AI spend, up from 31% just two years ago.
The result is predictable: AI adoption has moved faster than AI operations.
The market's first response has been dashboards that give at least some visibility into token spend. Every major observability and cost platform tracks spend by model, team, and token type.
But tokens are just the visible part of the cost. When an agent executes a task, it's not just consuming input and output tokens. It may spin up a sandbox VM, hit a key-value cache, or trigger a RAG pipeline — none of which shows up in token cost data.
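To make the gap concrete, here is a minimal sketch of a fully loaded cost for one agent task. Every name and price in it is a made-up assumption for illustration, not a real provider's rate card; the point is only that the token line item and the infrastructure line item are added up separately and then combined.

```python
from dataclasses import dataclass


@dataclass
class TaskCost:
    """Cost inputs for one agent task, in US dollars (all values hypothetical)."""
    input_tokens: int
    output_tokens: int
    input_price_per_mtok: float    # dollars per million input tokens
    output_price_per_mtok: float   # dollars per million output tokens
    sandbox_seconds: float = 0.0
    sandbox_price_per_second: float = 0.0
    retrieval_queries: int = 0
    retrieval_price_per_query: float = 0.0

    def token_cost(self) -> float:
        # The part a token dashboard can see.
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000

    def infra_cost(self) -> float:
        # The part a token dashboard cannot see: sandbox time and retrieval.
        return (self.sandbox_seconds * self.sandbox_price_per_second
                + self.retrieval_queries * self.retrieval_price_per_query)

    def total_cost(self) -> float:
        return self.token_cost() + self.infra_cost()


# Hypothetical task: illustrative numbers only.
task = TaskCost(
    input_tokens=200_000, output_tokens=20_000,
    input_price_per_mtok=3.0, output_price_per_mtok=15.0,
    sandbox_seconds=600, sandbox_price_per_second=0.001,
    retrieval_queries=50, retrieval_price_per_query=0.002,
)
print(f"tokens ${task.token_cost():.2f}, "
      f"infrastructure ${task.infra_cost():.2f}, "
      f"total ${task.total_cost():.2f}")
```

With these invented numbers, a token-only dashboard would report $0.90 for a task that actually cost $1.60.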
I've been seeing this play out in our own customer and prospect meetings. Leaders are scrambling for solutions while under pressure to do more with less. But when faced with the token sticker shock, engineering leaders do one of two things:
- Find a number that justifies the cost.
- Triage.
Mere months ago, the narrative was “AI at all costs.” Now leaders are scrambling to find any metric that justifies those ballooning costs. Sure, your review and PR rates are increasing, or your org created a handful of new skills and workflows. But those metrics can only justify the spend if you can tie them back to it.
Which model drove the PR rate improvement? Which workflow? At what cost per outcome? Nobody can answer those questions because nobody was tracking spend at that level. So the metric proves the spend happened. It doesn't prove it was worth it.
As we come out of the tokenmaxxing era, it’s becoming essential for orgs to tie specific model costs back to their outcomes.
Triage has its own problem: it is reactive, and it can't keep up with the breakneck pace of the AI space. If you’re hit with a million-dollar bill one quarter, by the time you’ve found the issue and implemented a new plan, model prices will have changed, and the budget you planned for is already burned.
Either way, you’re trying to optimize off stale data. Instead, we need to build cost optimization into the lifecycle of these models, optimizing as soon as data comes in.
What to Measure Instead of Token Consumption
The metric that matters is not token consumption but efficacy per call. If a $10 model produces the same outcome as a $100 model on a given workflow, switching is a 90% cost reduction with no loss in quality. But you can only know that if you're measuring at the call level, not total monthly consumption.
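One way to make "efficacy per call" measurable is cost per successful outcome, grouped by workflow and model. The sketch below assumes you already log each call with its cost and a pass/fail judgment of the result; the `CallRecord` fields, the model names, and the dollar figures are hypothetical, and how you decide that a call "succeeded" is the hard part this example leaves out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    workflow: str
    model: str
    cost_usd: float
    succeeded: bool  # did this call produce an acceptable outcome?


def cost_per_success(records):
    """Return {(workflow, model): dollars spent per successful call}.

    Failed calls still count toward spend. A group with no successes
    maps to None, since its cost per success is undefined.
    """
    spend = defaultdict(float)
    wins = defaultdict(int)
    for r in records:
        key = (r.workflow, r.model)
        spend[key] += r.cost_usd
        wins[key] += int(r.succeeded)
    return {k: (spend[k] / wins[k] if wins[k] else None) for k in spend}


# Hypothetical log: the small model fails once in five calls.
records = (
    [CallRecord("pr-review", "large-model", 1.00, True) for _ in range(4)]
    + [CallRecord("pr-review", "small-model", 0.10, True) for _ in range(4)]
    + [CallRecord("pr-review", "small-model", 0.10, False)]
)

result = cost_per_success(records)
for (workflow, model), cps in sorted(result.items()):
    print(f"{workflow} / {model}: ${cps:.3f} per successful call")
```

In this invented log the small model costs $0.125 per successful call against $1.000 for the large one, even after paying for its failure. A monthly total would show neither number.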
Measuring efficacy at the call level unlocks controls that are obvious in hindsight:
- QA environments don't need frontier models
- Production workflows get regional fallbacks, so an agent can reroute when a provider rate-limits you
- Specific models get approved for specific environments and workflows, and blocked everywhere else
None of this is new; it's the same discipline we've always applied to compute, just in a new layer.
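The controls in the list above could be sketched as a small routing policy. This is an illustration under stated assumptions, not a description of any particular product: the `POLICY` table, the model and region names, the `RateLimited` exception, and the caller-supplied `send` function are all hypothetical.

```python
class RateLimited(Exception):
    """Hypothetical error a provider client raises when it rate-limits a call."""


# Hypothetical allowlist: (environment, workflow) -> approved models, in
# preference order. Anything not listed here is blocked.
POLICY = {
    ("qa", "code-review"): ["small-model"],
    ("prod", "code-review"): ["frontier-model@us-east", "frontier-model@eu-west"],
}


def call_with_policy(environment, workflow, prompt, send):
    """Try each approved model in order, moving on when one is rate-limited.

    `send(model, prompt)` is supplied by the caller and performs the request.
    Returns (model_used, response).
    """
    allowed = POLICY.get((environment, workflow))
    if not allowed:
        raise PermissionError(f"no model approved for {environment}/{workflow}")
    last_error = None
    for model in allowed:
        try:
            return model, send(model, prompt)
        except RateLimited as err:
            last_error = err  # fall through to the next approved model
    raise RuntimeError("all approved models were rate-limited") from last_error


def fake_send(model, prompt):
    # Stand-in for a real client: pretend the us-east endpoint is throttled.
    if model.endswith("@us-east"):
        raise RateLimited(model)
    return f"{model} answered: {prompt}"


model_used, _ = call_with_policy("prod", "code-review", "review this diff", fake_send)
print(model_used)
```

Keeping the policy as data rather than scattering model names through application code means the approved list can change when prices or models change, without touching the workflows themselves.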
After Amazon took down its token leaderboard, it replaced raw token counts with a metric it calls "normalized deployments": AI-assisted code that actually ships, not tokens consumed. It was a clear sign that the industry is moving towards measuring output to inform behavioral trends.
But we need to go further than this: Teams that apply infrastructure-level observability to their LLM calls find and curb waste immediately.
We saw this firsthand when we turned on agent observability while building our own IaC agents at Sedai. Within days we found workflows using more expensive models than needed, unnecessary retries, and execution paths consuming far more resources than expected.
A third of our LLM costs were optimizable, and it was a real wake-up call for us to see how much waste was simply invisible.
What the Cloud Era Teaches Us About AI Cost Management
As far as I can tell, pricing will stay token-based, and costs are going to continue to climb. While AI providers have been eating compute costs to buy market share, GitHub, OpenAI, and Anthropic are already raising prices as they head toward profitability. Anthropic is even set to IPO this fall.
GitLab's restructuring memo put a number on where this is headed: "Last year, the developer platform market used to be measured in tens of dollars per user per month. This year it is hundreds/user/month and headed to thousands."
Gartner goes further, claiming that by 2028, AI coding costs will surpass the average developer's salary as token consumption surges under consumption-based licensing.
Teams budgeting at today's rates are walking into a budget crisis, if they’re not there already.
We're entering the same phase of AI that cloud infrastructure entered more than a decade ago. The companies that won in the cloud era weren’t the ones who consumed the most compute; they were the ones that could tie spending to real outcomes continuously.
The tokenmaxxing era was useful because it taught organizations what was possible. But token consumption is a terrible long-term KPI. What matters is what those tokens bought.
Every technology wave eventually develops operational discipline. The party is over, and it’s time to face reality. We don’t know how the AI companies are going to keep changing their pricing, which providers will IPO, or what architecture will be obsolete in six months.
What we do know is that engineering teams need to stop optimizing for token consumption and start optimizing for cost per outcome. That means:
- Measuring performance at the call level
- Comparing costs against outcomes
- Continuously optimizing as conditions change
Tokenomics is just emerging. Just as cloud cost management was built on infrastructure observability, AI cost management will be built on agent observability. It's time for engineering teams to start building those feedback loops now.
Sedai does tokenomics for you. See how.

