This week, the trend is impossible to ignore: LLM costs are out of control.
AI companies have essentially been subsidizing the cost of using LLMs, prioritizing long-term growth over short-term profits. That era appears to be ending. From Anthropic’s $45B compute deal to one AWS user’s surprise $30k bill after using Claude, the signs that subsidized AI is winding down are everywhere.
I asked our engineering leadership team at Sedai for their take. Here's what they said.
AI Labs Are Subsidizing Your Usage… For Now
Nikhil Gopinath Kurup, SVP of Engineering, ML
There seems to be a genuine disconnect between the actual cost of inference and what these packaged products or APIs charge. It reminds me of the early days of $5 Uber rides, when fares were subsidized to build rider habits and capture the market.
Currently, subscription-based pricing offers significantly better value than API pricing: a Claude subscription at around $100 a month provides token access that would easily cost around $2,000 at API rates.
By contrast, the pricing we are seeing from Chinese models on platforms like OpenRouter feels much more grounded in reality, likely because it is set to recover costs. It seems like a more accurate reflection of what inference should cost, even if those models do not yet fully match the capabilities of the top US offerings.
"The pricing US labs charge has almost nothing to do with what inference actually costs."

Nikhil Gopinath Kurup
SVP of Engineering, ML
I would expect inference costs to decrease over time, but the major US labs do not seem to be following that trajectory just yet. Providers like OpenAI and Anthropic have massive capital investments to recoup, which means they will eventually face intense pressure to demonstrate substantial revenue and growth. Google seems to be much more vertically integrated than its direct competitors, but time will tell how that advantage plays out.
Additionally, recent discussions around API throttling (like developers running into Claude Code usage limits) underscore just how vital reliable API access and LLM inference have become for developer productivity and operational workflows.
It is a fascinating dynamic to watch unfold. Will costs continue to climb as frontier models become more complex, or will they eventually drop as efficiency improves? While the ultimate direction remains uncertain, the outcome is bound to create both significant opportunities and equally notable challenges.
You’re Using The Wrong Models, And It’s Costing You
Benji Thomas, Co-Founder & CTO
The industry is repeating the early cloud mistake: everyone is defaulting to the latest GPT, Claude, or Gemini models just to move fast, without thinking enough about long-term cost efficiency.
Right now, pricing still feels heavily subsidized. Once teams become dependent on a model or provider, they’re effectively locked in, and invoice spikes will become normal. Even when teams have cost monitoring set up, the way these models are billed can bypass existing controls entirely, leaving you with an unexpected $30k bill.
"The teams investing in dynamic model selection now will have a huge advantage later."
Benji Thomas
Co-Founder & CTO
The bigger mistake is treating model selection like a one-time decision. It’s not, and this space is moving too fast for that.
Over the next few years, we’ll see an explosion of frontier models, open-source models, and highly specialized models. The teams investing in dynamic model selection and routing now will have a huge advantage later.
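Dynamic model selection can start as something as simple as picking, per request, the cheapest model that clears a task's quality bar. The sketch below is illustrative only: the model names, capability tiers, and per-million-token prices are hypothetical placeholders, not real rates, and a production router would also weigh latency, context length, and measured quality from your own evals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                      # hypothetical model identifier
    capability: int                # rough quality tier: 1 (basic) to 3 (frontier)
    usd_per_million_tokens: float  # hypothetical blended price, not a real rate


# Hypothetical catalog: names and prices are placeholders for illustration.
CATALOG = [
    ModelOption("small-open-model", capability=1, usd_per_million_tokens=0.20),
    ModelOption("mid-tier-model", capability=2, usd_per_million_tokens=1.50),
    ModelOption("frontier-model", capability=3, usd_per_million_tokens=15.00),
]


def route(required_capability: int, catalog: list[ModelOption] = CATALOG) -> ModelOption:
    """Return the cheapest model whose capability tier meets the task's requirement."""
    eligible = [m for m in catalog if m.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no model meets capability tier {required_capability}")
    return min(eligible, key=lambda m: m.usd_per_million_tokens)
```

Because the catalog is plain data, adding a new open-source or specialized model means appending an entry rather than rewriting every caller, which keeps model choice from hardening into a one-time decision.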
Everyone else will be trying to optimize after they’re already 10 steps behind.
LLM Costs Are Creating An Engineering Crisis
Hari Chandrasekhar, SVP of Engineering, Core
These stories about skyrocketing LLM costs are the logical conclusion of how we're building right now.
We all agree that leveraging AI for productivity is non-negotiable. But in the rush to build, the industry has bypassed basic engineering discipline. Right now, engineers are defaulting to the most expensive models for basic tasks, creating a massive mismatch between capability and cost.
"Token costs don't scale linearly. The moment you move to agentic workflows, your context window explodes, and costs compound."

Hari Chandrasekhar
SVP of Engineering, Core
What’s worse, token-based consumption is badly misunderstood because it doesn't scale linearly. The moment you move from a simple prompt to agentic, multi-step workflows, your context window explodes. When agents loop, pull in RAG context, and call other systems, each step resends an ever-larger context, so token costs compound behind the scenes.
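To see why agent loops get expensive, consider a loop that resends its whole context on every call. This is a minimal sketch under made-up assumptions: each step appends a fixed number of new tokens (tool output, retrieved chunks) and nothing is ever pruned. Under those assumptions, total input tokens over n steps come to n * base + added * n(n+1)/2, which grows quadratically with the number of steps rather than linearly.

```python
def agent_input_tokens(steps: int, base_prompt: int, tokens_added_per_step: int) -> int:
    """Total input tokens billed across an agent loop that resends its full context.

    Hypothetical assumptions: every step appends a fixed amount of new context
    and the loop never prunes or summarizes anything.
    """
    total = 0
    context = base_prompt
    for _ in range(steps):
        context += tokens_added_per_step  # new tool or RAG output joins the context
        total += context                  # the entire context is sent again on this call
    return total
```

With a hypothetical 2,000-token base prompt and 3,000 new tokens per step, 10 steps bill 185,000 input tokens and 20 steps bill 670,000: doubling the steps multiplies the input tokens by roughly 3.6x.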
Standard cloud cost tools are blind to this. We can't just look at the bill at the end of the month and panic. If we want to use the best model for each job, we need programmatic governance at the orchestration layer: dynamically routing to the right model, pruning context aggressively, and enforcing spend controls before the API call is even made.
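Aggressive context pruning can be as simple as keeping the system prompt plus the most recent turns that fit a fixed token budget. In this sketch the role/content message format and the four-characters-per-token counter are assumptions; a real system would use its provider's tokenizer and might summarize older turns instead of dropping them.

```python
from typing import Callable

Message = dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def rough_token_count(text: str) -> int:
    # Crude assumption: about 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def prune_context(
    messages: list[Message],
    max_tokens: int,
    count: Callable[[str], int] = rough_token_count,
) -> list[Message]:
    """Keep all system messages (placed first), then the newest turns that still fit."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # System messages are always kept; whatever remains is the budget for other turns.
    budget = max_tokens - sum(count(m["content"]) for m in system)
    kept: list[Message] = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = count(m["content"])
        if cost > budget:
            break             # stop at the first turn that no longer fits
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order
```

Running a step like this before every call caps the input tokens per call, so as long as the system prompt fits, total input tokens grow at most linearly with the number of agent steps.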
LLM Costs Are an Engineering Problem, Actually
It’s clear that when it comes to managing out-of-control LLM costs, most teams are flying blind. As HackerNoon noted this week, engineers pick Claude or GPT-4 because they know it works, but they don’t necessarily know whether it’s the right LLM for the task at hand.
And the model math is getting more complex, too: using the cheapest model doesn’t mean you’re actually spending less. One call to a frontier model often costs less than five calls to something cheaper.
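The frontier-versus-cheap comparison comes down to expected cost per successful result, not cost per call. A minimal sketch, assuming failed attempts are simply retried and attempts are independent: the number of calls until the first success is then geometric with mean 1 / success_rate. The per-call prices and success rates below are invented for illustration; real figures would come from your own evals and billing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    usd_per_call: float  # hypothetical average cost of one request
    success_rate: float  # hypothetical probability that one call solves the task


def expected_cost_per_success(m: ModelProfile) -> float:
    """Expected spend to get one good result when failed calls are retried."""
    if not 0 < m.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Expected calls = 1 / success_rate, each costing usd_per_call.
    return m.usd_per_call / m.success_rate


# Made-up numbers: the cheap model needs about five calls per success.
frontier = ModelProfile(usd_per_call=0.05, success_rate=0.95)
cheap = ModelProfile(usd_per_call=0.012, success_rate=0.20)
```

With these numbers the cheap model costs about $0.060 per solved task and the frontier model about $0.053. The sketch ignores retry latency and the extra context retries often carry, both of which would widen the gap further.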
"Cost constraints must be engineered into architecture from day one."

Suresh Mathew
CEO
But choosing the right model is only part of it. The real issue is that cost governance is still treated as a billing problem rather than an engineering problem. What’s the use of getting cost alerts once you’ve already hit your budget? You just end up reacting to a bigger problem.
Engineers should instead bake cost constraints into their architecture from day one. I like to think about it in the same way we design services: you set latency budgets and build in reliability before you ship. Managing LLM costs should work the same way, by defining your limits upfront and enforcing them automatically.
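Treating spend like a latency budget means declaring the limit alongside the feature and checking it before each call, not after the invoice arrives. This is a hypothetical in-process guard: `SpendBudget`, `BudgetExceeded`, and `estimate_cost_usd` are illustrative names, the prices are placeholders, and the pre-call estimate assumes the worst case where the model uses its full output allowance. A shared deployment would need the same check backed by a store that every caller sees.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised before a call whose estimated cost would push spend past its limit."""


@dataclass
class SpendBudget:
    limit_usd: float        # declared upfront, like a latency budget
    spent_usd: float = 0.0

    def check(self, estimated_usd: float) -> None:
        """Reject a call before it is made if it would exceed the declared limit."""
        if self.spent_usd + estimated_usd > self.limit_usd:
            raise BudgetExceeded(
                f"estimated ${estimated_usd:.4f} exceeds remaining "
                f"${self.limit_usd - self.spent_usd:.4f}"
            )

    def record(self, actual_usd: float) -> None:
        """Add what a completed call actually cost."""
        self.spent_usd += actual_usd


def estimate_cost_usd(input_tokens: int, max_output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    # Worst-case estimate: assumes every allowed output token is generated.
    return (input_tokens * usd_per_m_input
            + max_output_tokens * usd_per_m_output) / 1_000_000


# Hypothetical usage: a $1.00 limit for one feature, placeholder prices.
budget = SpendBudget(limit_usd=1.00)
estimate = estimate_cost_usd(20_000, 2_000, usd_per_m_input=3.0, usd_per_m_output=15.0)
budget.check(estimate)   # raises BudgetExceeded if this call would overrun the limit
# ... the API call would go here; record its real cost from the usage data.
budget.record(estimate)  # the estimate stands in for the actual cost in this sketch
```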
Sedai is helping our customers lower their AI costs. Meet with us to learn more!
