Is The Era of Cheap AI Over?

Why are AI and LLM costs increasing, and what does this mean for engineering teams?

AI and LLM (Large Language Model) costs are rising as AI labs wind down heavy subsidies, as highlighted by Sedai's engineering leadership. For example, Anthropic's $45B compute deal and a report of one AWS user receiving a surprise $30,000 bill after using Claude illustrate how costs are shifting to end users. Engineering teams now face the challenge of managing these costs, because token-based billing can bypass traditional cost controls and scale non-linearly, especially with agentic workflows and large context windows. This means teams must proactively govern LLM usage and costs at the orchestration layer rather than relying on after-the-fact billing alerts. Note: Cost trends may continue to evolve as the market matures; always review the latest provider pricing and governance options. (Source: Sedai Blog, May 28, 2026)

What are the risks of relying on default or frontier LLM models for all AI tasks?

Relying on default or frontier LLM models (such as GPT, Claude, or Gemini) for all tasks can lead to significant cost inefficiencies and vendor lock-in. Sedai's CTO, Benji Thomas, notes that teams often default to the latest models for speed, but this can result in invoice spikes and unexpected bills, especially when cost monitoring tools are bypassed by token-based billing. Additionally, treating model selection as a one-time decision is risky in a fast-evolving space; dynamic model selection and routing are essential for long-term cost efficiency. Note: Teams that require strict cost predictability or have limited engineering resources may need to consider simpler or more static solutions. (Source: Sedai Blog, May 28, 2026)

How do token-based LLM billing models impact cloud cost management?

Token-based LLM billing models can lead to unpredictable and compounding costs, especially when moving from simple prompts to agentic, multi-step workflows. As Sedai's SVP of Engineering, Hari Chandrasekhar, explains, token costs do not scale linearly—context windows can explode, and agents looping or pulling external context can multiply costs behind the scenes. Standard cloud cost tools often miss these dynamics, making it essential to implement programmatic governance and spend controls before API calls are made. Note: Organizations without advanced cost governance may face budget overruns. (Source: Sedai Blog, May 28, 2026)

What engineering practices help control LLM and AI costs?

Sedai's leadership recommends that engineering teams bake cost constraints into their architecture from day one, similar to how latency budgets and reliability are designed. This includes defining spend limits upfront, enforcing them automatically, and using dynamic model selection and routing to optimize for both cost and performance. Programmatic governance at the orchestration layer is critical to prevent runaway costs and ensure the right model is used for each task. Note: Teams lacking engineering resources for automation may need to rely on simpler cost controls. (Source: Sedai Blog, May 28, 2026)

What is Sedai and how does it help with cloud and AI cost optimization?

Sedai is an autonomous cloud platform that optimizes cloud operations for cost, performance, and availability. It uses machine learning to manage production environments without manual thresholds or intervention. Sedai can reduce cloud costs by up to 53%, decrease SRE workload by 33%, and improve application performance by reducing latency by 30%. For AI workloads, Sedai helps teams proactively govern costs, dynamically optimize resources, and enforce spend controls at the orchestration layer. Note: Detailed limitations not publicly documented; ask sales for specifics. (Source: Sedai.io, Sedai Security Architecture v4.pdf)

What features does Sedai offer for cloud and AI operations?

Sedai offers autonomous optimization, application-aware intelligence, proactive issue resolution, full-stack cloud coverage (across AWS, Azure, GCP, Kubernetes), safety-by-design (continuous health verification, automatic rollbacks), release intelligence, and plug-and-play implementation. These features enable cost savings (up to 50%), latency reduction (up to 75%), and productivity gains (up to 6X). Note: Sedai may not be suitable for organizations requiring only manual or static optimization; autonomous operation is core to its value. (Source: Sedai Solution Briefs)

How does Sedai ensure safe, autonomous optimization in production environments?

Sedai is designed to make safe, autonomous optimizations in production without causing incidents or breaching SLOs. Unlike optimizers that make all-at-once changes, Sedai applies gradual, incremental optimizations with continuous validation checks, automatic rollbacks, and health verification. This safety-by-design approach addresses common fears of automation-related outages. Note: Teams requiring only manual approval for every change may find Sedai's autonomous model less aligned with their processes. (Source: Sedai Solution Briefs, Sedai Security Architecture v4.pdf)

What integrations does Sedai support?

Sedai integrates with monitoring and APM tools (Prometheus, Datadog, CloudWatch, Azure Monitor), Kubernetes autoscalers (HPA/VPA, Karpenter), IaC and CI/CD tools (GitHub, GitLab, Bitbucket, Terraform), ITSM platforms (ServiceNow, PagerDuty, Jira), notification systems, runbook automation platforms, and serverless environments (AWS Lambda, AWS Fargate). Note: Integration with unsupported or proprietary tools may require custom development. (Source: Sedai Technology Overview-Digital (2).pdf)

What security and compliance certifications does Sedai have?

Sedai is SOC 2 certified, demonstrating adherence to stringent security requirements for data protection and compliance. This certification ensures Sedai meets industry standards for handling sensitive information. For more details, visit the Sedai Security page. Note: Additional certifications may be required for highly regulated industries; contact Sedai for specifics. (Source: Sedai Security page)

How long does it take to implement Sedai, and what is the onboarding process?

Initial onboarding for Sedai takes approximately 15 minutes for agentless or agent-based deployment to begin reading metrics from your environment. Additional setup for integrations (CI/CD, ITSM, etc.) may require more time depending on environment complexity. Sedai offers a plug-and-play process and operates autonomously, minimizing manual oversight. Note: Highly customized environments may require additional setup time. (Source: Sedai Platform Overview (1).pdf)

What technical documentation is available for Sedai users?

Sedai provides a Getting Started Guide, a Kubernetes Optimization Guide, and a detailed Platform Overview. These resources are available at docs.sedai.io/get-started and sedai.io/resources. Note: Some advanced topics may require direct support from Sedai's technical team. (Source: Sedai Technical Documentation)

How is Sedai priced, and are there free trial options?

Sedai uses a volume-based pricing model, charging based on the specific resources optimized (e.g., Kubernetes pods, ECS tasks, VMs). All costs are transparently outlined on the Sedai pricing page, with no hidden fees. Sedai offers a free tier and a 30-day free trial, allowing users to evaluate the platform without financial commitment. For Kubernetes environments, a demo is recommended to determine the best pricing structure. Note: Pricing may vary for highly customized or large-scale deployments. (Source: Sedai Pricing Page)

What types of organizations and roles benefit most from Sedai?

Sedai is designed for IT/Cloud Operations, FinOps, Technology Leadership (CTO, CIO, VP Engineering), Platform Engineering, and Site Reliability Engineering (SRE) teams. It is used by organizations focused on infrastructure availability, cost optimization, compliance, and operational efficiency. Industries represented in Sedai's case studies include cybersecurity (Palo Alto Networks, KnowBe4), financial services (Experian), healthcare, e-commerce (Wayfair, Campspot), IT/technology (HP, Freshworks), consumer goods (Belcorp), and digital commerce (Informed). Note: Organizations with only on-premises or non-cloud workloads may not realize full value from Sedai. (Source: Sedai Buyer Personas.pptx, Sedai Proof of Value-Digital.pdf)

What business impact and results have customers achieved with Sedai?

Customers using Sedai have achieved up to 50% cloud cost reduction, 75% latency reduction, 6X productivity gains, and a typical ROI greater than 400% with payback in under six months. For example, KnowBe4 saved $1.2 million on AWS costs, and Palo Alto Networks saved $3.5 million. Belcorp reduced AWS Lambda latency by 77%, and Campspot achieved a 34% reduction. Note: Results may vary based on environment and implementation; detailed limitations not publicly documented. (Source: Sedai Platform Overview (1).pdf, KnowBe4 Case Study, Palo Alto Networks Case Study)

What are the main pain points Sedai addresses for engineering and operations teams?

Sedai addresses pain points such as cloud spend pressure, risk and compliance concerns, tool sprawl, engineering toil, release risk, noisy alerts, brittle automation, ticket volume, configuration drift, hybrid complexity, and the gap between visibility and action. For FinOps, Sedai helps convert visibility into actionable savings and aligns engineering with cost efficiency. Note: Teams with highly static or non-cloud environments may not experience all these benefits. (Source: Sedai Buyer Personas.pptx)

This week, the trend is impossible to ignore: LLM costs are out of control.

AI companies have been essentially subsidizing the cost of using LLMs, focusing on long-term growth over short-term profits. But that era seems to be nearing its end, and the signs are piling up, from Anthropic’s $45B compute deal to one AWS user’s surprise $30k bill after using Claude.

I asked our engineering leadership team at Sedai for their take. Here's what they said.

AI Labs Are Subsidizing Your Usage… For Now

Nikhil Gopinath Kurup, SVP of Engineering, ML

There seems to be a genuine disconnect between the actual cost of inference and what these packaged products or APIs charge. It reminds me of the early days of $5 Uber rides, when riders got subsidized fares to build ridership habits and capture the market.

Currently, subscription-based pricing models offer significantly better value than API pricing: a Claude subscription at around $100 provides token access that would easily cost ~$2,000 via the API.

By contrast, the pricing we are seeing from Chinese models on platforms like OpenRouter feels much more grounded in reality, likely because they are aimed at cost recovery. It seems like a more accurate reflection of what inference should cost, even if those models do not yet fully match the capabilities of the top US offerings.

"The pricing US labs charge has almost nothing to do with what inference actually costs."

Nikhil Gopinath Kurup

SVP of Engineering, ML

I would expect inference costs to decrease over time, but the major US labs do not seem to be following that trajectory just yet. Providers like OpenAI and Anthropic have massive capital investments to recoup, which means they will eventually face intense pressure to demonstrate substantial revenue and growth. Google seems to be much more vertically integrated than its direct competitors, but time will tell how that advantage plays out.

Additionally, recent discussions around API throttling, such as developers running into Claude Code limits, underscore just how vital reliable API access and LLM inference have become for developer productivity and operational workflows.

It is a fascinating dynamic to watch unfold. Will costs continue to climb as frontier models become more complex, or will they eventually drop as efficiency improves? While the ultimate direction remains uncertain, the outcome is bound to create both significant opportunities and equally notable challenges.

You’re Using The Wrong Models, And It’s Costing You

Benji Thomas, Co-Founder & CTO

The industry is repeating the early cloud mistake: everyone is defaulting to the latest GPT, Claude, or Gemini models just to move fast, without thinking enough about long-term cost efficiency.

Right now, pricing still feels heavily subsidized. Once teams become dependent on a model/provider, they’re effectively locked in, and invoice spikes will start becoming normal. Even when users have cost monitoring set up, the way these models bill can bypass existing controls entirely, leaving you with an unexpected $30k bill.

"The teams investing in dynamic model selection now will have a huge advantage."

Benji Thomas

Co-Founder & CTO

The bigger mistake is treating model selection like a one-time decision. It’s not, and this space is moving too fast for that.

Over the next few years, we’ll see an explosion of frontier models, open-source models, and highly specialized models. The teams investing in dynamic model selection and routing now will have a huge advantage later.
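To make "dynamic model selection and routing" concrete, here is a minimal sketch in Python. It assumes each task can be tagged with a required capability tier and that the router simply picks the cheapest model that meets it. The model names, tiers, and prices are hypothetical placeholders, not real offerings or quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                      # hypothetical model identifier
    capability: int                # made-up tier: 1 = basic, 3 = frontier
    usd_per_million_tokens: float  # illustrative price, not a real quote


# Hypothetical catalog. The point of routing dynamically is that this list
# can change (new models, new prices) without touching the calling code.
CATALOG = [
    ModelOption("small-open-model", 1, 0.20),
    ModelOption("mid-tier-model", 2, 2.00),
    ModelOption("frontier-model", 3, 15.00),
]


def route(required_capability: int, catalog=CATALOG) -> ModelOption:
    """Return the cheapest model whose capability tier meets the requirement."""
    eligible = [m for m in catalog if m.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no model satisfies capability tier {required_capability}")
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


# A basic task goes to the cheapest model; a hard one goes to the frontier tier.
print(route(1).name)  # small-open-model
print(route(3).name)  # frontier-model
```

Because the decision lives in one function over a catalog, adding a cheaper model at the same tier changes the routing result without any change to callers, which is the opposite of a one-time model choice.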

Everyone else will be trying to optimize after they’re already 10 steps behind.

LLM Costs Are Creating An Engineering Crisis

Hari Chandrasekhar, SVP of Engineering, Core

These stories about skyrocketing LLM costs are the logical conclusion of how we're building right now.

We all agree that leveraging AI for productivity is non-negotiable. But in the rush to build, the industry has bypassed basic engineering discipline. Right now, engineers are defaulting to the most expensive models for basic tasks, creating a massive mismatch between capability and cost.

"Token costs don't scale linearly. The moment you move to agentic workflows, your context window explodes, and costs compound."

Hari Chandrasekhar

SVP of Engineering, Core

What’s worse, token-based consumption is completely misunderstood because it doesn't scale linearly. The moment you move from a simple prompt to agentic, multi-step workflows, your context window explodes. Agents looping, pulling RAG context, and calling other systems means token costs compound geometrically behind the scenes.
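One way to see the non-linearity is a toy model in Python. It assumes an agent resends its entire accumulated context on every step and that each step appends a fixed number of tokens (tool output, RAG chunks). Both assumptions and all numbers are illustrative; real agents vary. Under this simple model, billed input tokens grow roughly quadratically with the number of steps.

```python
def cumulative_input_tokens(steps: int, base_prompt: int, added_per_step: int) -> int:
    """Total input tokens billed across an agent run that resends its full history each step."""
    total = 0
    context = base_prompt
    for _ in range(steps):
        total += context           # the whole context is sent (and billed) again
        context += added_per_step  # new tool output / retrieved context for the next step
    return total


# Illustrative numbers: a 1,000-token prompt that grows by 2,000 tokens per step.
print(cumulative_input_tokens(1, 1_000, 2_000))   # 1000
print(cumulative_input_tokens(10, 1_000, 2_000))  # 100000
print(cumulative_input_tokens(20, 1_000, 2_000))  # 400000
```

In this toy run, doubling the number of steps from 10 to 20 quadruples the billed input tokens, which is why a per-call cost estimate badly understates the cost of a long agent loop.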

Standard cloud cost tools are blind to this. We can't just look at the bill at the end of the month and panic. If we want to actually use the best, most dynamic models for the job, we need programmatic governance at the orchestration layer, dynamically routing to the right model, pruning context aggressively, and enforcing spend controls before the API call is even made.
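As a rough illustration of enforcing spend controls before the API call, the Python sketch below checks an estimated cost against a budget and refuses the call if it would overshoot. It also prunes context to the most recent messages that fit a token cap. The class and function names are hypothetical, the whitespace token counter is a crude stand-in for a real tokenizer, and prices are placeholders.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push spend past the configured limit."""


class SpendGuard:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def authorize(self, est_tokens: int, usd_per_million_tokens: float) -> float:
        """Reserve budget for a call, or raise before any request is sent."""
        cost = est_tokens / 1_000_000 * usd_per_million_tokens
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(
                f"call (~${cost:.2f}) would exceed the ${self.limit_usd:.2f} limit"
            )
        self.spent_usd += cost
        return cost


def prune_context(messages, max_tokens, count_tokens=lambda s: len(s.split())):
    """Keep the most recent messages whose combined token count fits max_tokens."""
    kept, used = [], 0
    for msg in reversed(messages):
        n = count_tokens(msg)
        if used + n > max_tokens:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))


guard = SpendGuard(limit_usd=1.00)
guard.authorize(100_000, 5.0)      # fits within the budget
try:
    guard.authorize(200_000, 5.0)  # would overshoot, so it is rejected up front
except BudgetExceeded as err:
    print("blocked:", err)
```

The orchestration layer would call `authorize` (after `prune_context`) ahead of every model request, so an over-budget call fails fast instead of surfacing on the invoice.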

LLM Costs Are an Engineering Problem, Actually

Suresh Mathew, CEO & Founder

It’s obvious that when it comes to managing these out-of-control LLM costs, most teams are flying blind. As HackerNoon noted this week, engineers pick Claude or GPT-4 because they know it works, but they don’t necessarily know if it’s the right LLM for the tasks they have.

And the model math is getting increasingly complex, too: just because you’re using the cheapest model doesn’t mean it’s actually cheaper. One call to a frontier model often beats five calls to something cheaper.
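The arithmetic behind that claim can be sketched in a few lines of Python. The prices, token counts, and call counts below are made up for illustration. The point is only that cost per completed task, not price per token, is the number to compare.

```python
def task_cost_usd(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Cost of finishing one task that needs `calls` requests of `tokens_per_call` tokens each."""
    return calls * tokens_per_call / 1_000_000 * usd_per_million_tokens


# Hypothetical numbers: the frontier model finishes in one call,
# the cheaper model needs five attempts of the same size.
frontier = task_cost_usd(calls=1, tokens_per_call=4_000, usd_per_million_tokens=10.0)
cheaper = task_cost_usd(calls=5, tokens_per_call=4_000, usd_per_million_tokens=3.0)

print(f"frontier: ${frontier:.2f} per task")  # frontier: $0.04 per task
print(f"cheaper:  ${cheaper:.2f} per task")   # cheaper:  $0.06 per task
```

With equal tokens per call, the cheaper model only wins when its price advantage is larger than the number of extra calls it needs. Here it is about 3.3x cheaper per token but needs 5x the calls, so it loses.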

"Cost constraints must be engineered into architecture from day one."

Suresh Mathew

CEO & Founder

But the core issue isn’t just choosing the right model. The real issue is that cost governance is still treated as a billing problem rather than an engineering problem. What’s the use of getting cost alerts once you’ve hit your budget? You just end up reacting to a bigger problem.

Engineers should instead bake cost constraints into their architecture from day one. I like to think about it in the same way we design services: you set latency budgets and build in reliability before you ship. Managing LLM costs should work the same way, by defining your limits upfront and enforcing them automatically.

Sedai is helping our customers lower their AI costs. Meet with us to learn more!
