Most teams have dashboards. They can see total cluster spend, break it down by namespace, & generate monthly reports. What they can't see is whether it's safe to act on any of it.
You know a service is over-provisioned by 40%. But rightsizing it means answering questions your cost tooling doesn't touch: what happens to p99 latency during the next traffic spike, whether the downstream service can absorb the reduced headroom, & whether the workload's behavior has shifted since the last time anyone load-tested it. That's the work that keeps optimization recommendations sitting in a backlog for months.
This guide covers what the standard Kubernetes cost visibility stack actually gives you, where it stops short, & what's required to turn visibility into safe action, including the GPU visibility challenge that most cost tooling still doesn't address.
In this article:
- The Standard Kubernetes Cost Visibility Stack
- What Actionable Visibility Actually Requires
- GPU Cost Visibility
- How Can Sedai Help You With Kubernetes Cost Visibility?
The Standard Kubernetes Cost Visibility Stack
The instrumentation layer most teams work with starts with Prometheus & kube-state-metrics, often extended with Kubecost or OpenCost for cost allocation. Each layer adds value, but each also carries assumptions that break under real-world conditions.
Prometheus + kube-state-metrics
Prometheus scrapes resource metrics from your cluster. kube-state-metrics exposes the state of Kubernetes objects, including resource requests, limits, & pod status. Together, they give you the raw data to answer questions like "how much CPU did this namespace consume last week?"
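As a concrete illustration, a query like the one below answers the consumption side of that question. It's a minimal sketch: it assumes cAdvisor container metrics are being scraped under the standard name, & the 7-day window is illustrative.
# Average CPU cores consumed per namespace over the past week
sum by (namespace) (
  rate(container_cpu_usage_seconds_total{container!=""}[7d])
)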
Where this stops: Prometheus has no concept of price. It doesn't know that an m5.large in us-east-1 costs $0.096/hour or that your Savings Plan gives you a 40% discount. It's excellent for performance monitoring but doesn't translate resource consumption into dollars without additional tooling.
Kubecost & OpenCost
Kubecost (and its open-source foundation, OpenCost) bridges this gap by ingesting cloud billing data & mapping it to Kubernetes-native objects: namespaces, deployments, labels, & pods. This gives you cost-per-namespace, cost-per-deployment, & cost-per-label breakdowns that finance teams can work with. For a comparison of how different tools handle this, see our guide on Kubernetes cost optimization tools.
Where this breaks: Kubecost splits node cost across pods using an allocation model based on resource requests. When pods don't have requests set, which is common in dev & staging namespaces, the allocation becomes an estimate, not a measurement. Under bin-packing or variable workload patterns, these estimates can diverge significantly from actual consumption.
The Node-Sharing Problem
When multiple workloads share a node, cost attribution is inherently an allocation model. Two pods on the same node, one CPU-bound & one memory-bound, share the same underlying hardware cost. How you split that cost depends on the allocation logic (by requests, by limits, or by actual usage), & each method produces a different number. None of them is objectively "correct."
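A hypothetical example with illustrative numbers: an m5.large node (2 vCPU, $0.096/hour) runs pod A, which requests 1.5 vCPU but averages 0.4 vCPU of actual usage, & pod B, which requests 0.5 vCPU but averages 1.2 vCPU.
# Request-based split: A = 1.5/2.0 x $0.096 = $0.072/hr    B = 0.5/2.0 x $0.096 = $0.024/hr
# Usage-based split:   A = 0.4/1.6 x $0.096 = $0.024/hr    B = 1.2/1.6 x $0.096 = $0.072/hr
Same node, same pods, & the two models reverse which workload looks expensive.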
This matters because cost decisions downstream depend on these numbers being reliable. If your cost-per-namespace data is built on allocation estimates rather than measurements, every optimization decision based on that data carries inherited uncertainty.
Diagnosing Broken Attribution
You can check whether your namespace attribution is reliable by surfacing pods without resource requests set, a leading indicator that Kubecost's cost allocation is estimating rather than measuring.
Run this to find pods in a namespace with no CPU requests:
kubectl get pods -n <namespace> -o json | \
  jq -r '.items[] | select(any(.spec.containers[]; .resources.requests.cpu == null)) | .metadata.name'
Then run this PromQL query to check the ratio of requested CPU to allocatable CPU per namespace. Namespaces where this ratio is near zero or missing entirely are the ones where cost attribution is unreliable:
sum by (namespace) (kube_pod_container_resource_requests{resource="cpu", container!=""})
  / scalar(sum(kube_node_status_allocatable{resource="cpu"}))
# Surfaces namespaces where resource requests are largely unset
Making Cost Allocation an Estimate Rather Than a Measurement
Missing or inaccurate resource requests don't just affect scheduling — they corrupt the cost attribution data that every downstream optimization decision depends on.
You can verify the scale of the problem in your own cluster with the two queries above: the kubectl check for pods missing CPU requests, & the PromQL ratio of requested to allocatable CPU per namespace.
If either check flags your production namespaces (pods with no CPU requests, or a requested-to-allocatable ratio near zero), fix resource requests before trusting your cost data. Attribution built on incomplete requests produces misleading conclusions, & optimizing based on misleading data is worse than not optimizing at all.
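A minimal starting point, with a hypothetical Deployment name & placeholder values you'd replace with numbers taken from your own utilization data:
# Set explicit CPU & memory requests on a workload that has none (values are placeholders)
kubectl -n <namespace> set resources deployment/checkout-api \
  --requests=cpu=250m,memory=256Mi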
What Actionable Visibility Actually Requires
Knowing what a workload is consuming and what it costs is the starting point, not the destination. Safe action requires three things that standard observability tooling doesn't provide:
Understanding workload behavior under load, not just at rest. A service running at 30% average CPU utilization looks like a rightsizing candidate. But if that same service spikes to 85% during batch processing windows or during the first 10 minutes after a deployment, rightsizing to the average will cause throttling. You need utilization curves & p95/p99 behavior over time, not averages (see the query sketch below).
Knowing the SLO boundaries for each workload. A cost tool can tell you that a service is over-provisioned. It can't tell you whether reducing its allocation will push latency past your p99 SLO. Actionable visibility requires knowing what "safe" means for each specific workload, & that definition varies by service, by traffic pattern, & by time of day.
Understanding what a resource change will do to downstream services. Rightsizing a frontend service might look safe in isolation. But if that service calls a backend that's already running at capacity, reducing the frontend's headroom can cascade into backend latency increases that the cost tool never sees.
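One way to quantify the gap between average & tail behavior is the sketch below. It assumes cAdvisor container metrics & a 7-day lookback; as a subquery it's expensive to evaluate ad hoc, so a recording rule is a better home for it in practice.
# Weekly p99 of per-pod CPU usage; compare it against the plain weekly average
# (sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[7d])))
# A large gap between the two is the spike pattern that average-based rightsizing misses.
quantile_over_time(0.99,
  sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))[7d:5m]
)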
This is where cost visibility tooling & observability tooling diverge. Cost tools answer "What does this cost?" Observability tools answer "How does this behave?" Neither answers the question that actually matters for optimization: "What happens if I change this?"
That question requires both datasets combined with an understanding of workload relationships, something that manual analysis can approximate for a handful of services but can't sustain across a production environment with dozens or hundreds of workloads.
GPU Cost Visibility
GPU visibility is a different challenge entirely. The metrics exist, but interpreting them correctly requires understanding what each one actually measures, & why no single metric tells you whether a GPU workload is right-sized.
The DCGM Metrics That Matter
NVIDIA's DCGM exporter surfaces GPU telemetry to Prometheus. The three metrics teams look at first:
- DCGM_FI_DEV_GPU_UTIL: Percentage of time the GPU was active. This is the metric most teams treat as "GPU utilization," but it's a coarse signal. A GPU can show 90% utilization while only using a fraction of its compute capacity if the workload is memory-bound.
- DCGM_FI_DEV_MEM_COPY_UTIL: Memory bandwidth utilization. High values indicate the workload is saturating memory transfers between GPU & host.
- DCGM_FI_DEV_FB_USED: Framebuffer memory consumed. This tells you how much GPU memory the workload has allocated, but not whether that allocation is efficient.
Why No Single Metric Tells the Full Story
A GPU cluster can show high memory usage while compute cores sit nearly idle, meaning you're paying for compute capacity the workload isn't using. Conversely, high compute utilization with low memory bandwidth suggests a compute-bound workload that might benefit from a different GPU type rather than more memory.
The right-sizing question for GPU workloads requires combining these signals:
# Memory pressure vs. compute utilization per workload
# High mem_copy_util + low gpu_util = memory-bound (candidate for MIG or smaller GPU)
# High gpu_activity + low mem_copy_util = compute-bound (different optimization path)
label_replace(
  avg by (pod, namespace) (DCGM_FI_DEV_MEM_COPY_UTIL),
  "signal", "memory_bandwidth", "", ""
)
or
label_replace(
  avg by (pod, namespace) (DCGM_FI_DEV_GPU_UTIL),
  "signal", "gpu_activity", "", ""
)
This query surfaces the relationship between memory pressure & compute utilization per workload. Memory-bound workloads (high mem_copy_util, low gpu_util) are candidates for MIG partitioning or a smaller GPU type. Compute-bound workloads need a different optimization path entirely.
The MIG Visibility Problem
Multi-Instance GPU (MIG) lets you partition a single GPU into isolated slices, each with dedicated compute & memory. This is powerful for right-sizing: instead of paying for a full A100 for a workload that only needs a fraction of its capacity, you partition the GPU & run multiple workloads on it.
The problem: standard DCGM exporters don't provide MIG-aware visibility out of the box. DCGM_FI_DEV_GPU_UTIL is incompatible with MIG entirely. For MIG-partitioned GPUs, you need DCGM_FI_PROF_GR_ENGINE_ACTIVE instead, which reports per-partition utilization but requires explicit configuration in your DCGM exporter's metrics CSV. Without this, your monitoring shows whole-GPU aggregates that mask per-slice inefficiency.
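As a rough sketch of what that configuration can look like, assuming the standard dcgm-exporter counters CSV format (field ID, Prometheus metric type, help text); the file name, how it's mounted, & the exact set of profiling fields available depend on your exporter version & install method:
# Illustrative additions to the dcgm-exporter metrics CSV for MIG-aware utilization
DCGM_FI_PROF_GR_ENGINE_ACTIVE, gauge, Ratio of time the graphics engine is active.
DCGM_FI_PROF_DRAM_ACTIVE,      gauge, Ratio of cycles the device memory interface is active.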
Getting slice-level cost attribution requires mapping each MIG instance to the pod that's using it, then associating that pod's resource consumption with its namespace & cost center. Most teams don't have this pipeline built, which means GPU cost attribution in MIG environments is a manual exercise, if it happens at all.
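A hedged starting point for that mapping, assuming your dcgm-exporter version attaches pod, namespace, & a MIG instance identifier (commonly exposed as GPU_I_ID, but check the labels on your own series) to each metric:
# Average graphics-engine activity per MIG slice, attributed to the pod & namespace using it
avg by (namespace, pod, GPU_I_ID) (DCGM_FI_PROF_GR_ENGINE_ACTIVE)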
How Can Sedai Help You With Kubernetes Cost Visibility?
The pattern across both standard K8s & GPU workloads is the same: teams can see the waste, but acting on it safely requires understanding workload behavior in ways that visibility tooling alone doesn't provide.
Sedai closes that loop. For standard Kubernetes workloads, Sedai learns each service's performance profile, including p95/p99 behavior, SLO boundaries, & downstream dependencies, & autonomously optimizes resource allocation based on that understanding.
It doesn't just flag over-provisioned pods; it right-sizes them continuously while respecting the application's actual performance constraints. Palo Alto Networks runs over 89,000 production changes through Sedai with zero incidents, because the platform understands what's safe to change before changing it.
For GPU workloads, Sedai's proprietary GPU utilization model synthesizes the fragmented DCGM signals into a single true utilization score, solving the measurement problem that keeps most teams over-provisioning GPU capacity.
See how Sedai turns visibility into safe, continuous action.
FAQs
What tools provide Kubernetes cost visibility?
The standard stack is Prometheus with kube-state-metrics for resource data, & Kubecost or OpenCost for cost allocation. Kubecost maps cloud billing data to Kubernetes objects like namespaces, deployments, & pods. For GPU workloads, NVIDIA's DCGM exporter surfaces utilization metrics to Prometheus, though interpreting those metrics correctly requires combining multiple signals.
Why is Kubernetes cost attribution unreliable?
Cost attribution breaks down when pods don't have resource requests set: Kubecost has to estimate allocation instead of measuring it. The node-sharing problem compounds this: when multiple workloads share a node, splitting cost depends on the allocation model used (by requests, limits, or usage), & each produces different numbers. Fixing resource requests across all namespaces is the prerequisite for reliable attribution.
How do you monitor GPU costs in Kubernetes?
Deploy the NVIDIA DCGM exporter to surface GPU metrics to Prometheus. The key metrics are DCGM_FI_DEV_GPU_UTIL (compute activity), DCGM_FI_DEV_MEM_COPY_UTIL (memory bandwidth), & DCGM_FI_DEV_FB_USED (framebuffer memory). No single metric answers whether a GPU workload is right-sized; you need to combine compute & memory signals to determine whether the workload is memory-bound, compute-bound, or genuinely idle.
What is the difference between cost visibility & cost optimization in Kubernetes?
Cost visibility tells you what you're spending & where. Cost optimization acts on that information: rightsizing pods, adjusting autoscaling, reclaiming idle resources. The challenge is that acting safely requires understanding workload behavior under load, SLO boundaries, & downstream dependencies, which visibility tooling doesn't provide. Bridging that gap requires either significant manual analysis or autonomous systems that learn workload behavior before making changes.
