Kubernetes Tip: Requests & Limits (Without Surprises)
How CPU/memory requests and limits actually affect scheduling, throttling, OOMKills, and autoscaling.
Resource settings are one of the fastest ways to make a cluster feel “stable” (or mysteriously broken).
TL;DR
- Set requests for every container (CPU + memory). This is what the scheduler uses.
- Be careful with CPU limits: CPU limits can cause throttling under load.
- Memory limits are important: without a memory limit, a container can consume node memory and destabilize neighbors.
- HPA looks at requests (by default). Wrong requests => wrong autoscaling decisions.
What the scheduler really uses
The scheduler places Pods based on requests, not limits.
If you omit requests:
- CPU request is effectively 0 (the container runs "best effort" for CPU); if you set only a limit, the request defaults to the limit
- Memory request is effectively 0, with the same caveat about limits
That makes the Pod easy to schedule, but it can also lead to noisy-neighbor issues.
A sane baseline YAML
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    memory: "512Mi"
Why this baseline:
- You get predictable scheduling (requests set).
- You avoid node-wide memory pressure (memory limit set).
- You avoid CPU throttling surprises by not setting a CPU limit initially.
CPU limits and throttling
CPU limits are enforced by CFS quota. If you set a CPU limit too low:
- Requests can be met, but bursty workloads will get throttled.
- Latency spikes show up even though “CPU usage looks fine”.
When you do set CPU limits, try to keep:
- limits.cpu >= requests.cpu (always)
- and preferably not too close to the request for bursty apps
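For example (the numbers are placeholders, not recommendations), a resources block that keeps real headroom between the CPU request and the CPU limit might look like:

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"           # ~4x the request, leaves room for bursts
    memory: "512Mi"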
Memory limits and OOMKills
Memory is different: if a container exceeds its memory limit, the container is typically OOMKilled.
If you see restarts with:
kubectl describe pod <pod> | rg -n "OOMKilled|Killed"
Consider:
- increasing limits.memory
- optimizing memory usage
- enabling a VPA recommendation loop (even if you don’t auto-apply)
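You can also read the termination reason straight from the container status instead of grepping describe output; a quick check (pod and namespace names are placeholders):

kubectl get pod <pod> -n <ns> -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.lastState.terminated.reason}{"\n"}{end}'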
Link to autoscaling behavior
HPA (CPU utilization) typically uses:
current CPU usage / CPU requests
So if CPU requests are too high:
- HPA thinks utilization is low → scales too late
If CPU requests are too low:
- HPA scales too aggressively
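For context, a minimal autoscaling/v2 HPA that scales on CPU utilization relative to requests (names and numbers are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the CPU request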
Quick audit commands
List containers missing requests/limits:
kubectl get deploy -A -o json | jq -r '
.items[]
| .metadata.namespace as $ns
| .metadata.name as $name
| .spec.template.spec.containers[]
| select(.resources.requests == null or .resources.requests.cpu == null or .resources.requests.memory == null)
| "\($ns)/\($name) container=\(.name) missing requests"
'
Checklist
- Every container has CPU+memory requests
- Every container has a memory limit
- CPU limits only where you truly need them
- Requests match reality (HPA/VPA alignment)
Requests, limits, and QoS classes (why it matters)
Kubernetes assigns each Pod a QoS class based on the resources you set. QoS affects eviction priority when a node is under pressure (especially memory pressure).
QoS classes in practice
- Guaranteed: Every container sets CPU+memory requests and limits, and requests == limits for both.
- Burstable: You set at least some requests/limits, but the Pod is not Guaranteed.
- BestEffort: No requests/limits set at all.
What this means operationally:
- When a node hits memory pressure, BestEffort Pods are typically evicted first, then Burstable, and Guaranteed last.
- If you want a component to be “reliable”, don’t let it accidentally fall into BestEffort.
Quick check:
kubectl get pod -n <ns> <pod> -o jsonpath='{.status.qosClass}{"\n"}'
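If you deliberately want Guaranteed, requests and limits must match exactly for every container in the Pod; a minimal sketch:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"        # equal to the request
    memory: "512Mi"    # equal to the request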
CPU: scheduling vs throttling (and why limits can hurt latency)
CPU is “compressible”: if there isn’t enough CPU, processes usually slow down rather than crash.
- Requests.cpu affects placement (bin packing): the scheduler uses requests.
- Limits.cpu affects runtime enforcement: CPU quota can throttle the container.
This is why a common production baseline is:
- set CPU requests for every container
- avoid CPU limits unless you have a clear reason (hard fairness, strict multi-tenancy policies, or capacity control)
A real-world symptom of too-low CPU limits
Your application is “healthy” but p95 latency spikes during traffic bursts. Metrics show CPU usage “below the limit”, yet response time is bad. Often the missing piece is CPU throttling (CFS quota), which may not be obvious unless you monitor throttling counters.
If you must set CPU limits, consider:
- limits.cpu >= requests.cpu (always)
- enough headroom for bursty code paths, GC, TLS handshakes, and cold caches
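To confirm throttling, you can read the CFS counters from inside the container. The path below assumes cgroup v2 (on cgroup v1 it is /sys/fs/cgroup/cpu/cpu.stat); a nonzero nr_throttled means the quota was actually hit:

kubectl exec -n <ns> <pod> -c <container> -- cat /sys/fs/cgroup/cpu.stat
# look at nr_throttled and throttled_usec (cgroup v1: nr_throttled and throttled_time)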
Memory: protect the node, but design for OOMKills
Memory is not compressible: going above a memory limit can kill the container.
Treat memory limits as:
- a safety boundary for the node and other workloads
- a failure scenario your app must tolerate (restarts, warmups, cache rebuilds)
Language runtime alignment
If your runtime can “think” it has more memory than the container allows, you’ll get surprises:
- JVM: align -Xmx with limits.memory (leave headroom for native memory)
- Node.js: align --max-old-space-size with limits.memory
- Go: memory is dynamic; still watch peak usage and tune caches/buffers
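As a sketch (the image name and percentage are assumptions, not recommendations), one way to keep a JVM heap inside the container limit is to size it as a fraction of the limit instead of hardcoding -Xmx:

containers:
- name: api
  image: example/api:1.2.3            # hypothetical image
  env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"  # heap capped at ~75% of the container memory limit
  resources:
    requests:
      memory: "512Mi"
    limits:
      memory: "1Gi"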
Node allocatable and “why did the node OOM?”
Nodes have:
- Capacity: physical resources
- Allocatable: what Kubernetes offers to Pods after reserving for OS/system
You can be “within requests” but still destabilize a node at runtime if:
- requests are far below real usage (aggressive overcommit)
- multiple workloads spike together
- the node is running heavy DaemonSets (logging, monitoring, service mesh)
Inspect allocatable and current allocations:
kubectl describe node <node> | rg -n "Allocatable|Allocated resources"
kubectl get events -A --sort-by=.lastTimestamp | rg -n "Evicted|MemoryPressure|OOM"
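If you prefer structured output over describe, capacity and allocatable can be compared side by side (node name is a placeholder):

kubectl get node <node> -o custom-columns='NAME:.metadata.name,CPU_CAP:.status.capacity.cpu,CPU_ALLOC:.status.allocatable.cpu,MEM_CAP:.status.capacity.memory,MEM_ALLOC:.status.allocatable.memory'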
Sidecars and initContainers: the hidden tax
Many production Pods include sidecars:
- service mesh proxies (Envoy)
- log forwarders
- security agents
These containers need their own requests/limits. If you size only the “main” container but ignore sidecars, the Pod’s total resource footprint can be significantly larger than expected.
Also remember:
- initContainers can be CPU/memory heavy (migrations, warmups, downloads)
- they affect startup time, scheduling, and can cause node pressure during large rollouts
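A hypothetical Pod template fragment where the sidecar gets its own explicit requests/limits (names, images, and numbers are illustrative):

containers:
- name: app
  image: example/app:1.0              # hypothetical image
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      memory: "512Mi"
- name: envoy-proxy                   # mesh sidecar, sized explicitly
  image: envoyproxy/envoy:v1.30       # illustrative tag
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      memory: "256Mi"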
Guardrails: LimitRange and ResourceQuota
To prevent “someone forgot requests” from entering production, many teams enforce guardrails:
LimitRange (defaults + min/max)
LimitRange can provide defaults and also enforce min/max per container. This is useful when teams forget to set requests (which leads to BestEffort/Burstable chaos).
apiVersion: v1
kind: LimitRange
metadata:
  name: defaults
  namespace: app
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      memory: 512Mi
    min:
      cpu: 25m
      memory: 64Mi
    max:
      cpu: "2"
      memory: 4Gi
ResourceQuota (namespace budget)
ResourceQuota prevents a namespace from consuming unlimited cluster capacity and can enforce request/limit usage.
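A sketch of a namespace budget (the numbers are placeholders); once a quota covers requests/limits, Pods in that namespace must declare them, or the API server rejects the create (unless a LimitRange fills in defaults):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-budget
  namespace: app
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"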
How to pick numbers (a practical sizing loop)
Think in iterations:
- Start simple: set CPU+memory requests; set memory limits; skip CPU limits initially.
- Measure in real traffic conditions:
  - peak memory (p95/p99)
  - CPU usage and throttling (if limits exist)
  - restart/OOMKilled rate
  - HPA behavior (scale timing and stability)
- Adjust:
  - requests too high => slow scaling + wasted bin packing
  - requests too low => noisy neighbors + unstable autoscaling
  - memory limit too low => OOMKills; too high => reduced density
- Automate recommendations:
  - VPA is excellent for recommendations even if you don’t auto-apply (see the sketch below)
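A recommendation-only VPA sketch (assumes the VPA components are installed in the cluster; updateMode "Off" means it only publishes recommendations and never evicts Pods):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
  namespace: app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # recommend only; recommendations appear in .status

kubectl describe verticalpodautoscaler web -n app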
A quick decision table
| Goal | CPU request | CPU limit | Memory request | Memory limit |
|---|---|---|---|---|
| General web service | required | optional | required | required |
| Strict multi-tenant | required | required | required | required |
| Batch job | required | optional | required | required |
Final rule of thumb
If you do only one thing:
Set requests for CPU+memory, set memory limits, and only set CPU limits when you can measure and accept the throttling trade-offs.
References
- Resource management for Pods and containers
- Configure Quality of Service for Pods
- Node-pressure eviction
FAQ
Q: Should requests equal limits? A: Only if you need strict guarantees. For most services, Burstable (requests < limits) is fine.
Q: Why do Pods get OOMKilled? A: The memory limit is too low for peak usage. Raise limits or reduce memory spikes.
Q: Do limits affect scheduling? A: Scheduling uses requests. Limits affect runtime enforcement and eviction pressure.