Kubernetes Tip: Debug Pods with Ephemeral Containers
Safely inspect a live Pod without baking debugging tools into production images.
Ephemeral containers let you attach a temporary debugging container to an existing Pod.
This is useful when:
- your app image is minimal (distroless/scratch)
- you need curl, dig, tcpdump, strace, etc.
- you want to keep production images clean
Requirements
- Kubernetes version must support ephemeral containers (GA since v1.25)
- You need RBAC permission to use the pods/ephemeralcontainers subresource
Basic workflow
Attach a debug container:
kubectl debug -n <ns> -it pod/<pod> --image=busybox:1.36 --target=<container-name>
If you don’t need to target a specific container’s process namespace, you can omit --target.
What you can do
DNS / connectivity checks
nslookup kubernetes.default.svc.cluster.local
wget -S -O- http://<service>.<ns>.svc.cluster.local:8080/readyz
Inspect env and files (read-only best practice)
- confirm env variables
- validate mounted ConfigMaps/Secrets
- check /etc/resolv.conf and /etc/hosts
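Keep in mind that the ephemeral container has its own env and mounts, so with --target you read the app container’s view through /proc instead. A sketch (replace <pid> with the app’s PID from ps; /etc/app-config is a made-up mount path):

cat /etc/resolv.conf /etc/hosts        # Pod-level DNS config and host entries
ps aux                                 # find the app's PID (requires --target)
xargs -0 -n1 < /proc/<pid>/environ     # the app container's env, one variable per line
ls /proc/<pid>/root/etc/app-config     # hypothetical ConfigMap mount, seen through the app's root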
Things to remember
- Ephemeral containers do not restart and are not part of the original Pod spec.
- They are meant for debugging, not as a permanent fix.
- In production, log every debug operation (who/when/why).
Suggested RBAC (least privilege idea)
- Ephemeral containers: require update on the pods/ephemeralcontainers subresource.
- Pods: read-only access is usually enough for inspection.
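A minimal Role along those lines (a sketch; the name and namespace are placeholders, and the interactive -it session also needs create on pods/attach):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger            # hypothetical name
  namespace: <ns>
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update", "patch"]    # kubectl debug updates this subresource
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create"]             # needed to attach interactively

Bind it per namespace with a RoleBinding to a small (or break-glass) group.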
Checklist
- Prefer ephemeral containers over “debug tools in prod image”
- Restrict RBAC to pods/ephemeralcontainers
- Audit debugging actions
Choosing the right debug image
Different debugging tasks need different tools. A few common patterns:
- BusyBox: tiny, good for basic networking (nslookup, wget, nc).
- Alpine: a bit more flexible, can add packages if needed (but beware of network egress restrictions).
- Netshoot-style images: loaded with curl, dig, tcpdump, mtr, etc. Great for networking, but heavier and potentially riskier.
In production, consider maintaining a blessed debug image:
- pinned by digest (immutable)
- regularly scanned
- minimal but sufficient tools
That gives you consistent, auditable behavior.
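For example, reference the blessed image by digest so every debug session runs exactly the same bits (registry path and digest below are placeholders):

kubectl debug -n <ns> -it pod/<pod> \
  --image=registry.example.com/platform/debug@sha256:<digest> \
  --target=<container-name>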
Understanding --target (process namespace sharing)
When you specify:
kubectl debug -n <ns> -it pod/<pod> --image=<img> --target=<container>
Kubernetes will try to attach the ephemeral container to the target container’s namespaces (especially the process namespace). This helps when you want to:
- inspect processes
- run tools like strace (where permitted)
- understand what the application is doing in real time
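With --target in place, the target’s processes are visible from the debug shell, which makes quick, read-only inspection possible (a sketch; PIDs will differ and some /proc entries require root):

ps aux                                  # the target container's processes appear here
ls -l /proc/<pid>/fd                    # open files and sockets of the app process
xargs -0 < /proc/<pid>/cmdline; echo    # exact command line the app was started with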
If you omit --target, you still share the Pod network namespace, which is usually enough for:
- DNS checks
- Service connectivity tests
- HTTP probing from “inside the Pod”
Common production use cases (practical examples)
1) “Service works outside, fails inside”
Check DNS and service resolution:
kubectl debug -n <ns> -it pod/<pod> --image=busybox:1.36
nslookup <service>.<ns>.svc.cluster.local
Then test the actual port:
nc -vz <service>.<ns>.svc.cluster.local 8080
2) “Readiness is failing, but I can’t curl”
If the app image is distroless, you can still do:
kubectl debug -n <ns> -it pod/<pod> --image=curlimages/curl:8.5.0
curl -sS -i http://127.0.0.1:8080/readyz
3) “NetworkPolicy might be blocking egress”
Ephemeral containers are still subject to the Pod’s network policies. That’s good: your debug actions reflect the same constraints your app has.
Validate egress to DNS and external endpoints (where allowed), and compare results across namespaces.
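A typical sequence from inside the Pod (a sketch; 10.96.0.10 is a common cluster-DNS ClusterIP, but check your own cluster):

nslookup <service>.<ns>.svc.cluster.local   # in-cluster name resolution
nc -vz -w 3 10.96.0.10 53                   # TCP reachability to cluster DNS
wget -O- -T 3 https://example.com           # external egress, where policy allows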
Limitations you should know
Ephemeral containers are intentionally limited:
- They don’t become part of your deployment spec, so you can’t “fix” a Pod by leaving one around.
- They are not meant to expose ports for traffic (treat them as internal tools).
- They are not restarted if they exit.
Also, some environments restrict ephemeral containers for security reasons (policy engines, admission control, managed platforms).
Make debugging safe: RBAC + audit + process
The biggest risk of ephemeral containers is not the feature itself—it’s ungoverned access.
Recommended practices:
- Least privilege RBAC
- allow only a small set of engineers (or a break-glass group)
- scope to specific namespaces if possible
- Audit trail
- log who ran kubectl debug, when, and why (see the audit-policy sketch after this list)
- keep a ticket/reference ID
- Standard operating procedure
- avoid copying secrets out of the cluster
- avoid modifying files in containers; treat debugging as read-only unless there is a clear incident procedure
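If you control the API server’s audit policy, a rule like the following records every ephemeral-container change (a sketch; managed platforms usually expose this through their own audit pipeline instead):

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse          # capture who added what, including the request body
  verbs: ["update", "patch"]
  resources:
  - group: ""
    resources: ["pods/ephemeralcontainers"]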
“I need to debug by cloning the Pod”
Sometimes you want an isolated copy (so you don’t touch prod Pods). kubectl debug can also create a copy (depending on your kubectl version and flags). This approach is useful when:
- you need to install packages
- you want to reproduce in a safe sandbox
- you need to attach tooling that would be too invasive for the live Pod
If your platform supports it, prefer “copy then debug” for high-risk investigations.
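With a recent kubectl, the copy-based flow looks roughly like this (flags vary by version; the copy is not managed by the original controller, so delete it when you are done):

kubectl debug -n <ns> pod/<pod> -it \
  --copy-to=<pod>-debug \
  --set-image='*=alpine:3.20' \
  --share-processes \
  -- sh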
Final checklist for production teams
- Have a blessed debug image and a documented workflow
- Restrict ephemeral container usage with RBAC and policy
- Keep debugging read-only and auditable
- Use ephemeral containers to validate reality inside the Pod (DNS, routes, policies), not as a permanent fix
Bonus: Debug the node (when the problem is below Kubernetes)
Sometimes the issue is not “inside the Pod” but on the node:
- CNI problems (routes/iptables/eBPF)
- disk pressure
- kubelet or container runtime issues
- DNS problems from the node’s perspective
Depending on your cluster and kubectl version, you may be able to debug a node by creating a privileged debug pod that mounts the host filesystem.
Conceptually, it looks like this:
kubectl debug node/<node> -it --image=ubuntu:24.04 -- chroot /host
Once inside (and if permitted), you can run:
- ip a, ip r
- ss -lntp
- journalctl (on some systems)
- check /etc/resolv.conf and CNI config
Important: this is powerful and should be restricted even more tightly than pod debugging.
Policy considerations (what to align with security teams)
Ephemeral containers intersect with several security controls:
- Pod Security (restricted/baseline/privileged)
- admission controllers (OPA/Gatekeeper, Kyverno)
- runtime security tools
Before an incident happens, align on:
- which namespaces allow debugging
- which images are allowed (see the policy sketch below)
- whether privileged debugging is ever allowed (and under what approval process)
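As one illustration, an admission policy can pin ephemeral containers to the blessed image. A Kyverno-style sketch (policy name and registry are placeholders; exact syntax depends on your Kyverno version):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-debug-images           # hypothetical name
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: allowed-ephemeral-images
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Ephemeral containers must use the approved debug image."
      pattern:
        spec:
          =(ephemeralContainers):        # only checked when ephemeral containers are present
          - image: "registry.example.com/platform/debug*"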
Cleanup and incident hygiene
Ephemeral containers can remain in a Pod’s status until the Pod is deleted. While that’s usually fine, production hygiene matters:
- close the terminal session when finished
- record findings (commands run, outputs, conclusions)
- apply a real fix via GitOps/CI rather than “manual tweaks”
The goal is not to make debugging harder—it’s to make debugging repeatable, auditable, and safe.
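To see what has accumulated on a Pod, its status keeps an entry per ephemeral container:

kubectl get pod <pod> -n <ns> \
  -o jsonpath='{.status.ephemeralContainerStatuses[*].name}'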
FAQ
Q: Are ephemeral containers safe in production?
A: They are intended for debugging and do not restart automatically. Use RBAC to restrict who can add them.
Q: Why can’t I add volumes or ports?
A: Ephemeral containers are intentionally limited to avoid mutating the workload. Use them for inspection, not for changes.
Q: How do I view logs?
A: Use kubectl logs <pod> -c <ephemeral-container-name> or kubectl describe pod to confirm status.