CFN Cloud
Cloud Future New Life
2025-12-29

Kubernetes Tip: Debug Pods with Ephemeral Containers

Safely inspect a live Pod without baking debugging tools into production images.

Ephemeral containers let you attach a temporary debugging container to an existing Pod.

This is useful when:

  • your app image is minimal (distroless/scratch)
  • you need curl, dig, tcpdump, strace, etc.
  • you want to keep production images clean

Requirements

  • Kubernetes v1.25 or newer (ephemeral containers reached general availability in 1.25)
  • You need RBAC permission to use pods/ephemeralcontainers

Basic workflow

Attach a debug container:

kubectl debug -n <ns> -it pod/<pod> --image=busybox:1.36 --target=<container-name>

If you don’t need to join a specific container’s process namespace, you can omit --target.
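Under the hood, kubectl debug patches the Pod’s ephemeralcontainers subresource. A sketch of the spec fragment this adds — not something you write by hand; the container name is auto-generated and "app" is an example target:

```yaml
# Illustrative fragment of the Pod spec after `kubectl debug`
# (names are examples):
spec:
  ephemeralContainers:
  - name: debugger-x7k2p        # auto-generated by kubectl
    image: busybox:1.36
    stdin: true
    tty: true
    targetContainerName: app    # present only when --target was given
```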

What you can do

DNS / connectivity checks

nslookup kubernetes.default.svc.cluster.local
wget -S -O- http://<service>.<ns>.svc.cluster.local:8080/readyz

Inspect env and files (read-only best practice)

  • confirm env variables
  • validate mounted ConfigMaps/Secrets
  • check /etc/resolv.conf and /etc/hosts

Things to remember

  • Ephemeral containers do not restart and are not part of the original Pod spec.
  • They are meant for debugging, not as a permanent fix.
  • In production, log every debug operation (who/when/why).

Suggested RBAC (least privilege idea)

  • Ephemeral containers: require update on the subresource pods/ephemeralcontainers.
  • Pods: read-only access (get, list, watch) is usually enough for inspection.
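A minimal Role along these lines can work as a starting point (namespace and names are placeholders; update and patch are both listed to cover different kubectl versions):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger
  namespace: my-ns                      # placeholder
rules:
# Attach ephemeral containers to existing Pods:
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["get", "update", "patch"]
# Read-only inspection of Pods and their logs:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
# Required for the interactive (-it) session:
- apiGroups: [""]
  resources: ["pods/attach"]
  verbs: ["create"]
```

Bind it with a RoleBinding to a small, audited group rather than to individual users.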

Checklist

  • Prefer ephemeral containers over “debug tools in prod image”
  • Restrict RBAC to pods/ephemeralcontainers
  • Audit debugging actions

Choosing the right debug image

Different debugging tasks need different tools. A few common patterns:

  • BusyBox: tiny, good for basic networking (nslookup, wget, nc).
  • Alpine: a bit more flexible, can add packages if needed (but beware of network egress restrictions).
  • Netshoot-style images: loaded with curl, dig, tcpdump, mtr, etc. Great for networking, but heavier and potentially riskier.

In production, consider maintaining a blessed debug image:

  • pinned by digest (immutable)
  • regularly scanned
  • minimal but sufficient tools

That gives you consistent, auditable behavior.

Understanding --target (process namespace sharing)

When you specify:

kubectl debug -n <ns> -it pod/<pod> --image=<img> --target=<container>

Kubernetes will try to attach the ephemeral container to the target container’s namespaces (especially the process namespace). This helps when you want to:

  • inspect processes
  • run tools like strace (where permitted)
  • understand what the application is doing in real time

If you omit --target, you still share the Pod network namespace, which is usually enough for:

  • DNS checks
  • Service connectivity tests
  • HTTP probing from “inside the Pod”

Common production use cases (practical examples)

1) “Service works outside, fails inside”

Check DNS and service resolution:

kubectl debug -n <ns> -it pod/<pod> --image=busybox:1.36
nslookup <service>.<ns>.svc.cluster.local

Then test the actual port:

nc -vz <service>.<ns>.svc.cluster.local 8080

2) “Readiness is failing, but I can’t curl”

If the app image is distroless, you can still do:

kubectl debug -n <ns> -it pod/<pod> --image=curlimages/curl:8.5.0
curl -sS -i http://127.0.0.1:8080/readyz

3) “NetworkPolicy might be blocking egress”

Ephemeral containers are still subject to the Pod’s network policies. That’s good: your debug actions reflect the same constraints your app has.

Validate egress to DNS and external endpoints (where allowed), and compare results across namespaces.
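As a concrete illustration, a policy that only allows DNS egress to kube-system would let the debug container resolve names but fail to reach external endpoints — matching exactly what the app experiences. A sketch (the namespace and the kube-system label are assumptions about your cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-only
  namespace: my-ns                      # placeholder
spec:
  podSelector: {}                       # applies to all Pods in the namespace
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```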

Limitations you should know

Ephemeral containers are intentionally limited:

  • They don’t become part of your deployment spec, so you can’t “fix” a Pod by leaving one around.
  • They are not meant to expose ports for traffic (treat them as internal tools).
  • They are not restarted if they exit.

Also, some environments restrict ephemeral containers for security reasons (policy engines, admission control, managed platforms).

Make debugging safe: RBAC + audit + process

The biggest risk of ephemeral containers is not the feature itself—it’s ungoverned access.

Recommended practices:

  1. Least privilege RBAC
   • allow only a small set of engineers (or a break-glass group)
   • scope to specific namespaces if possible
  2. Audit trail
   • log who ran kubectl debug, when, and why
   • keep a ticket/reference ID
  3. Standard operating procedure
   • avoid copying secrets out of the cluster
   • avoid modifying files in containers; treat debugging as read-only unless there is a clear incident procedure

“I need to debug by cloning the Pod”

Sometimes you want an isolated copy (so you don’t touch prod Pods). kubectl debug can also create a copy (depending on your kubectl version and flags). This approach is useful when:

  • you need to install packages
  • you want to reproduce in a safe sandbox
  • you need to attach tooling that would be too invasive for the live Pod

If your platform supports it, prefer “copy then debug” for high-risk investigations.
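A sketch of the copy-then-debug flow using kubectl’s --copy-to flag (pod and namespace names are placeholders):

```shell
# Create a modifiable copy of the Pod and open a debug shell in it.
# --share-processes enables process namespace sharing in the copy.
kubectl debug -n my-ns pod/my-app -it \
  --image=busybox:1.36 \
  --copy-to=my-app-debug \
  --share-processes

# The copy is a normal Pod; delete it when the investigation is done:
kubectl delete pod -n my-ns my-app-debug
```

Because the copy is a standalone Pod, not managed by the original Deployment, whatever you install or change in it never touches the production replicas.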

Final checklist for production teams

  • Have a blessed debug image and a documented workflow
  • Restrict ephemeral container usage with RBAC and policy
  • Keep debugging read-only and auditable
  • Use ephemeral containers to validate reality inside the Pod (DNS, routes, policies), not as a permanent fix

Bonus: Debug the node (when the problem is below Kubernetes)

Sometimes the issue is not “inside the Pod” but on the node:

  • CNI problems (routes/iptables/eBPF)
  • disk pressure
  • kubelet or container runtime issues
  • DNS problems from the node’s perspective

Depending on your cluster and kubectl version, you may be able to debug a node by creating a privileged debug pod that mounts the host filesystem.

Conceptually, it looks like this:

kubectl debug node/<node> -it --image=ubuntu:24.04 -- chroot /host

Once inside (and if permitted), you can run:

  • ip a, ip r
  • ss -lntp
  • journalctl (on some systems)
  • check /etc/resolv.conf and CNI config

Important: this is powerful and should be restricted even more tightly than pod debugging.

Policy considerations (what to align with security teams)

Ephemeral containers intersect with several security controls:

  • Pod Security (restricted/baseline/privileged)
  • admission controllers (OPA/Gatekeeper, Kyverno)
  • runtime security tools

Before an incident happens, align on:

  • which namespaces allow debugging
  • which images are allowed
  • whether privileged debugging is ever allowed (and under what approval process)
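One concrete control worth agreeing on: if privileged or host-level debugging should never happen in a namespace, Pod Security admission can enforce that at the namespace level. A sketch (the namespace name is a placeholder):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod                            # placeholder
  labels:
    # Reject privileged pods and host namespaces, including debug pods:
    pod-security.kubernetes.io/enforce: restricted
```

Be aware that the restricted level is validated against ephemeral containers too, so a blessed debug image used there must satisfy it (for example, run as non-root).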

Cleanup and incident hygiene

Ephemeral containers can remain in a Pod’s status until the Pod is deleted. While that’s usually fine, production hygiene matters:

  • close the terminal session when finished
  • record findings (commands run, outputs, conclusions)
  • apply a real fix via GitOps/CI rather than “manual tweaks”
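Because ephemeral containers stay in the Pod status until the Pod is deleted, you can list what debug sessions have touched a Pod (pod and namespace names are placeholders):

```shell
# List ephemeral containers recorded in the Pod status:
kubectl get pod my-app -n my-ns \
  -o jsonpath='{range .status.ephemeralContainerStatuses[*]}{.name}{"\t"}{.image}{"\n"}{end}'
```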

The goal is not to make debugging harder—it’s to make debugging repeatable, auditable, and safe.

FAQ

Q: Are ephemeral containers safe in production?
A: They are intended for debugging and do not restart automatically. Use RBAC to restrict who can add them.

Q: Why can’t I add volumes or ports?
A: Ephemeral containers are intentionally limited to avoid mutating the workload. Use them for inspection, not for changes.

Q: How do I view logs?
A: Use kubectl logs <pod> -c <ephemeral-container-name>, or kubectl describe pod to confirm status.