Cloud Future New Life

CFN Cloud

Cloud-native notes on Kubernetes, platform engineering, and modern infrastructure.

Featured reads

A good place to start if you're new to the site.

2026-03-12

GPU Overprovisioning Solutions: From Oversubscription and Sharing to Isolation

A practical guide to GPU overprovisioning strategies, including scheduler-level oversubscription, time slicing, memory controls, MIG, vGPU, queue backfill, and operational guardrails.

2026-03-12

How Startups Should Choose: Serverless GPU vs Dedicated GPU

A practical guide to choosing between serverless GPUs and dedicated GPUs for startups, based on cost structure, delivery speed, performance predictability, operations burden, and team maturity.

2026-03-05

Deep Dive into Linux Heap Memory Management: From Basics to Core Exploitation

A comprehensive deep dive into Linux glibc (ptmalloc2) heap memory allocation and reclamation strategies. Explores Arenas, Chunks, Bins (Fast, Small, Large, Unsorted) data structures, and the principles of classic vulnerabilities such as Use-After-Free.

2026-02-26

OpenClaw vs ZeroClaw vs PicoClaw: Comparing 5 AI Agent Frameworks

A practical comparison of five AI agent frameworks - OpenClaw, ZeroClaw, PicoClaw, Nanobot, and IronClaw - covering size, architecture, security tradeoffs, and adoption fit.

2026-01-26

KAI-Scheduler vs HAMi: Two Ways to Share GPUs in Kubernetes (Soft vs Hard Isolation)

An engineering-oriented comparison of KAI-Scheduler’s Reservation Pod approach and HAMi’s hard isolation path, including trade-offs, failure modes (noisy neighbor), and how the two layers can complement each other.

2026-01-20

hetGPU: Chasing Cross-Vendor GPU Binary Compatibility

An engineering-oriented guide to hetGPU: how a compiler + runtime stack can make one GPU binary run across NVIDIA/AMD/Intel/Tenstorrent, including SIMT vs MIMD, memory model gaps, and live kernel migration.

Read by topic

Topic guides

Start with the basics, then continue to operations and troubleshooting topics.

Kubernetes

A curated reading track for Kubernetes.

Kubernetes vs Docker vs OpenStack: Stop Comparing Tools at Different Layers

Kubernetes Troubleshooting Playbook: Pending, CrashLoopBackOff, and Traffic Failures

Kubernetes Probe Best Practices: Liveness, Readiness, Startup, and Failure Signals

Kubernetes Tip: Autoscaling Without Thrash (HPA + VPA + Cluster Autoscaler)

GPU

A curated reading track for GPU topics.

GPU Overprovisioning Solutions: From Oversubscription and Sharing to Isolation

How Startups Should Choose: Serverless GPU vs Dedicated GPU

KAI-Scheduler vs HAMi: Two Ways to Share GPUs in Kubernetes (Soft vs Hard Isolation)

hetGPU: Chasing Cross-Vendor GPU Binary Compatibility

System

A curated reading track for Linux systems topics.

Deep Dive into Linux Heap Memory Management: From Basics to Core Exploitation

Linux CGroup Deep Dive: Migrating from V1 Chaos to V2 Architecture

Linux Function Calls and Stack Frames

ELF Explained: Sections, Segments, Relocations, and Dynamic Linking

Recent writing

New notes, guides, and long-form pieces from the main archive.

2026-03-12 · 153 views

GPU Overprovisioning Solutions: From Oversubscription and Sharing to Isolation

A practical guide to GPU overprovisioning strategies, including scheduler-level oversubscription, time slicing, memory controls, MIG, vGPU, queue backfill, and operational guardrails.

2026-03-12 · 140 views

How Startups Should Choose: Serverless GPU vs Dedicated GPU

A practical guide to choosing between serverless GPUs and dedicated GPUs for startups, based on cost structure, delivery speed, performance predictability, operations burden, and team maturity.

2026-03-05 · 163 views

Deep Dive into Linux Heap Memory Management: From Basics to Core Exploitation

2026-02-26 · 299 views

OpenClaw vs ZeroClaw vs PicoClaw: Comparing 5 AI Agent Frameworks

A practical comparison of five AI agent frameworks - OpenClaw, ZeroClaw, PicoClaw, Nanobot, and IronClaw - covering size, architecture, security tradeoffs, and adoption fit.

2026-01-26 · 341 views

KAI-Scheduler vs HAMi: Two Ways to Share GPUs in Kubernetes (Soft vs Hard Isolation)

2026-01-20 · 240 views

hetGPU: Chasing Cross-Vendor GPU Binary Compatibility

2026-01-20 · 213 views

Kubernetes vs Docker vs OpenStack: Stop Comparing Tools at Different Layers

A practical boundary guide: Docker packages and runs containers, Kubernetes orchestrates and keeps services stable at scale, and OpenStack turns datacenter hardware into an IaaS resource pool (VM/network/storage).

2026-01-12 · 357 views

Kubernetes GPU Virtualization Explained Through gpu-manager Startup Flow

A deep dive into Kubernetes GPU virtualization through gpu-manager startup flow, including device interception, topology awareness, scheduling, and allocation mechanics.

2026-01-12 · 464 views

Linux CGroup Deep Dive: Migrating from V1 Chaos to V2 Architecture

A practical guide to Linux cgroups, covering core concepts, controller behavior, and troubleshooting in production environments.

2026-01-09 · 231 views

Linux Function Calls and Stack Frames

Understand calling conventions, stack frames, call/ret behavior, debugging observation, and security implications from the assembly view.

2026-01-09 · 319 views

ELF Explained: Sections, Segments, Relocations, and Dynamic Linking

Understand ELF files from sections and segments to relocations and dynamic linking, with practical examples for debugging Linux binaries and loader issues.

2025-12-29 · 277 views

Kubernetes Troubleshooting Playbook: Pending, CrashLoopBackOff, and Traffic Failures

A practical Kubernetes troubleshooting playbook for Pending Pods, CrashLoopBackOff, readiness failures, networking issues, and node-level problems.

2025-12-29 · 556 views

Kubernetes Probe Best Practices: Liveness, Readiness, Startup, and Failure Signals

Use better Kubernetes probes by choosing the right signal, tuning thresholds, and avoiding false restarts, traffic drops, and noisy rollouts.

2025-12-29 · 245 views

Kubernetes Tip: Autoscaling Without Thrash (HPA + VPA + Cluster Autoscaler)

How to make autoscaling predictable: right requests, sane HPA behavior, VPA recommendations, and capacity-aware cluster scaling.

2025-12-29 · 666 views

Kubernetes NetworkPolicy Best Practices: Default Deny, DNS, and Safe Rollout

A practical rollout path for Kubernetes NetworkPolicy: start with default deny, whitelist DNS and key dependencies, and avoid breaking production traffic.

2025-12-29 · 495 views

Kubernetes RBAC Least Privilege: Safer Roles, Bindings, and Access Review

Learn practical Kubernetes RBAC least-privilege patterns, how to reduce overbroad permissions, and which checks catch risky role bindings before incidents.

2025-12-29 · 662 views

Kubernetes Tip: Debug Pods with Ephemeral Containers

Safely inspect a live Pod without baking debugging tools into production images.

2025-12-29 · 462 views

Kubernetes Tip: Safer Rollouts with PDB + Surge/Unavailable

Combine Deployment rollingUpdate settings with PodDisruptionBudgets to keep availability during upgrades and node maintenance.

2025-12-29 · 406 views

Kubernetes Tip: Requests & Limits (Without Surprises)

How CPU/memory requests and limits actually affect scheduling, throttling, OOMKills, and autoscaling.

2025-10-15 · 352 views

Kubernetes Probes Explained: Liveness, Readiness, and Startup Checks

Learn how liveness, readiness, and startup probes work in Kubernetes, what each one should check, and how to avoid restart loops and false failures.

2025-10-14 · 285 views

Helm + MySQL on Kubernetes: Install a Cluster and Understand the Tradeoffs

Use Helm to deploy a MySQL cluster on Kubernetes while understanding chart defaults, persistence, networking, and production tradeoffs.

2025-10-13 · 288 views

kubectl Port-Forward Explained: Safe Debugging Access to Kubernetes Workloads

Learn how kubectl port-forward works, when to use it for debugging, and how it differs from Services, Ingress, and production traffic paths.

2025-10-12 · 198 views

MySQL Replication on Kubernetes: Topology, Storage, and Failure Modes

Understand how to run MySQL replication on Kubernetes, including primary-replica design, storage concerns, failover risks, and operational checks.

2025-10-11 · 376 views

Kubernetes Headless Service Explained: DNS, Pod Identity, and Stateful Workloads

Understand when to use a headless Service in Kubernetes, how DNS works without a virtual IP, and why it matters for StatefulSets and peer discovery.

2025-10-10 · 369 views

Kubernetes StorageClass Explained: Dynamic Provisioning and Defaults

Understand how StorageClass enables dynamic provisioning in Kubernetes, how default classes work, and how to choose the right storage policy.

2025-10-10 · 467 views

Kubernetes StatefulSet Explained: Stable Identity, Ordering, and Storage

Learn when to use a StatefulSet in Kubernetes, how stable Pod identity works, and why ordering and persistent storage matter.

2025-10-09 · 264 views

Kubernetes PV and PVC Explained: Persistent Storage Basics

Learn how PersistentVolumes and PersistentVolumeClaims work in Kubernetes, how binding happens, and how to troubleshoot storage lifecycle issues.

2025-10-09 · 314 views

Ephemeral Volumes

Ephemeral volumes share the Pod's lifetime and suit caches or temporary files.

2025-10-08 · 241 views

Kubernetes ConfigMap vs Secret: Configuration, Sensitive Data, and Safe Usage

Understand when to use ConfigMap or Secret in Kubernetes, how they reach Pods, and which practices reduce config drift and secret exposure.

2025-10-08 · 327 views

Kubernetes Volumes Explained: EmptyDir, HostPath, and Persistent Storage Basics

Learn the core Kubernetes volume types, what data survives Pod restarts, and how to choose between temporary and persistent storage.
