Kubernetes memory limits create a binary cliff between 'working fine' and 'OOM-killed with no warning,' and swap support remains experimental
Kubernetes enforces hard memory limits on containers via Linux cgroups: a pod using 1023 MiB of its 1024 MiB limit appears healthy, but as soon as it tries to grow past the limit and the kernel cannot reclaim enough memory, the kernel's OOM killer terminates the process with SIGKILL, leaving no graceful shutdown, no heap dump, and almost no diagnostic data beyond an OOMKilled status. So what? Stateful services (databases, message queues, ML model servers) lose in-flight transactions, risk corrupting write-ahead logs, or drop cached model state, requiring expensive recovery procedures that take 5-30 minutes. So what? Platform teams respond by setting memory requests and limits 2-4x higher than typical usage to avoid OOM kills, which leaves 40-60% of cluster RAM reserved but unused across the fleet. So what? At $0.05-$0.10 per GB-hour on major cloud providers, a 500-node cluster wasting 50% of its memory capacity costs $500K-$2M annually in unused but reserved RAM. So what? The alternative, enabling swap to provide a soft landing, was unsupported in Kubernetes for most of its history (by default the kubelet refuses to start on a node with swap enabled). The NodeSwap feature gate arrived as alpha in v1.22, reached beta in v1.28, and was slated to graduate to stable only in v1.34, so most production clusters still cannot rely on it. So what? DevOps teams are trapped between wasting money (over-provisioning) and risking outages (tight limits), with no middle ground available in the dominant container orchestration platform. This persists for three reasons: Kubernetes was designed on the assumption that swap degrades performance predictability, the NodeSwap feature has taken 4+ years to stabilize, and the OOM killer's binary behavior is a Linux kernel design decision that cgroups v2 has not fundamentally changed.
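The cliff described above is visible from inside a container. The following is a minimal Python sketch, assuming a cgroup v2 node where the container's own cgroup is mounted at `/sys/fs/cgroup` and exposes the standard `memory.max` and `memory.current` files. The function names and the 64 MiB reserve are hypothetical, chosen only for illustration; a process could poll something like this to shed load before the limit is reached.

```python
from pathlib import Path
from typing import Optional


def memory_headroom_bytes(cgroup_dir: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the bytes left before the cgroup v2 memory limit.

    Returns None when no limit is set. The default path assumes cgroup v2
    with the container's cgroup mounted at /sys/fs/cgroup.
    """
    root = Path(cgroup_dir)
    limit_raw = (root / "memory.max").read_text().strip()
    if limit_raw == "max":  # cgroup v2 reports "max" when there is no limit
        return None
    current = int((root / "memory.current").read_text().strip())
    return int(limit_raw) - current


def should_shed_load(headroom: Optional[int],
                     reserve_bytes: int = 64 * 1024 * 1024) -> bool:
    """Hypothetical policy: back off once headroom drops below a reserve."""
    return headroom is not None and headroom < reserve_bytes
```

For the example in the text, a 1024 MiB limit with 1023 MiB in use leaves 1 MiB of headroom, so `should_shed_load` would return `True` under the assumed 64 MiB reserve. This does not prevent an OOM kill. It only gives the application a chance to react before the kernel does.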
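The annual cost estimate above depends on how much RAM each node carries, which the text does not state. The sketch below spells out the arithmetic. The 8 GB per node is an assumption, picked only because it lands inside the quoted $500K-$2M range. The result scales linearly with node size, so larger nodes would put the figure well above that range.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_wasted_ram_cost(nodes: int,
                           ram_gb_per_node: float,
                           wasted_fraction: float,
                           usd_per_gb_hour: float) -> float:
    """Yearly cost in USD of RAM that is reserved but never used."""
    wasted_gb = nodes * ram_gb_per_node * wasted_fraction
    return wasted_gb * usd_per_gb_hour * HOURS_PER_YEAR


# ASSUMPTION: 8 GB of RAM per node (not given in the text).
ASSUMED_RAM_GB_PER_NODE = 8

low = annual_wasted_ram_cost(500, ASSUMED_RAM_GB_PER_NODE, 0.5, 0.05)
high = annual_wasted_ram_cost(500, ASSUMED_RAM_GB_PER_NODE, 0.5, 0.10)
print(f"${low:,.0f} to ${high:,.0f} per year")
# prints: $876,000 to $1,752,000 per year
```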
Evidence
The official Kubernetes documentation states that containers exceeding their memory limits are OOM-killed even if unused swap space exists. The Kubernetes blog (Aug 2025) published 'Tuning Linux Swap for Kubernetes: A Deep Dive' ahead of the NodeSwap feature's expected graduation in v1.34. Mihai Albert's detailed analysis 'Out-of-memory (OOM) in Kubernetes Part 2' documents the OOM killer's behavior and diagnostic limitations. The Medium post 'Don't Let Swap Crash Your Cluster' warns about swap-related instability.