Kubernetes v1.36: Tiered Memory Protection with Memory QoS
Memory management has always been one of the trickier aspects of running containers at scale. You set resource requests and limits, hope nothing goes wrong, and then debug mysterious OOM (out-of-memory) kills at 3 AM. Kubernetes v1.36 is making this situation materially better with significant updates to Memory QoS, a feature that’s been evolving since v1.22. The new tiered memory protection system gives the Linux kernel much more nuanced guidance about which containers deserve memory when resources get tight, moving beyond the blunt instrument of hard limits.
Here’s what’s actually happening under the hood. Memory QoS leverages cgroup v2’s memory controller to set protection levels on container memory based on the pod’s Quality of Service (QoS) class. In practical terms, this means Guaranteed pods (those with requests equal to limits) get the strongest protection, Burstable pods sit in the middle, and BestEffort pods get protection only if spare memory exists. When the system faces memory pressure, the kernel uses these hints (chiefly the cgroup v2 `memory.min` and `memory.high` interfaces) to reclaim memory from the least-protected cgroups first rather than killing pods arbitrarily. Kubernetes v1.36 adds opt-in memory reservation, letting you set memory aside for the kubelet and system daemons, plus better observability through metrics that show memory protection levels and actual usage patterns. You’re essentially giving the kernel a decision-making framework instead of leaving it to guess.
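To make the QoS classes concrete, here are two pod sketches (the names and images are hypothetical). The first sets memory requests equal to limits in every container, so it is classed Guaranteed; the second requests less than its limit, so it is Burstable. With Memory QoS enabled, the kubelet maps each container’s memory request onto its cgroup v2 `memory.min`, which is the floor the kernel protects during reclaim:

```yaml
# Guaranteed: requests == limits for every resource in every container.
# With Memory QoS, the memory request (256Mi here) becomes the container's
# cgroup v2 memory.min, so the kernel protects it under pressure.
apiVersion: v1
kind: Pod
metadata:
  name: critical-api            # hypothetical name
spec:
  containers:
  - name: api
    image: example.com/api:latest   # hypothetical image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
---
# Burstable: a memory request is set but is lower than the limit.
# The request still becomes memory.min; usage between the request and
# the limit is what gets reclaimed first when memory runs tight.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical name
spec:
  containers:
  - name: worker
    image: example.com/worker:latest  # hypothetical image
    resources:
      requests:
        memory: "128Mi"
      limits:
        memory: "512Mi"
```

A pod that sets no requests or limits at all lands in BestEffort and gets no `memory.min` protection, which is exactly why it is the first place the kernel looks when it needs memory back.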
Why does this matter? Consider a cluster running both batch processing jobs and latency-sensitive services. Without tiered protection, a memory spike in your batch jobs can trigger an OOM kill in your critical API service; the kernel has no way to know which workload matters more. With Memory QoS properly configured, your Guaranteed pods (which should be your critical services) keep their memory protection even when BestEffort jobs spike. For teams managing multi-tenant clusters or running heterogeneous workloads, this reduces operational firefighting and makes resource allocation predictable. The new observability features also mean you can finally see why a pod was evicted instead of just noticing it disappeared.
Getting started requires nodes running cgroup v2, enabling the MemoryQoS feature gate (it’s alpha, so expect some iteration), and understanding your workloads’ QoS classes. Start by auditing which pods should be Guaranteed versus Burstable, then monitor the new memory protection metrics to validate that the kernel is actually honoring your tiers. This won’t magically fix an over-provisioned cluster, but it will make the memory you do have stretch further, and more predictably.
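A minimal kubelet configuration sketch for trying this out, assuming your nodes already run cgroup v2 with the systemd cgroup driver. `memoryThrottlingFactor` is the existing KubeletConfiguration knob that controls where `memory.high` is placed between a container’s memory request and its limit; alpha feature gates can change between releases, so treat this as a starting point rather than a production recipe:

```yaml
# KubeletConfiguration fragment enabling the alpha Memory QoS feature.
# Requires cgroup v2 on the node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
featureGates:
  MemoryQoS: true
# Scales where memory.high sits between the memory request and the limit;
# 0.9 is the default in recent releases. Lower values throttle earlier.
memoryThrottlingFactor: 0.9
```

The same gate can also be flipped via the `--feature-gates=MemoryQoS=true` kubelet flag, but the configuration file keeps the setting declarative and easier to roll out across a node pool.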