Kubernetes v1.36: Server-Side Sharded List and Watch
Kubernetes clusters are getting bigger, and bigger clusters create bigger problems. When you’re running tens of thousands of nodes, controllers that need to watch resources like Pods start hitting a scaling wall. Every instance of a horizontally scaled controller receives the complete stream of events from the API server, and that is expensive: each replica has to deserialize every single object, even though most of those objects don’t belong to that replica’s slice of responsibility. This redundancy multiplies CPU, memory, and network costs across your entire control plane. Kubernetes v1.36 introduces server-side sharded list and watch to fix this inefficiency.
Here’s how it works: instead of each controller replica receiving all events for all objects, the API server now partitions the data stream. When you enable sharding, controllers specify which “shard” they’re interested in, and the API server filters events before sending them. Think of a postal service that used to deliver every letter to every mailroom and let each room discard what wasn’t theirs; now the mail is sorted by address before delivery. Technically, this works through a new shard parameter in the watch request. The API server divides the total keyspace (all Pod objects, for example) into N equal parts and routes only the relevant partition to each watching controller. This shift from client-side filtering to server-side filtering means fewer bytes on the network, less CPU spent deserializing unwanted objects, and a significantly lower memory footprint across controller replicas.
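To make the request shape concrete, here is a minimal Go sketch of a shard-aware Pod watch built with client-go’s raw REST client. The `shard` and `totalShards` query parameter names, the `watchPodShard` function, and the specific shard values are illustrative assumptions layered on top of the feature described above, not a confirmed API surface.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// watchPodShard opens a watch on Pods restricted to one shard of the keyspace.
// The "shard" and "totalShards" parameter names are assumptions for the sake
// of illustration; check the release documentation for the real names.
func watchPodShard(ctx context.Context, cfg *rest.Config, shard, totalShards int) error {
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// Build a raw watch request so we can attach shard parameters that the
	// standard typed ListOptions do not expose in this sketch.
	watcher, err := clientset.CoreV1().RESTClient().
		Get().
		Resource("pods").
		Param("watch", "true").
		Param("shard", fmt.Sprintf("%d", shard)).
		Param("totalShards", fmt.Sprintf("%d", totalShards)).
		Watch(ctx)
	if err != nil {
		return err
	}
	defer watcher.Stop()

	// Only events for objects in this replica's shard arrive on the channel,
	// so every object decoded here is one this replica is responsible for.
	for event := range watcher.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		fmt.Printf("%s: %s/%s\n", event.Type, pod.Namespace, pod.Name)
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// Watch shard 3 of 8; the values are arbitrary for the example.
	if err := watchPodShard(context.Background(), cfg, 3, 8); err != nil {
		panic(err)
	}
}
```

Without sharding, the equivalent watcher would receive and decode every Pod in the cluster; here the API server drops everything outside shard 3 before it crosses the wire.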
The practical impact matters most in three scenarios. First, large-scale multi-tenant clusters benefit immediately—if you have custom controllers managing tenant-specific resources, sharding lets each controller instance focus only on its assigned tenants. Second, managed Kubernetes services (like EKS, GKE, or AKS) can run leaner control planes, reducing operational costs and improving responsiveness. Third, any organization running custom controllers at scale—think machine learning platforms, CI/CD systems, or infrastructure management tools—suddenly doesn’t need to over-provision controller replicas to handle the deserialization overhead. A team running a custom controller that watches 50,000 Pods across 500 replicas can now run fewer replicas while processing updates faster, because each replica only handles its assigned Pod subset.
Adopting this feature requires minimal code changes if you’re using standard Kubernetes controller libraries (client-go, kubebuilder, and operator-sdk are adding support). If you build custom watchers directly against the API, you’ll add the shard parameter to your list and watch requests and implement basic shard assignment logic, as sketched below. The trade-off is small: you gain substantial efficiency at the cost of slightly more complex controller initialization. If your cluster is still growing or your controllers are resource-hungry, v1.36’s sharding feature deserves evaluation in your next upgrade cycle.
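What that shard assignment logic can look like is sketched below, again in Go and again with hedged assumptions: each replica derives its shard index from its own identity, and the total shard count is assumed to match the controller’s replica count. The function names and the StatefulSet-ordinal convention are illustrative choices, not part of the feature.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
	"strings"
)

// ordinalShard works when replicas run as a StatefulSet: pod names end in a
// stable ordinal ("my-controller-3"), which maps directly to a shard index.
func ordinalShard(podName string, totalShards int) (int, error) {
	idx := strings.LastIndex(podName, "-")
	if idx < 0 {
		return 0, fmt.Errorf("pod name %q has no ordinal suffix", podName)
	}
	ordinal, err := strconv.Atoi(podName[idx+1:])
	if err != nil {
		return 0, err
	}
	return ordinal % totalShards, nil
}

// hashedShard works for any stable replica identity (hostname, lease name):
// hash it and take the remainder, accepting that two replicas could collide
// on the same shard.
func hashedShard(identity string, totalShards int) int {
	h := fnv.New32a()
	h.Write([]byte(identity))
	return int(h.Sum32() % uint32(totalShards))
}

func main() {
	// Assumed to match the controller's replica count.
	const totalShards = 8

	hostname, _ := os.Hostname()
	if shard, err := ordinalShard(hostname, totalShards); err == nil {
		fmt.Printf("replica %s owns shard %d of %d\n", hostname, shard, totalShards)
		return
	}
	fmt.Printf("replica %s owns shard %d of %d\n",
		hostname, hashedShard(hostname, totalShards), totalShards)
}
```

The ordinal strategy gives a stable, collision-free mapping when replicas come from a StatefulSet; the hash fallback trades that guarantee for working with any pod naming scheme.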