
How Kubernetes Deployment Rolling Updates Actually Work in the Code

Getting to the part of Deployment that really matters

Among Kubernetes workload objects, Deployment is usually the first major one people become comfortable with. Most everyday applications are deployed this way, so its surface behavior feels familiar: change the image tag, apply the manifest, and pods are replaced a few at a time.

What is less obvious is how that update is driven internally. After the image version changes, how does Kubernetes decide when to create new pods, when to remove old ones, and how to keep the whole process under control?

That question leads straight into the Deployment controller.

A useful way to read controller code

For objects like Deployment, the code is not especially hard to locate because the naming is direct. A practical reading path is usually:

  1. Find the main data structure.
  2. Check how it is initialized.
  3. Follow the methods that actually perform reconciliation.

That pattern works especially well here.

Start with the Deployment structure

The definition is exactly where you would expect it to be, and it maps very naturally to what we write in YAML:

```go
// vendor/k8s.io/api/apps/v1/types.go:355
type Deployment struct {
	metav1.TypeMeta `json:",inline"`
	// Standard object's metadata.
	// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Specification of the desired behavior of the Deployment.
	// +optional
	Spec DeploymentSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`

	// Most recently observed status of the Deployment.
	// +optional
	Status DeploymentStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}
```
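To make the mapping between the struct tags and a manifest concrete, here is a minimal sketch using only the standard library. The types below are simplified stand-ins, not the real API types: the field sets are trimmed down, and Replicas is a plain int32 for brevity. It only shows how an embedded type without a JSON name is flattened to the top level while the one tagged "metadata" becomes a nested object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the real API types (assumptions for illustration).
type TypeMeta struct {
	Kind       string `json:"kind,omitempty"`
	APIVersion string `json:"apiVersion,omitempty"`
}

type ObjectMeta struct {
	Name      string `json:"name,omitempty"`
	Namespace string `json:"namespace,omitempty"`
}

type DeploymentSpec struct {
	Replicas int32 `json:"replicas"`
}

type Deployment struct {
	// Embedded without a JSON name: kind and apiVersion appear at the top level.
	TypeMeta
	// Embedded with a JSON name: serialized as a nested "metadata" object.
	ObjectMeta `json:"metadata,omitempty"`
	Spec       DeploymentSpec `json:"spec,omitempty"`
}

// render serializes a Deployment so the resulting shape can be inspected.
func render(d Deployment) (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}

func main() {
	out, err := render(Deployment{
		TypeMeta:   TypeMeta{Kind: "Deployment", APIVersion: "apps/v1"},
		ObjectMeta: ObjectMeta{Name: "web", Namespace: "default"},
		Spec:       DeploymentSpec{Replicas: 3},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The output has the same top-level keys as a manifest: kind, apiVersion, metadata, and spec.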

At this level, nothing is surprising: metadata, spec, and status. The real question is not what a Deployment looks like, but which component actually controls it.

Which object controls a Deployment?

If you search through references to Deployment, the important thing is not to chase every single usage. It is usually more efficient to narrow the scope by file, and if needed by package, because code in the same package typically shares a responsibility.

Following that path leads to the core object: DeploymentController.

That is the piece responsible for watching changes and reconciling actual state toward desired state.

The structure of DeploymentController

The first pass over its fields already reveals most of the design:

```go
// pkg/controller/deployment/deployment_controller.go:66
type DeploymentController struct {
	// rsControl is used for adopting/releasing replica sets.
	rsControl controller.RSControlInterface
	client    clientset.Interface

	eventBroadcaster record.EventBroadcaster
	eventRecorder    record.EventRecorder

	// To allow injection of syncDeployment for testing.
	syncHandler func(ctx context.Context, dKey string) error
	// used for unit testing
	enqueueDeployment func(deployment *apps.Deployment)

	// dLister can list/get deployments from the shared informer's store
	dLister appslisters.DeploymentLister
	// rsLister can list/get replica sets from the shared informer's store
	rsLister appslisters.ReplicaSetLister
	// podLister can list/get pods from the shared informer's store
	podLister corelisters.PodLister

	// dListerSynced returns true if the Deployment store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	dListerSynced cache.InformerSynced
	// rsListerSynced returns true if the ReplicaSet store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	rsListerSynced cache.InformerSynced
	// podListerSynced returns true if the pod store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	podListerSynced cache.InformerSynced

	// Deployments that need to be synced
	queue workqueue.RateLimitingInterface
}
```

Two fields stand out immediately:

  • syncHandler
  • queue

Once those appear together, the overall shape becomes recognizable: events are collected, turned into work items, placed into a queue, and later processed by workers that reconcile state.

To understand how that happens, the next stop is initialization.

Initialization: wiring informers to the controller

The controller is built in NewDeploymentController:

```go
// pkg/controller/deployment/deployment_controller.go:101
func NewDeploymentController(ctx context.Context, dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface) (*DeploymentController, error) {
	//....
	dc := &DeploymentController{
		//....
		queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "deployment"),
	}
	//....

	dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			dc.addDeployment(logger, obj)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			dc.updateDeployment(logger, oldObj, newObj)
		},
		// This will enter the sync loop and no-op, because the deployment has been deleted from the store.
		DeleteFunc: func(obj interface{}) {
			dc.deleteDeployment(logger, obj)
		},
	})
	//....

	dc.syncHandler = dc.syncDeployment
	dc.enqueueDeployment = dc.enqueue
	//....

	return dc, nil
}
```

Several important things happen here.

First, the controller creates a named rate-limiting work queue.

Second, it registers informer event handlers for Deployment add, update, and delete events. That means the controller is not constantly polling the API server directly; instead, it reacts to changes delivered through informer machinery.

Third, syncHandler is set to dc.syncDeployment. That assignment matters because later the workers will invoke syncHandler, and this is how the controller’s core reconciliation method gets plugged into the work loop.
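The reason for routing the call through a field can be sketched in a few lines. This is a hypothetical, stripped-down illustration, not the real controller: miniController, newMiniController, process, and realSyncs are names invented here, and the handler signature is simplified to take only a key.

```go
package main

import "fmt"

// miniController is a hypothetical stand-in for DeploymentController.
type miniController struct {
	// syncHandler is a field rather than a direct method call so tests can replace it.
	syncHandler func(key string) error
	// realSyncs counts how often the default handler actually ran.
	realSyncs int
}

// syncDeployment stands in for the real reconciliation method.
func (c *miniController) syncDeployment(key string) error {
	c.realSyncs++
	return nil
}

// newMiniController wires the default handler at construction time.
func newMiniController() *miniController {
	c := &miniController{}
	c.syncHandler = c.syncDeployment
	return c
}

// process is what a worker would call for each key taken from the queue.
func (c *miniController) process(key string) error {
	return c.syncHandler(key)
}

func main() {
	c := newMiniController()
	_ = c.process("default/web") // goes through syncDeployment

	// A test can swap in a fake without touching the worker code.
	var seen []string
	c.syncHandler = func(key string) error {
		seen = append(seen, key)
		return nil
	}
	_ = c.process("default/api") // goes through the fake

	fmt.Println(c.realSyncs, seen)
}
```

The worker path never names syncDeployment directly, so a unit test can exercise queueing and error handling with a fake handler.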

What happens when a Deployment event arrives?

When a Deployment is added, the processing chain is very short and very clear:

addDeployment -> enqueueDeployment -> enqueue -> dc.queue.Add(key)

The final step looks like this:

```go
// pkg/controller/deployment/deployment_controller.go:391
func (dc *DeploymentController) enqueue(deployment *apps.Deployment) {
	key, err := controller.KeyFunc(deployment)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("couldn't get key for object %#v: %v", deployment, err))
		return
	}

	dc.queue.Add(key)
}
```

The event handler does not perform the rollout logic. It simply computes the key for the Deployment and pushes that key into the queue.

This is an important design point: the event path stays lightweight, and the real work happens later in a controlled reconciliation loop.
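Keys of this kind conventionally take the form namespace/name, which is why a plain string is enough to identify the object again later. The sketch below shows that round trip with the standard library only. keyFor and splitKey are hypothetical helpers written for this illustration, not the functions the controller calls.

```go
package main

import (
	"fmt"
	"strings"
)

// keyFor builds a "namespace/name" key; cluster-scoped objects use the bare name.
func keyFor(namespace, name string) string {
	if namespace == "" {
		return name
	}
	return namespace + "/" + name
}

// splitKey reverses keyFor and rejects keys with more than one separator.
func splitKey(key string) (namespace, name string, err error) {
	parts := strings.Split(key, "/")
	switch len(parts) {
	case 1:
		return "", parts[0], nil
	case 2:
		return parts[0], parts[1], nil
	}
	return "", "", fmt.Errorf("unexpected key format: %q", key)
}

func main() {
	key := keyFor("default", "web")
	ns, name, err := splitKey(key)
	if err != nil {
		panic(err)
	}
	fmt.Println(key, ns, name)
}
```

Because only the key travels through the queue, the worker always re-reads the latest object from the informer cache instead of acting on a stale copy captured at event time.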

Where the queue is consumed

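Queue processing starts in Run, which launches one goroutine per requested worker. wait.UntilWithContext keeps calling dc.worker, with a one-second pause between calls, until the context is cancelled: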

```go
// pkg/controller/deployment/deployment_controller.go:157
// Run begins watching and syncing.
func (dc *DeploymentController) Run(ctx context.Context, workers int) {
	//...

	for i := 0; i < workers; i++ {
		go wait.UntilWithContext(ctx, dc.worker, time.Second)
	}

	<-ctx.Done()
}
```

Each worker keeps pulling items until the queue is shut down:

```go
// pkg/controller/deployment/deployment_controller.go:473
func (dc *DeploymentController) worker(ctx context.Context) {
	for dc.processNextWorkItem(ctx) {
	}
}

func (dc *DeploymentController) processNextWorkItem(ctx context.Context) bool {
	key, quit := dc.queue.Get()
	if quit {
		return false
	}
	defer dc.queue.Done(key)

	err := dc.syncHandler(ctx, key.(string))
	dc.handleErr(ctx, err, key)

	return true
}
```

So the execution path is:

Run -> worker -> processNextWorkItem -> syncHandler

And since syncHandler was assigned to syncDeployment, every queued Deployment eventually ends up there.

This is a classic producer-consumer model: informers produce work, workers consume it, and reconciliation drives the actual state changes.
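One property of this model is easy to miss: a queue of keys can collapse a burst of events for the same object into a single pending item. The following is a minimal sketch of that idea, not the real workqueue. keyQueue and drain are hypothetical names, Get does not block, and there is no rate limiting or retry handling.

```go
package main

import (
	"fmt"
	"sync"
)

// keyQueue is a toy FIFO of keys that ignores a key already waiting to be processed.
type keyQueue struct {
	mu      sync.Mutex
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already pending.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get returns the oldest pending key, or false when the queue is empty.
func (q *keyQueue) Get() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// drain plays the worker role: it hands every pending key to handler
// and returns how many keys were processed.
func drain(q *keyQueue, handler func(key string) error) int {
	n := 0
	for {
		key, ok := q.Get()
		if !ok {
			return n
		}
		if err := handler(key); err != nil {
			fmt.Println("sync failed:", key, err)
		}
		n++
	}
}

func main() {
	q := newKeyQueue()
	// Four events arrive, three of them for the same Deployment.
	for _, k := range []string{"default/web", "default/web", "default/api", "default/web"} {
		q.Add(k)
	}
	var synced []string
	n := drain(q, func(key string) error {
		synced = append(synced, key)
		return nil
	})
	fmt.Println(n, synced)
}
```

Four events produce two reconciliations here, one per distinct key. That is safe because reconciliation reads current state rather than replaying individual events.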

syncDeployment: the controller’s main decision point

The core method is sequential enough that the logic is easy to follow:

```go
// pkg/controller/deployment/deployment_controller.go:581
func (dc *DeploymentController) syncDeployment(ctx context.Context, key string) error {
	//...

	deployment, err := dc.dLister.Deployments(namespace).Get(name)
	if errors.IsNotFound(err) {
		logger.V(2).Info("Deployment has been deleted", "deployment", klog.KRef(namespace, name))
		return nil
	}
	if err != nil {
		return err
	}

	// Deep-copy otherwise we are mutating our cache.
	// TODO: Deep-copy only when needed.
	d := deployment.DeepCopy()

	//...

	// List ReplicaSets owned by this Deployment, while reconciling ControllerRef
	// through adoption/orphaning.
	rsList, err := dc.getReplicaSetsForDeployment(ctx, d)
	if err != nil {
		return err
	}
	// List all Pods owned by this Deployment, grouped by their ReplicaSet.
	// Current uses of the podMap are:
	//
	// * check if a Pod is labeled correctly with the pod-template-hash label.
	// * check that no old Pods are running in the middle of Recreate Deployments.
	podMap, err := dc.getPodMapForDeployment(d, rsList)
	if err != nil {
		return err
	}

	//...

	if d.Spec.Paused {
		return dc.sync(ctx, d, rsList)
	}

	// rollback is not re-entrant in case the underlying replica sets are updated with a new
	// revision so we should ensure that we won't proceed to update replica sets until we
	// make sure that the deployment has cleaned up its rollback spec in subsequent enqueues.
	if getRollbackTo(d) != nil {
		return dc.rollback(ctx, d, rsList)
	}

	scalingEvent, err := dc.isScalingEvent(ctx, d, rsList)
	if err != nil {
		return err
	}
	if scalingEvent {
		return dc.sync(ctx, d, rsList)
	}

	switch d.Spec.Strategy.Type {
	case apps.RecreateDeploymentStrategyType:
		return dc.rolloutRecreate(ctx, d, rsList, podMap)
	case apps.RollingUpdateDeploymentStrategyType:
		return dc.rolloutRolling(ctx, d, rsList)
	}
	return fmt.Errorf("unexpected deployment strategy type: %s", d.Spec.Strategy.Type)
}
```

The method first reconstructs the state it needs:

  1. Load the Deployment.
  2. Find the ReplicaSets associated with it.
  3. Build a pod map grouped by ReplicaSet.

Once those are in hand, the controller decides what kind of reconciliation is needed.

The possibilities include:

  • paused deployment handling
  • rollback
  • scaling reconciliation
  • rollout according to strategy

For updates, the strategy branch is the key part. A Deployment can choose between:

  • Recreate
  • RollingUpdate

The rolling path is the one most people rely on in practice.
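The order of those checks matters, because the first match wins. A small sketch can make the precedence explicit. This is an illustration only: syncInput and decide are hypothetical names, the booleans stand in for the checks syncDeployment performs, and the returned strings simply name the method that would be called.

```go
package main

import "fmt"

// syncInput is a hypothetical summary of what syncDeployment has learned so far.
type syncInput struct {
	Paused            bool
	RollbackRequested bool
	ScalingEvent      bool
	Strategy          string // "Recreate" or "RollingUpdate"
}

// decide mirrors the branch order: paused, rollback, scaling, then strategy.
func decide(in syncInput) (string, error) {
	switch {
	case in.Paused:
		return "sync", nil
	case in.RollbackRequested:
		return "rollback", nil
	case in.ScalingEvent:
		return "sync", nil
	}
	switch in.Strategy {
	case "Recreate":
		return "rolloutRecreate", nil
	case "RollingUpdate":
		return "rolloutRolling", nil
	}
	return "", fmt.Errorf("unexpected deployment strategy type: %s", in.Strategy)
}

func main() {
	inputs := []syncInput{
		{Paused: true, Strategy: "RollingUpdate"},
		{ScalingEvent: true, Strategy: "RollingUpdate"},
		{Strategy: "RollingUpdate"},
		{Strategy: "Recreate"},
	}
	for _, in := range inputs {
		action, err := decide(in)
		fmt.Println(action, err)
	}
}
```

A paused Deployment never reaches the strategy switch, even when its strategy is RollingUpdate.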

The rolling update logic lives in rolloutRolling

Here is the critical method:

```go
// pkg/controller/deployment/rolling.go:31
// rolloutRolling implements the logic for rolling a new replica set.
func (dc *DeploymentController) rolloutRolling(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet) error {
	newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, true)
	if err != nil {
		return err
	}
	allRSs := append(oldRSs, newRS)

	// Scale up, if we can.
	scaledUp, err := dc.reconcileNewReplicaSet(ctx, allRSs, newRS, d)
	if err != nil {
		return err
	}
	if scaledUp {
		// Update DeploymentStatus
		return dc.syncRolloutStatus(ctx, allRSs, newRS, d)
	}

	// Scale down, if we can.
	scaledDown, err := dc.reconcileOldReplicaSets(ctx, allRSs, controller.FilterActiveReplicaSets(oldRSs), newRS, d)
	if err != nil {
		return err
	}
	if scaledDown {
		// Update DeploymentStatus
		return dc.syncRolloutStatus(ctx, allRSs, newRS, d)
	}

	if deploymentutil.DeploymentComplete(d, &d.Status) {
		if err := dc.cleanupDeployment(ctx, oldRSs, d); err != nil {
			return err
		}
	}

	// Sync deployment status
	return dc.syncRolloutStatus(ctx, allRSs, newRS, d)
}
```

The flow is simpler than it may seem at first glance.

1. Identify the new and old ReplicaSets

The controller figures out which ReplicaSet represents the new desired template, and which ones belong to older revisions.

2. Try to scale up the new ReplicaSet first

reconcileNewReplicaSet attempts to scale the new ReplicaSet up toward the desired replica count, within the surge limit set by the rolling update strategy.

If scaling up happens, the controller updates Deployment status and returns. It does not continue immediately into the scale-down phase in the same pass.

3. If no scale-up happened, try to scale down old ReplicaSets

reconcileOldReplicaSets removes capacity from outdated ReplicaSets, as long as doing so keeps enough pods available.

Again, if scaling down happens, status is updated and the method returns.

4. Clean up when the rollout is complete

Once the Deployment is fully complete, the controller prunes old ReplicaSets beyond the revision history limit and then syncs the final rollout status.

Why the rollout looks smooth from the outside

The interesting part is how these repeated reconciliation passes create the familiar rolling behavior.

Suppose the desired replica count is 3, and the current state is effectively 3 old pods running.

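During rollout, the controller may first allow one new pod to appear, temporarily giving 4 running pods against a desired count of 3. How far it may overshoot is bounded by the strategy's maxSurge setting, and how far availability may dip is bounded by maxUnavailable; with the default of 25% each, 3 replicas resolve to a surge of 1 and an unavailability budget of 0. On the next reconciliation the surge budget is used up, so the controller shifts to scaling down an old ReplicaSet, bringing the state back to 3 running pods. Then the process repeats.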

So the update progresses by alternating between:

  • adding new-state pods
  • removing old-state pods

until every remaining pod matches the new template.
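The alternation can be simulated with plain integers. The sketch below is a deliberately simplified model, not the controller's algorithm: state, rollOnce, and simulate are hypothetical names, there is a single old ReplicaSet, and every pod is assumed to become available immediately between passes. The only rules it keeps are that a pass performs at most one adjustment, that scale-up is capped by desired + maxSurge, and that scale-down must leave at least desired - maxUnavailable pods.

```go
package main

import "fmt"

// state holds replica counts for the old and new ReplicaSets.
type state struct {
	Old, New int
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// rollOnce performs one reconciliation pass: scale up if there is surge room,
// otherwise scale down if availability allows. It reports whether anything changed.
func rollOnce(s state, desired, maxSurge, maxUnavailable int) (state, bool) {
	total := s.Old + s.New

	// Scale up, if we can.
	if s.New < desired && total < desired+maxSurge {
		room := desired + maxSurge - total
		s.New += minInt(room, desired-s.New)
		return s, true
	}

	// Scale down, if we can.
	removable := total - (desired - maxUnavailable)
	if s.Old > 0 && removable > 0 {
		s.Old -= minInt(removable, s.Old)
		return s, true
	}
	return s, false
}

// simulate repeats rollOnce until a pass changes nothing and returns each intermediate state.
func simulate(start state, desired, maxSurge, maxUnavailable int) []state {
	var steps []state
	s := start
	for i := 0; i < 100; i++ { // safety cap for the sketch
		next, changed := rollOnce(s, desired, maxSurge, maxUnavailable)
		if !changed {
			break
		}
		steps = append(steps, next)
		s = next
	}
	return steps
}

func main() {
	for _, s := range simulate(state{Old: 3, New: 0}, 3, 1, 0) {
		fmt.Printf("old=%d new=%d total=%d\n", s.Old, s.New, s.Old+s.New)
	}
}
```

With 3 replicas, a surge of 1, and no allowed unavailability, the states run (3,1), (2,1), (2,2), (1,2), (1,3), (0,3) as (old, new) pairs: the total oscillates between 4 and 3 until only new pods remain.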

This is what makes the implementation elegant: the controller does not hardcode a dramatic multi-step script for replacing pods. Instead, it keeps reconciling observed state toward desired state, one allowed adjustment at a time.

That is very close to the essence of a state machine. You declare the target, disturb the old equilibrium by introducing new desired-state capacity, then let reconciliation keep closing the gap.

Answers to the two core questions

Which object controls a Deployment?

DeploymentController

How does a Deployment control updates when the application changes?

The crucial logic sits in rolloutRolling for the rolling update strategy. The controller compares new and old ReplicaSets, tries to scale up the new side first, and then scales down the old side when allowed.

In plain terms, the update loop is:

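  1. try to scale up the new ReplicaSet (reconcileNewReplicaSet, reported as scaledUp)
  2. otherwise try to scale down the old ReplicaSets (reconcileOldReplicaSets, reported as scaledDown)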

By repeating that cycle over successive reconciliation passes, Kubernetes gradually moves the system from the old version to the target version.

A few design takeaways from this implementation

Several design ideas stand out from the Deployment controller.

Informer-driven reconciliation

NewDeploymentController shows a very typical and very effective Kubernetes pattern: use informers to watch changes, convert them into queue items, and reconcile asynchronously.

Clear strategy separation

RollingUpdate and Recreate are split into distinct implementations. That makes the code easier to follow and keeps strategy-specific behavior from becoming tangled.

State-based rollout logic

The rolloutRolling path stays simple because it models rollout as incremental state transitions. That makes the code easier to reason about and, just as importantly, easier to trust.

Once this part of the controller makes sense, many similar Kubernetes controllers start to feel much less mysterious.