
MachineSetPreflightChecks + maxSurge zero results in MDs downscaled for a time longer than desirable #12187

@tmmorin

Description


What would you like to be added (User Story)?

As a user, when I use maxSurge zero, I would like the window during which my MachineDeployments are downscaled during a cluster Kubernetes upgrade to be as short as possible -- today they are downscaled as soon as the MD is updated, and they remain downscaled for the whole duration of the control plane nodes rolling update.

The desirable evolution would be a behavior where, when maxSurge zero is used, the old MachineSet of a MachineDeployment is not scaled down until the new MachineSet is actually able to scale up (based on MachineSetPreflightChecks).

In other words:

  • with the current 1.9.x code, an MD is below its target number of replicas for a duration of: time to update the CP nodes + time to update the MD nodes
  • ideally, an MD would be below its target number of replicas only for the time needed to update the MD nodes
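Concretely, with the baremetal numbers from the scenario below (3 CP nodes, each rebuilt in ~20 minutes), that is roughly 60 minutes of CP rollout on top of the MD rollout itself today, versus just the MD rollout time ideally.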

Detailed Description

Example Scenario

Let's consider the following example scenario:

  • MachineSetPreflightChecks are enabled (they have been enabled by default since 1.9.x)
  • one or more MachineDeployments are used
    • they use the RollingUpdate strategy with maxSurge set to zero (a minimal config sketch follows this list)
  • CAPI resources for a cluster are updated to trigger a Kubernetes version upgrade (CP and MDs)
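For concreteness, here is a minimal sketch of the MachineDeployment configuration this scenario assumes (names, versions, and referenced objects are illustrative, not from the original report):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: md-0                  # illustrative name
  namespace: default
spec:
  clusterName: my-cluster     # illustrative cluster name
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0             # no extra Machine is created during the rollout
      maxUnavailable: 1       # so the old MachineSet is scaled down first
  template:
    spec:
      clusterName: my-cluster
      version: v1.31.0        # bumping this (with the CP version) triggers the upgrade
      bootstrap:
        configRef:            # illustrative bootstrap config reference
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: md-0-bootstrap
      infrastructureRef:      # illustrative infra template reference (capm3 here)
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        name: md-0-infra
```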

Current behavior

What happens today is the following:

  • a CP node rolling update is triggered and starts
  • at once, for all MDs, a new MachineSet is created and the previous one is scaled down (maxSurge 0)
  • at this point all MDs are downscaled, and this persists until the CP node rolling update is finished -- which on baremetal is not quick: ~1h is a typical order of magnitude (3 nodes, each rebuilt in ~20 minutes); the resulting state is sketched below
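To make the stalled state concrete, here is an illustrative sketch (names and counts assumed) of the two MachineSets of one such MD during the CP rolling update: the old MachineSet has already been scaled down per maxUnavailable, while the new one cannot create its Machine because the MachineSet preflight checks (e.g. ControlPlaneIsStable) keep failing until the CP rollout completes:

```yaml
# Old MachineSet: already scaled down (maxSurge: 0, maxUnavailable: 1)
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  name: md-0-old              # hypothetical name
spec:
  replicas: 2                 # was 3 before the rollout started
---
# New MachineSet: wants 1 replica, but Machine creation is blocked
# by the preflight checks until the CP rolling update finishes
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  name: md-0-new              # hypothetical name
spec:
  replicas: 1
status:
  replicas: 0                 # stays at 0 for the whole CP rollout (~1h here)
```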

Problem statement

With maxSurge zero, during a Kubernetes upgrade, all MDs are downscaled by one for longer than desirable (for the whole time needed to roll out all the CP nodes), while ideally the MDs could remain untouched during the CP nodes rolling update.

The desirable evolution would be a behavior where the old MachineSet is not scaled down until the new MachineSet is actually able to scale up.
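Under that desired behavior, the mid-rollout state during the CP update would instead look like this (same hypothetical names as above): the old MachineSet keeps its full replica count for as long as the preflight checks would prevent the new MachineSet from actually creating Machines:

```yaml
# Desired: old MachineSet untouched while preflight checks fail
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  name: md-0-old
spec:
  replicas: 3                 # still at target during the CP rollout
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  name: md-0-new
spec:
  replicas: 0                 # rollout only starts once the checks pass
```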

Relevance

The scenario above is commonplace for CAPI baremetal deployments (e.g. with capm3), where it is common not to have spare hardware. In low-footprint baremetal scenarios where clusters have few nodes, the difference in available processing resources can be significant.

Note

I opted to file this as a "feature request", but my feeling is that some might qualify the current behavior as a regression. Please feel free to requalify it as a "bug report" if you think that is warranted.

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature
/area machinedeployment
/area upgrades
