Description
What would you like to be added (User Story)?
As a user, when I use maxSurge zero, I would like the time during which my MachineDeployments are downscaled to be as short as possible during a cluster Kubernetes upgrade. Today they are downscaled at once when the MD is updated, and this remains true for the whole duration of the control plane nodes' rolling update.
The desirable evolution would be a behavior where, when maxSurge zero is used, the old MachineSet for a MachineDeployment is not scaled down until the new MachineSet can actually be scaled up (based on MachineSetPreflightChecks).
Said otherwise:
- with current 1.9.x code, an MD is not at its target number of replicas for a duration of: time to update the CP nodes + time to update the MD nodes
- ideally, an MD would not be at its target number of replicas only for the time needed to update the MD nodes
Detailed Description
Example Scenario
Let's consider the following example scenario:
- MachineSetPreflightChecks are enabled (they have been enabled by default since 1.9.x)
- one or more MachineDeployments are used
- they use the RollingUpdate strategy with maxSurge set to zero
- CAPI resources for a cluster are updated to trigger a Kubernetes version upgrade (CP and MDs)
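For concreteness, the MachineDeployment in this scenario would look roughly like the following sketch (names, replica count, and versions are illustrative, not taken from a real cluster):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: md-0                 # illustrative name
spec:
  clusterName: my-cluster    # illustrative name
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0            # no spare hardware: replace machines in place
      maxUnavailable: 1
  template:
    spec:
      version: v1.30.0       # bumping this (together with the CP) triggers the upgrade
```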
Current behavior
What happens today is the following:
- a CP node rolling update is triggered and starts
- at once, for all MDs, a new MachineSet is created and the previous one is scaled down (maxSurge 0)
- at this point all MDs are downscaled, and this persists until the CP node rolling update is finished (which on baremetal isn't quick: ~1h is a typical order of magnitude, e.g. 3 nodes, each rebuilt in 20 minutes)
Problem statement
With maxSurge zero, during a Kubernetes upgrade, all MDs are downscaled by one for longer than desirable (the time needed to roll out all the CP nodes), while ideally the MDs could remain untouched during the CP nodes rolling update.
The desirable evolution would be a behavior where the old MachineSet is not scaled down until the new MachineSet can actually be scaled up.
Relevance
The scenario above is commonplace for CAPI baremetal deployments (e.g. with capm3), where it is common not to have spare hardware. In baremetal low-footprint scenarios where clusters have few nodes, the difference in available processing resources can be significant.
Note
I opted to file this as a "feature request", but my feeling is that some might qualify the current behavior as a regression. Please feel free to requalify it as a "bug report" if you think that is warranted.
Anything else you would like to add?
No response
Label(s) to be applied
/kind feature
/area machinedeployment
/area upgrades