-
Notifications
You must be signed in to change notification settings - Fork 303
🐛 fix: set machine spec failuredomain string if not defined #3576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Welcome @jcpowermac! |
Hi @jcpowermac. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
/lgtm /assign @sbueringer |
LGTM label has been added. Git tree hash: 29283286600b1e3b812f33127e495556fc7d4a05
|
controllers/vspherevm_controller.go
Outdated
@@ -234,10 +234,10 @@ func (r vmReconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.R | |||
} | |||
|
|||
var vsphereFailureDomain *infrav1.VSphereFailureDomain | |||
if failureDomain := machine.Spec.FailureDomain; failureDomain != "" { | |||
if failureDomain := vsphereMachine.Spec.FailureDomain; failureDomain != nil && *failureDomain != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right that its always the way that:
Whatever is set on vSphereMachine
's .spec.failureDomain
gets propagated up (by CAPI's machine controller) to the Machine
's .spec.failureDomain
field.
Its just that the timing is bad (CAPI does this too late?!)
Otherwise: should we use a fallback to machine.Spec.FailureDomain
, if its not set on vsphereMachine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the CAPI contract: https://main.cluster-api.sigs.k8s.io/developer/providers/contracts/infra-machine#inframachine-failure-domain
It should be the other way around (which means code as-is should be correct?!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used rr
to troubleshoot, the machine object doesn't receive the failuredomain name before the guest is powered on - then the race is on...will the node ccm init prior to being added to the vm-host group and potentially vmotion'ed to the correct location in the vCenter cluster.
cluster-api-provider-vsphere/pkg/services/govmomi/service.go
Lines 183 to 195 in ade4688
if ok, err := vms.reconcileVMGroupInfo(ctx, virtualMachineCtx); err != nil || !ok { | |
return vm, err | |
} | |
if err := vms.reconcileClusterModuleMembership(ctx, virtualMachineCtx); err != nil { | |
return vm, err | |
} | |
if ok, err := vms.reconcilePowerState(ctx, virtualMachineCtx); err != nil || !ok { | |
return vm, err | |
} | |
if err := vms.reconcileHostInfo(ctx, virtualMachineCtx); err != nil { |
if virtualMachineCtx.VSphereFailureDomain == nil || virtualMachineCtx.VSphereFailureDomain.Spec.Topology.Hosts == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that potentially is the behavior when setting failuredomain only on the InfraMachine object (VSphereMachineTemplate?!).
According to kubernetes-sigs/cluster-api#11232 the correct usage is setting FailureDomain on the Machine object instead. Which means it would also be there in the referenced code.
Setting it on the VSphereMachine / VSphereMachineTemplate is the old deprecated behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that its deprecated behavior, what is the solution then? The machine failuredomain name is slow to become available in the reconcile loop, which causes this race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution if I'm right is: Set the failuredomain on the Machine
(or MachineSet
or MachineDeployment
) object instead of the InfraMachine / VSphereMachine.
It should be fine to still set it on the InfraMachine / VSphereMachine. The important piece is to set it on the Machine instead of relying on CAPI copy this information to the InfraMachine / VSphereMachine.
5e4caa9
to
e4395a0
Compare
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This commit sets the `machine.Spec.FailureDomain` string from the `vsphereMachine.Spec.FailureDomain` if defined and the machine failure domain is empty. This resolves an issue where virtual machines are not placed within vm-host groups in a timely manner causing misplacement in the topology.
e4395a0
to
53134e4
Compare
What this PR does / why we need it:
This commit sets the
machine.Spec.FailureDomain
stringfrom the
vsphereMachine.Spec.FailureDomain
ifdefined and the machine failure domain is empty.
This resolves an issue where virtual machines are not placed
within vm-host groups in a timely
manner causing misplacement in the topology.
Fixes #3575