Description
/kind bug
We're using CAPV to deploy a workload cluster across two data centers (DC1 and DC2) within a stretched vSphere cluster.
Control plane nodes are assigned to failure domains correctly via the VSphereCluster resource and placed in the appropriate VM groups (DC1 or DC2).
Worker nodes behave as expected because they use separate VSphereMachineTemplates with storage policies scoped to their respective DCs.
Control plane nodes, however, share a single VSphereMachineTemplate. This template uses a storage policy that targets all datastores across both DCs.
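For reference, the shared control plane template looks roughly like the sketch below; the object names, storage policy name, and sizing values are placeholders rather than our actual manifest:

```yaml
# Rough sketch of the shared control plane VSphereMachineTemplate.
# All names and values are placeholders; the relevant part is that a single
# storagePolicyName is used, and that policy matches datastores in both DCs.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: workload-control-plane
  namespace: default
spec:
  template:
    spec:
      server: vcenter.example.com
      datacenter: stretched-dc
      template: ubuntu-2204-kube-v1.31.5
      # One storage policy for every control plane node; it spans
      # datastores in both DC1 and DC2, so it does not constrain
      # storage placement to a single data center.
      storagePolicyName: stretched-all-datastores
      numCPUs: 4
      memoryMiB: 16384
      diskGiB: 60
      network:
        devices:
          - networkName: vm-network
            dhcp4: true
```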
Occasionally, a control plane VM that should be running in DC1 is placed in the DC1 VM group, but the actual storage is provisioned in a datastore located in DC2. As a result, the entire VM ends up running in the wrong DC.
We believe this occurs because vSphere places the VM where its storage is actually provisioned, and datastore selection is not restricted tightly enough because the shared storage policy in the control plane template matches datastores in both DCs.
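For context, our DC1 failure domain is defined roughly as below (group, tag, and object names are placeholders). Only compute placement is pinned through the host/VM group; topology.datastore is left unset, so storage placement falls back to the storage policy from the template above:

```yaml
# Rough sketch of the DC1 failure domain; DC2 is defined the same way.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: dc1
spec:
  region:
    name: stretched-cluster
    type: ComputeCluster
    tagCategory: k8s-region
  zone:
    name: dc1
    type: HostGroup
    tagCategory: k8s-zone
  topology:
    datacenter: stretched-dc
    computeCluster: stretched-cluster
    hosts:
      hostGroupName: dc1-hosts
      vmGroupName: dc1-vms
    # topology.datastore is not set, so nothing ties the control plane
    # node's disks to a DC1-local datastore.
```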
Expected Behavior:
Control plane VMs should be placed entirely in the same data center as their assigned failure domain, including both compute and storage.
Actual Behavior:
Control plane VMs occasionally land in the wrong physical data center due to storage being provisioned from the opposite DC.
Environment:
- Cluster-api-provider-vsphere version: 1.13.0
- Kubernetes version (use kubectl version): v1.31.5
- OS (e.g. from /etc/os-release): Ubuntu 22.04.5 LTS