
Controlplane VMs in vSphere Cluster sometimes land in wrong DC due to Storage Policy #3481

@pslijkhuis

Description


/kind bug

We're using CAPV to deploy a workload cluster across two data centers (DC1 and DC2) within a stretched vSphere cluster.

Control plane nodes are assigned to failure domains correctly via the VSphereCluster resource and are placed in the appropriate VM groups (DC1 or DC2).

Worker nodes behave as expected because they use separate VSphereMachineTemplates with storage policies scoped to their respective DCs.

Control plane nodes, however, share a single VSphereMachineTemplate. This template uses a storage policy that targets all datastores across both DCs.

Occasionally, a control plane VM that should run in DC1 is placed in the DC1 VM group, but its storage is provisioned on a datastore located in DC2. As a result, the entire VM ends up running in the wrong DC.

We believe this occurs because vSphere places the VM where its datastore is actually provisioned, and the shared storage policy in the control plane template does not restrict datastore selection tightly enough.
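One possible mitigation would be to pin storage alongside compute by constraining the datastore in each failure domain. A minimal sketch of a VSphereFailureDomain for DC1 is below; the resource names (`dc1`, `dc1-datastore`, `dc1-vm-group`, `dc1-host-group`, `stretched-cluster`) are hypothetical placeholders, and the exact fields should be checked against the CAPV API version in use:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: dc1
spec:
  region:
    name: stretched-cluster
    type: ComputeCluster
    tagCategory: k8s-region
  zone:
    name: dc1
    type: HostGroup
    tagCategory: k8s-zone
  topology:
    datacenter: Datacenter
    computeCluster: stretched-cluster
    hosts:
      vmGroupName: dc1-vm-group
      hostGroupName: dc1-host-group
    # Hypothetical: pinning a DC1-local datastore here would keep storage
    # in DC1 even though the shared storage policy in the control plane
    # template allows datastores in both DCs.
    datastore: dc1-datastore
```

With a per-failure-domain datastore constraint, compute placement (VM group) and storage placement would no longer be able to diverge across DCs.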

Expected Behavior:
Control plane VMs should be entirely placed—including compute and storage—in the same data center as their assigned failure domain.

Actual Behavior:
Control plane VMs occasionally land in the wrong physical data center due to storage being provisioned from the opposite DC.

Environment:

  • Cluster-api-provider-vsphere version: 1.13.0
  • Kubernetes version: (use kubectl version): v1.31.5
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.5 LTS


Labels

    kind/feature: Categorizes issue or PR as related to a new feature.
    lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
