CAPV + InClusterIPAM + Talos: IPs from addressesFromPools are not applied inside Talos (vSphere guestinfo.metadata vs Talos machine config) #3647

@AkakievKD

/kind bug
Title: Static IPs from addressesFromPools are not persisted on Talos VMs (CAPV passes IP via cloud-init/guestinfo, which Talos does not consume)

What steps did you take and what happened:
I deploy a Talos-based management/workload cluster on vSphere via Cluster API + CAPV and allocate static IPs through CAPI IPAM (an InClusterIPPool) referenced from VSphereMachineTemplate.spec.template.spec.network.devices[].addressesFromPools.

# IPAM pool
apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: InClusterIPPool
metadata:
  name: pool
spec:
  addresses:
    - 192.168.163.150-192.168.163.160
  prefix: 24
  gateway: 192.168.163.2

---
# Infra: vSphere cluster + endpoint
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: cluster
spec:
  server: "192.168.163.158"
  controlPlaneEndpoint:
    host: 192.168.163.162
    port: 6443
  identityRef:
    kind: Secret
    name: vsphere-cluster

---
# CAPV machine template with addressesFromPools
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: master
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: "Datacenter"
      datastore: "datastore1"
      folder: "testvm"
      resourcePool: "default"
      server: "192.168.163.158"
      template: "talos-cloud"
      network:
        devices:
          - networkName: "VM Network"
            # The key part: static addresses from IPAM
            addressesFromPools:
              - apiGroup: ipam.cluster.x-k8s.io
                kind: InClusterIPPool
                name: pool

---
# Talos control plane (Talos bootstrap provider)
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: master
spec:
  version: v1.34.0
  replicas: 3
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachineTemplate
    name: master
  controlPlaneConfig:
    controlplane:
      generateType: controlplane
      talosVersion: v1.11.1
      # Example network patch (two scenarios described below)
      strategicPatches:
        - |
          - op: replace
            path: /machine/network/interfaces
            value:
              - interface: eth0
                dhcp: false      # Scenario A (static in OS)
                vip:
                  ip: 192.168.163.162

Scenario A (Talos dhcp: false, relying on IPAM static):

On first boot, the VM may come up and the control plane can start, but after the first reboot the interface on Talos has no IP assigned anymore; the node becomes unreachable and the VIP cannot bind. Controller logs include messages like “connect: no route to host” and “no addresses were found for node”.

The IP remains allocated in CAPI/IPAM/CAPV objects, but it is not present inside the guest OS after reboot.
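
For reference, the allocation that stays in place on the management cluster would look roughly like the IPAddress object below. This is a sketch, not output captured from the affected cluster: the object name and the apiVersion are assumptions, and the address is simply the first one from the pool range above.

```yaml
# Hypothetical IPAddress created by the in-cluster IPAM provider for one control plane VM.
# The name "master-abcde-0-0" is illustrative only; the apiVersion may differ per CAPI release.
apiVersion: ipam.cluster.x-k8s.io/v1beta1
kind: IPAddress
metadata:
  name: master-abcde-0-0
spec:
  address: 192.168.163.150   # first address of the pool range (assumed)
  prefix: 24
  gateway: 192.168.163.2
  claimRef:
    name: master-abcde-0-0   # the matching IPAddressClaim (hypothetical name)
  poolRef:
    apiGroup: ipam.cluster.x-k8s.io
    kind: InClusterIPPool
    name: pool
```

The object keeps existing and keeps pointing at the pool, while the interface inside the Talos guest no longer carries this address after a reboot.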

Scenario B (Talos dhcp: true + IPAM static):

On first boot, sometimes both a DHCP lease and the IPAM-provided address appear; sometimes only the IPAM address appears.

After a reboot, the Talos node keeps only the DHCP address; the IPAM-provided static IP disappears from the interface.

In short:
Talos with dhcp: false → after a reboot the interface has no IP address at all.
Talos with dhcp: true → after a reboot only the DHCP address remains; the IPAM static IP disappears.

What did you expect to happen:
Either of the following:

When addressesFromPools is set, the IP address allocated by CAPI IPAM is consistently configured inside the guest OS and survives reboots for CAPV + Talos (i.e., the static address is reliably applied).

Or, if CAPV relies on cloud-init via guestinfo to apply network configuration and the guest OS does not consume cloud-init, CAPV should surface a clear warning (event/condition) and document that addressesFromPools will not configure networking in the guest OS for Talos, and that DHCP reservations or explicit static configuration in the Talos machine config must be used instead.

Anything else you would like to add:
From the CAPV code and docs, CAPV writes cloud-init user-data/metadata via vSphere guestinfo (guestinfo.userdata / guestinfo.metadata). That requires the guest OS to consume cloud-init to actually configure the network.
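
To make this concrete, the metadata that ends up in guestinfo.metadata has approximately the following cloud-init (netplan v2 style) shape. This is an approximation written from memory of the CAPV metadata template, not a dump from the affected VM: the hostname, the device id, the MAC address, and the chosen address are all placeholders.

```yaml
# Approximate shape of the cloud-init metadata CAPV renders for a VM (assumed, not captured).
instance-id: "master-abcde"        # hypothetical VM name
local-hostname: "master-abcde"
network:
  version: 2
  ethernets:
    id0:
      match:
        macaddress: "00:50:56:00:00:01"   # placeholder MAC
      set-name: "eth0"
      dhcp4: false
      addresses:
        - "192.168.163.150/24"            # address taken from the IPAM pool
      gateway4: "192.168.163.2"
```

Only a guest that reads and applies this document ends up with the static address on its interface.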

Talos Linux configures networking via its own machine configuration (e.g., machine.network.interfaces), and by default runs DHCP on interfaces with link; it does not consume cloud-init metadata for networking. So the IP set by CAPV/IPAM in guestinfo is not applied/persisted by Talos across reboots.
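
As a sketch of what "explicit static configuration in the Talos machine config" means for this setup, the fragment below puts the address, prefix, and gateway from the pool directly into machine.network.interfaces. The specific address is an assumption (the first one in the pool range); the interface name and VIP are taken from the manifests above.

```yaml
# Talos machine config fragment with the static address set in the OS itself.
# 192.168.163.150 is assumed here; each node would need its own address.
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: false
        addresses:
          - 192.168.163.150/24
        routes:
          - network: 0.0.0.0/0       # default route via the pool gateway
            gateway: 192.168.163.2
        vip:
          ip: 192.168.163.162        # control plane endpoint from VSphereCluster
```

Because the address is per node, a single patch shared by all three control plane replicas cannot carry it, which is exactly the gap addressesFromPools is expected to fill.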

Related upstream discussion notes that CAPV currently places the IP in guestinfo.metadata, which Talos does not read, so no static IP is applied inside the VM.

Environment:

  • Cluster-api-provider-vsphere version: v1.14.0
  • Kubernetes version (use kubectl version): local managed cluster, 1.31.2
  • OS (e.g. from /etc/os-release): Debian 11.6 for the local managed cluster

Labels: kind/feature (Categorizes issue or PR as related to a new feature.)