Calico CNI fails with "context deadline exceeded" when using localhost load balancers #12597

@rickerc

What happened?

I have loadbalancer_apiserver_localhost enabled (which I need for other reasons), but calico_bpf_enabled is disabled. With that combination, the kubespray deployment run fails when deploying MetalLB. Working back through the error stack, the root cause is that an empty kubernetes-services-endpoint ConfigMap causes the Calico CNI to use an unreachable service IP, which in turn leaves pod deployments stuck in ContainerCreating.
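For reference, here is a sketch of what a populated kubernetes-services-endpoint ConfigMap looks like, using the two keys that Calico documents for this ConfigMap. The namespace and both values are assumptions for illustration (placeholders, not taken from my cluster or from the upcoming PR). In the failing setup the data section is empty.

```yaml
# Illustrative only: namespace and values are assumed placeholders.
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system                      # assumed namespace
data:
  KUBERNETES_SERVICE_HOST: "<reachable API server host>"   # placeholder
  KUBERNETES_SERVICE_PORT: "<API server port>"             # placeholder
```

When these keys are absent, the CNI falls back to the in-cluster service IP, which is the address that turned out to be unreachable here.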

What did you expect to happen?

kubespray deployment run to succeed

pods to create containers successfully, with functional networking

How can we reproduce it (as minimally and precisely as possible)?

I've narrowed this down to the root cause and will submit a PR with a fix shortly. The common factors needed to reproduce are:

  • Calico CNI in vxlan mode
  • loadbalancer_apiserver_localhost enabled
  • calico_bpf_enabled set to false
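As a rough sketch, the combination above corresponds to group_vars along these lines. This is not my actual inventory (which is omitted); the two VXLAN variable names and metallb_enabled are my best guess at the relevant kubespray settings and should be treated as assumptions.

```yaml
# Hypothetical group_vars fragment for reproducing the issue.
kube_network_plugin: calico
calico_network_backend: vxlan        # assumed name for selecting VXLAN mode
calico_vxlan_mode: Always            # assumed name for selecting VXLAN mode
loadbalancer_apiserver_localhost: true
calico_bpf_enabled: false
metallb_enabled: true                # assumed; MetalLB is where the run fails
```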

The problem was seen with v2.28.1 (a20891a) and is still present at master HEAD.

OS

Ubuntu 24

Version of Ansible

ansible [core 2.16.14]
config file = /home/chricker/kubespray/ansible.cfg
configured module search path = ['/home/chricker/kubespray/library']
ansible python module location = /home/chricker/kubespray/venv_ansible/lib/python3.12/site-packages/ansible
ansible collection location = /home/chricker/.ansible/collections:/usr/share/ansible/collections
executable location = /home/chricker/kubespray/venv_ansible/bin/ansible
python version = 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] (/home/chricker/kubespray/venv_ansible/bin/python3)
jinja version = 3.1.6
libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

a20891a

Network plugin used

calico

Full inventory with variables

omitted for policy reasons

Command used to invoke ansible

cd /home/chricker/kubespray && source venv_ansible/bin/activate && ansible-playbook -i /path/to/hosts.ini cluster.yml --become

Output of ansible run

TASK [kubernetes-apps/metallb : Kubernetes Apps | Wait for MetalLB controller to be running] ***
task path: /home/chricker/kubespray/roles/kubernetes-apps/metallb/tasks/main.yml:35
fatal: [test-k8s-dev1-01.test.domain]: FAILED! => {"changed": true, "cmd": ["/usr/local/bin/kubectl", "rollout", "status", "-n", "metallb-system", "deployment", "-l", "app=metallb,component=controller", "--timeout=2m"], "delta": "0:00:00.089180", "end": "2025-09-28 04:51:33.042994", "msg": "non-zero return code", "rc": 1, "start": "2025-09-28 04:51:32.953814", "stderr": "error: deployment \"controller\" exceeded its progress deadline", "stderr_lines": ["error: deployment \"controller\" exceeded its progress deadline"], "stdout": "", "stdout_lines": []}

Anything else we need to know

No response
