Description
Environmental Info:
RKE2 Version:
rke2 version v1.32.8+rke2r1 (34960df)
go version go1.23.11 X:boringcrypto
Node(s) CPU architecture, OS, and Version:
Linux az1-cp-chmgm-9h66r 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
3 Control planes (role: control-plane and etcd)
3 Worker nodes
Describe the bug:
Previously we created RKE2 clusters (via Rancher) using version 1.31.7. We recently started creating new clusters using 1.32.8. We use Cilium as the CNI. I've noticed that new clusters on 1.32.8 do not come up because the Cilium agent is not starting. The agent waits for the CRDs to be applied by the Cilium operator, but since 1.32.8 the Cilium operator does not start due to untolerated taints: node-role.kubernetes.io/etcd and node.cloudprovider.kubernetes.io/uninitialized.
It looks like the Cilium Helm release was updated in 1.18.000 to use more specific tolerations.

Steps To Reproduce:
Install RKE2 via Rancher using control plane nodes that have both the etcd and control-plane roles. Afterwards, the cluster hangs in bootstrapping because the Cilium agent does not start.
Expected behavior:
I would expect the cilium-operator to tolerate the taint node-role.kubernetes.io/etcd:NoExecute.
Actual behavior:
The operator doesn't tolerate the node-role.kubernetes.io/etcd taint, so it is never scheduled and cannot install the CRDs, which prevents the Cilium agent from starting.
Workaround:
Manually adding tolerations for node-role.kubernetes.io/etcd and node.cloudprovider.kubernetes.io/uninitialized to the cilium-operator lets the operator start. It then installs the CRDs, and the agent starts afterwards.
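
For illustration, the manual tolerations could be added with a JSON Patch along these lines. This is only a sketch: the Deployment name cilium-operator and the kube-system namespace are assumptions, and the "-" path assumes the Deployment already has a tolerations list to append to. Using operator Exists without an effect tolerates the taint key regardless of its effect.

```python
import json

# Taint keys named in this report.
TAINT_KEYS = [
    "node-role.kubernetes.io/etcd",
    "node.cloudprovider.kubernetes.io/uninitialized",
]


def build_toleration_patch(keys):
    """Return JSON Patch (RFC 6902) operations that append one toleration per key."""
    return [
        {
            "op": "add",
            # "-" appends to the existing tolerations array (assumed to exist).
            "path": "/spec/template/spec/tolerations/-",
            # No "effect" field: matches any effect for this key.
            "value": {"key": key, "operator": "Exists"},
        }
        for key in keys
    ]


if __name__ == "__main__":
    patch = json.dumps(build_toleration_patch(TAINT_KEYS))
    # Deployment name and namespace below are assumptions; adjust as needed.
    print(
        "kubectl -n kube-system patch deployment cilium-operator "
        f"--type=json -p '{patch}'"
    )
```

Running the script prints a kubectl patch command that can be reviewed before it is executed. A patch applied this way may be reverted when the Helm release is reconciled, so it is a stopgap rather than a fix.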