I created a k3s cluster with a CAPZ ClusterClass. While deleting a Machine, CAPI keeps draining the same Machine 30+ times, even though the node drain is successful. More investigation is needed to determine whether this is specific to CAPI k3s.
Log:
# 1st time
I0606 09:56:31.424772 1 machine_controller.go:379] "Draining node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:56:33.831664 1 machine_controller.go:669] "WARNING: ignoring DaemonSet-managed Pods: kube-system/cloud-node-manager-877r7, kube-system/etcd-proxy-dq28c, kube-system/svclb-traefik-9d54adee-jmxgh\n" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
I0606 09:56:34.146008 1 machine_controller.go:927] "evicting pod kube-system/traefik-f4564c4f4-fhqsl\n" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
...
E0606 09:56:54.324252 1 machine_controller.go:686] "Drain failed, retry in 20s" err="error when waiting for pod \"local-path-provisioner-84db5d44d9-lhvh6\" in namespace \"kube-system\" to terminate: global timeout reached: 20s" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
...
# 2nd time (repeated 30+ times in under 2 minutes)
I0606 09:57:05.236640 1 machine_controller.go:379] "Draining node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="a706ce50-e0f7-443c-bcd7-33a3168a85ab" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:57:08.328683 1 machine_controller.go:669] "WARNING: ignoring DaemonSet-managed Pods: kube-system/cloud-node-manager-877r7, kube-system/etcd-proxy-dq28c, kube-system/svclb-traefik-9d54adee-jmxgh\n" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="a706ce50-e0f7-443c-bcd7-33a3168a85ab" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
I0606 09:57:08.328722 1 machine_controller.go:690] "Drain successful" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="a706ce50-e0f7-443c-bcd7-33a3168a85ab" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
...
# 37th time -> machine got deleted
I0606 09:59:42.458023 1 machine_controller.go:379] "Draining node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:59:42.765445 1 machine_controller.go:643] "Could not find node from noderef, it may have already been deleted" err="nodes \"demok3s-tbbs6-hmxp4\" not found" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:59:42.765538 1 machine_controller.go:710] "Could not find node from noderef, it may have already been deleted" err="Node \"demok3s-tbbs6-hmxp4\" not found" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
I0606 09:59:42.766135 1 machine_controller.go:468] "Deleting node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:59:43.098869 1 controller.go:329] "Reconciler error" err="failed to patch Machine default/demok3s-jtx7s-tpjl2: machines.cluster.x-k8s.io \"demok3s-jtx7s-tpjl2\" not found" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654"
This could be worked around by setting nodeDrainTimeout to something short, like 30s. However, it is still a potential issue if we keep sending lots of unnecessary requests to the API server.