Draining node is called multiple times (30+) while deleting a machine #130

I created a k3s cluster with a CAPZ ClusterClass. While deleting a machine, CAPI keeps draining the same node 30+ times, even though the node drain has already succeeded. More investigation is needed to determine whether this is specific to `capi k3s`.

Log:
```
# 1st time
I0606 09:56:31.424772       1 machine_controller.go:379] "Draining node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:56:33.831664       1 machine_controller.go:669] "WARNING: ignoring DaemonSet-managed Pods: kube-system/cloud-node-manager-877r7, kube-system/etcd-proxy-dq28c, kube-system/svclb-traefik-9d54adee-jmxgh\n" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
I0606 09:56:34.146008       1 machine_controller.go:927] "evicting pod kube-system/traefik-f4564c4f4-fhqsl\n" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
...
E0606 09:56:54.324252       1 machine_controller.go:686] "Drain failed, retry in 20s" err="error when waiting for pod \"local-path-provisioner-84db5d44d9-lhvh6\" in namespace \"kube-system\" to terminate: global timeout reached: 20s" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="e220ef5b-d4d8-454d-a787-cc29777a4d31" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
....
# 2nd time (repeated 30+ times in under 2 mins)
I0606 09:57:05.236640       1 machine_controller.go:379] "Draining node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="a706ce50-e0f7-443c-bcd7-33a3168a85ab" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:57:08.328683       1 machine_controller.go:669] "WARNING: ignoring DaemonSet-managed Pods: kube-system/cloud-node-manager-877r7, kube-system/etcd-proxy-dq28c, kube-system/svclb-traefik-9d54adee-jmxgh\n" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="a706ce50-e0f7-443c-bcd7-33a3168a85ab" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
I0606 09:57:08.328722       1 machine_controller.go:690] "Drain successful" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="a706ce50-e0f7-443c-bcd7-33a3168a85ab" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
...
# 37th time -> machine got deleted
I0606 09:59:42.458023       1 machine_controller.go:379] "Draining node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:59:42.765445       1 machine_controller.go:643] "Could not find node from noderef, it may have already been deleted" err="nodes \"demok3s-tbbs6-hmxp4\" not found" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:59:42.765538       1 machine_controller.go:710] "Could not find node from noderef, it may have already been deleted" err="Node \"demok3s-tbbs6-hmxp4\" not found" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
I0606 09:59:42.766135       1 machine_controller.go:468] "Deleting node" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654" KThreesControlPlane="default/demok3s-jtx7s" Cluster="default/demok3s" Node="demok3s-tbbs6-hmxp4"
E0606 09:59:43.098869       1 controller.go:329] "Reconciler error" err="failed to patch Machine default/demok3s-jtx7s-tpjl2: machines.cluster.x-k8s.io \"demok3s-jtx7s-tpjl2\" not found" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/demok3s-jtx7s-tpjl2" namespace="default" name="demok3s-jtx7s-tpjl2" reconcileID="af7a743d-dec6-402f-a16d-843dc438d654"
```
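To put a number on the repetition, the "Draining node" entries can be counted per Machine straight from the controller log. The following is a minimal sketch, assuming the log has been saved as plain text and that each line carries a `Machine="namespace/name"` key as in the excerpt above. The function names `countDrainAttempts` and `summarize` are hypothetical helpers, not part of CAPI, and the sample lines in `main` are abbreviated versions of the lines shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// machineRe extracts the Machine="namespace/name" pair that the machine
// controller attaches to its log lines (format taken from the excerpt above).
var machineRe = regexp.MustCompile(`\bMachine="([^"]+)"`)

// countDrainAttempts returns, per Machine, how many "Draining node" lines
// appear in the given log text. Hypothetical helper for this issue only.
func countDrainAttempts(logText string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logText))
	// Controller log lines can be long, so raise the scanner's line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, `"Draining node"`) {
			continue
		}
		if m := machineRe.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

// summarize renders the counts as sorted "machine: N drain attempts" lines.
func summarize(counts map[string]int) string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s: %d drain attempts\n", k, counts[k])
	}
	return b.String()
}

func main() {
	// Abbreviated sample lines; a real run would read the full controller log.
	sample := `I0606 09:56:31.424772 1 machine_controller.go:379] "Draining node" Machine="default/demok3s-jtx7s-tpjl2" Node="demok3s-tbbs6-hmxp4"
I0606 09:57:05.236640 1 machine_controller.go:379] "Draining node" Machine="default/demok3s-jtx7s-tpjl2" Node="demok3s-tbbs6-hmxp4"
I0606 09:57:08.328722 1 machine_controller.go:690] "Drain successful" Machine="default/demok3s-jtx7s-tpjl2" Node="demok3s-tbbs6-hmxp4"`
	fmt.Print(summarize(countDrainAttempts(sample)))
}
```

Running the same count against other providers' logs would be one way to check whether the behavior is specific to `capi k3s`.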

This could be worked around by setting `nodeDrainTimeout` to something short, like 30s. However, it remains a potential issue if we keep sending lots of unnecessary requests to the API server.
