Commit ebba609

Update troubleshooting guide to include steps to debug CCM (#2347)
1 parent 1f9046f commit ebba609

File tree

1 file changed

+136
-0
lines changed

docs/book/src/user/troubleshooting.md

Lines changed: 136 additions & 0 deletions
@@ -56,3 +56,139 @@
### 3. Failed to apply a cluster template with release not found error

Applying a cluster template from an unreleased version, such as the main branch, fails with an error like `release not found for version vX.XX.XX`. In that case, use `--from=<path_to_cluster_template>` instead of `--flavor`.

### 4. Debugging a Machine stuck in the Provisioned phase

* A Machine in the Running phase has been successfully created and initialised and has become a Kubernetes Node in the Ready state.

* Sometimes a Machine stays in the Provisioned phase indefinitely, which indicates that the infrastructure has been created and configured but the Machine has not yet become a Kubernetes Node.

* The cloud controller manager (CCM) turns a Machine into a Node by fetching the appropriate data from the cloud and initialising the Node with it.

* The cluster template uses a [ClusterResourceSet](https://cluster-api.sigs.k8s.io/tasks/cluster-resource-set) to apply the CCM [resources](https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/blob/cbdb2550ab3e326c95d075a6dc852c81c15b1189/templates/cluster-template-powervs.yaml#L300-L315) to the workload cluster.

* Check the machine's current status

```shell
$ kubectl get machines
NAME                          CLUSTER   NODENAME   PROVIDERID                                                                                          PHASE         AGE     VERSION
powervs-control-plane-pqnt4   powervs              ibmpowervs://osa/osa21/10b1000b-da8d-4e18-ad1f-6b2a56a8c130/bc0c9621-12d2-47f1-932e-a18ff041aba2   Provisioned   5m36s   v1.31.0
```
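
When a cluster has many Machines, the stuck ones can be picked out with a small filter instead of reading the table by eye. The following is a minimal sketch, not part of the original guide: the function name `provisioned_machines` is hypothetical, and it assumes two-column `NAME PHASE` input such as the `custom-columns` invocation in the trailing comment would produce.

```shell
# Hypothetical helper: print the names of Machines whose phase is Provisioned.
# Reads "NAME PHASE" pairs, one per line, from standard input.
provisioned_machines() {
  awk '$2 == "Provisioned" { print $1 }'
}

# Example invocation against the management cluster (assumed usage):
#   kubectl get machines --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | provisioned_machines
```

Selecting only the name and phase columns avoids the problem that the NODENAME column is empty for a stuck Machine, which would shift the fields in the default table output.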

* Verify that the ClusterResourceSet is applied to the workload cluster

```shell
$ kubectl get clusterresourceset
NAME             AGE
crs-cloud-conf   10m

$ kubectl describe clusterresourceset crs-cloud-conf
.
.
Status:
  Conditions:
    Last Transition Time:  2025-05-06T08:36:40Z
    Message:
    Observed Generation:   1
    Reason:                Applied
    Status:                True
    Type:                  ResourcesApplied
    Last Transition Time:  2025-05-06T08:31:27Z
    Message:
    Observed Generation:   1
    Reason:                NotPaused
    Status:                False
    Type:                  Paused
```
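
The condition of interest in this output is `ResourcesApplied`, which should have `Status: True`. As a rough sketch, the value can be extracted from the `describe` output with a text filter. The function name `condition_status` is hypothetical, and the sketch assumes that each condition prints its `Status:` line before its `Type:` line, as in the sample output.

```shell
# Hypothetical helper: print the Status of one condition type, given
# `kubectl describe` output on standard input.
# Assumption: within each condition, "Status:" is printed before "Type:",
# so the most recent Status value belongs to the Type line that follows it.
condition_status() {
  awk -v want="$1" '
    $1 == "Status:" && NF == 2 { last = $2 }      # skips the bare top-level "Status:" header
    $1 == "Type:" && $2 == want { print last }
  '
}

# Example invocation (assumed usage):
#   kubectl describe clusterresourceset crs-cloud-conf | condition_status ResourcesApplied
```

Any result other than `True` for `ResourcesApplied` suggests that the CCM resources have not reached the workload cluster yet.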

* Verify that the CCM resources are created in the workload cluster

* Get the workload cluster kubeconfig

```shell
$ clusterctl get kubeconfig powervs > workload.conf
```

* Check the CCM daemonset's status

```shell
$ kubectl get daemonset -n kube-system --kubeconfig=workload.conf
NAME                                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                            AGE
ibmpowervs-cloud-controller-manager   2         2         2       2            2           node-role.kubernetes.io/control-plane=   45m
```
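
A healthy CCM daemonset has a READY count equal to its DESIRED count, and a DESIRED count above zero. The comparison can be sketched as a filter over the table rows. The function name `unready_daemonsets` is hypothetical, and the column positions are taken from the default output shown above.

```shell
# Hypothetical helper: read `kubectl get daemonset --no-headers` rows on
# standard input and print every DaemonSet that has no desired pods
# (column 2 is 0) or whose READY count (column 4) differs from DESIRED.
# Exits non-zero when at least one such DaemonSet is found.
unready_daemonsets() {
  awk '$2 == 0 || $2 != $4 { print $1; bad = 1 } END { exit bad }'
}

# Example invocation (assumed usage):
#   kubectl get daemonset -n kube-system --no-headers --kubeconfig=workload.conf \
#     | unready_daemonsets
```

The zero check is included because a daemonset whose node selector matches no nodes reports 0 desired and 0 ready, which would otherwise look healthy.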

* Check the logs of CCM

```shell
$ kubectl -n kube-system get pods --kubeconfig=workload.conf
ibmpowervs-cloud-controller-manager-472lq   1/1   Running   1 (45m ago)   46m
ibmpowervs-cloud-controller-manager-fw47h   1/1   Running   1 (38m ago)   38m

$ kubectl -n kube-system logs ibmpowervs-cloud-controller-manager-472lq --kubeconfig=workload.conf
I0506 09:23:51.420992 1 ibm_metadata_service.go:206] Retrieving information for node=powervs-control-plane-ftd8j from Power VS
I0506 09:23:51.421003 1 ibm_powervs_client.go:270] Node powervs-control-plane-ftd8j found metadata &{InternalIP:192.168.236.114 ExternalIP:163.68.98.114 WorkerID:001275c5-f454-4944-8419-61c16f16f8b7 InstanceType:s922 FailureDomain:osa21 Region:osa ProviderID:ibmpowervs://osa/osa21/10b1000b-da8d-4e18-ad1f-6b2a56a8c130/001275c5-f454-4944-8419-61c16f16f8b7} from DHCP cache
I0506 09:23:51.421038 1 node_controller.go:271] Update 3 nodes status took 7.03624ms.
```
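
The log lines above use the klog format, in which the first character is the severity (`I` for info, `W` for warning, `E` for error, `F` for fatal) followed by the date as `MMDD`. When the log is long, a filter along the following lines may help surface problems. The function name `ccm_problem_lines` is hypothetical.

```shell
# Hypothetical helper: keep only klog warning (W), error (E) and fatal (F)
# lines from a log stream on standard input.
# The "|| true" keeps the exit status zero when nothing matches.
ccm_problem_lines() {
  grep -E '^[WEF][0-9]{4} ' || true
}

# Example invocation (assumed usage):
#   kubectl -n kube-system logs ibmpowervs-cloud-controller-manager-472lq \
#     --kubeconfig=workload.conf | ccm_problem_lines
```

An empty result only means that no warning or error was logged. The informational lines are still worth reading to confirm that the expected node names appear.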

* Check the cloud-conf config map

```shell
$ kubectl -n kube-system get cm ibmpowervs-cloud-config -o yaml --kubeconfig=workload.conf
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2025-05-06T08:36:39Z"
  name: ibmpowervs-cloud-config
  namespace: kube-system
  resourceVersion: "329"
  uid: ae2bd436-0b1e-4534-9c6c-48f717f6f47e
data:
  ibmpowervs.conf: |
    [global]
    version = 1.1.0
    [kubernetes]
    config-file = ""
    [provider]
    cluster-default-provider = g2
    .
    .
```

* Check whether the secret is configured with the correct IBM Cloud API key.

```shell
$ kubectl -n kube-system get secret ibmpowervs-cloud-credential -o yaml --kubeconfig=workload.conf
```
* Check whether the node is initialised correctly and does not have the `node.cloudprovider.kubernetes.io/uninitialized` taint

```shell
$ kubectl get nodes --kubeconfig=workload.conf
NAME                          STATUS     ROLES           AGE   VERSION
powervs-control-plane-ftd8j   NotReady   control-plane   53m   v1.31.0
powervs-control-plane-pqnt4   NotReady   control-plane   61m   v1.31.0
powervs-md-0-2dnrm-8658c      NotReady   <none>          56m   v1.31.0

$ kubectl get node powervs-control-plane-ftd8j -o yaml --kubeconfig=workload.conf
apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster.x-k8s.io/annotations-from-machine: ""
    cluster.x-k8s.io/cluster-name: powervs
    cluster.x-k8s.io/cluster-namespace: default
    cluster.x-k8s.io/labels-from-machine: ""
    cluster.x-k8s.io/machine: powervs-control-plane-ftd8j
    cluster.x-k8s.io/owner-kind: KubeadmControlPlane
    cluster.x-k8s.io/owner-name: powervs-control-plane
    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
```
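
The full node YAML is long, so the taint check can be narrowed to the taint keys alone. The following is a minimal sketch: the function name `has_uninitialized_taint` is hypothetical, and it assumes space-separated taint keys on standard input, as the `jsonpath` query in the trailing comment would print them.

```shell
# Hypothetical helper: succeed if the space-separated taint keys on standard
# input include the taint that marks a node as not yet initialised by the CCM.
has_uninitialized_taint() {
  tr ' ' '\n' | grep -qxF 'node.cloudprovider.kubernetes.io/uninitialized'
}

# Example invocation (assumed usage):
#   kubectl get node powervs-control-plane-ftd8j --kubeconfig=workload.conf \
#     -o jsonpath='{.spec.taints[*].key}' | has_uninitialized_taint \
#     && echo "node is still waiting for the CCM"
```

The match is on the whole key with a fixed string, so other taints that merely share a prefix are not reported.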

* After the CCM initialises the Node successfully, the Machine moves to the Running phase and its NODENAME field is populated.
```shell
NAME                          CLUSTER   NODENAME                      PROVIDERID                                                                                          PHASE     AGE     VERSION
powervs-control-plane-pqnt4   powervs   powervs-control-plane-pqnt4   ibmpowervs://osa/osa21/10b1000b-da8d-4e18-ad1f-6b2a56a8c130/bc0c9621-12d2-47f1-932e-a18ff041aba2   Running   8m52s   v1.31.0
```
