Description
What happened:
I do not know exactly how it happened, but I got into a situation where the snapshot controller was processing a VolumeGroupSnapshot and the corresponding VolumeSnapshot already existed. The controller reacts to 409 Already Exists here:
external-snapshotter/pkg/common-controller/groupsnapshot_controller_helper.go, lines 626 to 630 in 59d7297:

    createdVolumeSnapshot, err := ctrl.clientset.SnapshotV1().VolumeSnapshots(volumeSnapshotNamespace).Create(ctx, volumeSnapshot, metav1.CreateOptions{})
    if err != nil && !apierrs.IsAlreadyExists(err) {
        return groupSnapshotContent, fmt.Errorf(
            "createSnapshotsForGroupSnapshotContent: creating volumesnapshot %w", err)
    }

You can see that the code continues processing when the snapshot already exists. However, the createdVolumeSnapshot variable then has undefined content (in my case it points to an object with an empty namespace and name), and later, when the controller tries to update it, Patch() fails here:

    _, err = utils.PatchVolumeSnapshot(createdVolumeSnapshot, []utils.PatchOp{

Error:
E0310 21:18:31.250917 1 groupsnapshot_controller_helper.go:257] could not sync group snapshot "e2e-volumegroupsnapshottable-8755/group-snapshot-z6w49": createSnapshotsForGroupSnapshotContent: binding volumesnapshot to volumesnapshotcontent resource name may not be empty
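
To make the failure mode concrete, here is a minimal, self-contained sketch of the control flow described above. It does not use the real client; `Snapshot`, `create`, `patch`, and `bindSnapshot` are hypothetical stand-ins, and the assumption that a conflicting create hands back an empty object together with the error is taken from what I observed in this report.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot is a hypothetical stand-in for a VolumeSnapshot.
type Snapshot struct {
	Namespace, Name string
}

// errAlreadyExists is a hypothetical stand-in for a 409 Already Exists error.
var errAlreadyExists = errors.New("already exists")

// create is a hypothetical stand-in for the API call. Assumption: on a
// conflict it returns an empty object together with the error.
func create(existing map[string]bool, s Snapshot) (*Snapshot, error) {
	k := s.Namespace + "/" + s.Name
	if existing[k] {
		return &Snapshot{}, errAlreadyExists
	}
	existing[k] = true
	return &s, nil
}

// patch is a hypothetical stand-in that rejects an object without a name.
func patch(s *Snapshot) error {
	if s.Name == "" {
		return errors.New("resource name may not be empty")
	}
	return nil
}

// bindSnapshot mirrors the quoted control flow: Already Exists is tolerated,
// so the empty object returned by create is passed on to patch.
func bindSnapshot(existing map[string]bool, s Snapshot) error {
	created, err := create(existing, s)
	if err != nil && !errors.Is(err, errAlreadyExists) {
		return fmt.Errorf("creating volumesnapshot %w", err)
	}
	if err := patch(created); err != nil {
		return fmt.Errorf("binding volumesnapshot to volumesnapshotcontent %w", err)
	}
	return nil
}

func main() {
	existing := map[string]bool{"ns/snap-1": true}
	fmt.Println(bindSnapshot(existing, Snapshot{Namespace: "ns", Name: "snap-1"}))
}
```

In this sketch the error returned for the pre-existing snapshot has the same shape as the tail of the log line above.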
What you expected to happen:
The controller should perhaps Get() the VolumeSnapshot from its informer before creating it. If the VolumeSnapshot already exists when the controller tries to create it, the sync should fail and expect the VolumeSnapshot to appear in the informer on the next retry.
All IsAlreadyExists checks should then be handled in a similar way, not just the one for VolumeSnapshot.
How to reproduce it:
I don't know. It happened only once in OpenShift e2e tests. I am stress-testing the test: 1024 runs so far, 0 failures.
Logs from OpenShift e2e tests:
- snapshot-controller (look for "binding volumesnapshot to volumesnapshotcontent resource name may not be empty")
- CSI driver with external-snapshotter
- e2e test (look for "VolumeSnapshot group-snapshot-z6w49 found but is not ready")
As you can see, it would be helpful to log the names of created VolumeSnapshots, VolumeSnapshotContents, and VolumeGroupSnapshotContents: it is quite hard to work out which VolumeSnapshot failed to be patched and which VGSC it corresponds to.
Anything else we need to know?:
Environment:
- Driver version: csi-driver-hostpath + e2e tests as in Kubernetes 1.32
- Kubernetes version (use kubectl version): 1.32-ish