✨ Clusterctl alpha rollout undo for MachineDeployments #4098

Merged
5 changes: 3 additions & 2 deletions cmd/clusterctl/client/alpha/rollout.go
@@ -21,15 +21,16 @@ import (
"sigs.k8s.io/cluster-api/cmd/clusterctl/internal/util"
)

-const machineDeployment = "machinedeployment"
+const MachineDeployment = "machinedeployment"

-var validResourceTypes = []string{machineDeployment}
+var validResourceTypes = []string{MachineDeployment}

// Rollout defines the behavior of a rollout implementation.
type Rollout interface {
ObjectRestarter(cluster.Proxy, util.ResourceTuple, string) error
ObjectPauser(cluster.Proxy, util.ResourceTuple, string) error
ObjectResumer(cluster.Proxy, util.ResourceTuple, string) error
+ObjectRollbacker(cluster.Proxy, util.ResourceTuple, string, int64) error
Member

is it required to use int64?

Contributor Author

Yes, the underlying mdutil.Revision() func returns an int64.

}

var _ Rollout = &rollout{}
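
Note on the int64 question above: a minimal sketch of why a revision helper would naturally yield an int64, assuming the revision is stored as a decimal string in an annotation and parsed with `strconv.ParseInt`. The annotation key and the `revisionOf` helper are assumptions made for illustration, not the actual mdutil code.

```go
package main

import (
	"fmt"
	"strconv"
)

// revisionAnnotation is an assumed key used only for this sketch;
// check the clusterv1 constants for the real one.
const revisionAnnotation = "machinedeployment.clusters.x-k8s.io/revision"

// revisionOf is a hypothetical stand-in for a revision lookup.
// strconv.ParseInt always returns int64, which is why callers that
// compare against the parsed value end up using int64 as well.
func revisionOf(annotations map[string]string) (int64, error) {
	v, ok := annotations[revisionAnnotation]
	if !ok {
		// No annotation: treat the object as having no recorded revision.
		return 0, nil
	}
	return strconv.ParseInt(v, 10, 64)
}

func main() {
	rev, err := revisionOf(map[string]string{revisionAnnotation: "3"})
	fmt.Println(rev, err)
}
```

Keeping the interface parameter as int64 avoids a narrowing conversion at every comparison against the parsed value.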
2 changes: 1 addition & 1 deletion cmd/clusterctl/client/alpha/rollout_pauser.go
@@ -29,7 +29,7 @@ import (
// ObjectPauser will issue a pause on the specified cluster-api resource.
func (r *rollout) ObjectPauser(proxy cluster.Proxy, tuple util.ResourceTuple, namespace string) error {
switch tuple.Resource {
-case machineDeployment:
+case MachineDeployment:
deployment, err := getMachineDeployment(proxy, tuple.Name, namespace)
if err != nil || deployment == nil {
return errors.Wrapf(err, "failed to fetch %v/%v", tuple.Resource, tuple.Name)
4 changes: 2 additions & 2 deletions cmd/clusterctl/client/alpha/rollout_pauser_test.go
@@ -55,7 +55,7 @@ func Test_ObjectPauser(t *testing.T) {
},
},
tuple: util.ResourceTuple{
-Resource: "machinedeployment",
+Resource: MachineDeployment,
Name: "md-1",
},
namespace: "default",
@@ -81,7 +81,7 @@ func Test_ObjectPauser(t *testing.T) {
},
},
tuple: util.ResourceTuple{
-Resource: "machinedeployment",
+Resource: MachineDeployment,
Name: "md-1",
},
namespace: "default",
2 changes: 1 addition & 1 deletion cmd/clusterctl/client/alpha/rollout_restarter.go
@@ -32,7 +32,7 @@ import (
// ObjectRestarter will issue a restart on the specified cluster-api resource.
func (r *rollout) ObjectRestarter(proxy cluster.Proxy, tuple util.ResourceTuple, namespace string) error {
switch tuple.Resource {
-case "machinedeployment":
+case MachineDeployment:
deployment, err := getMachineDeployment(proxy, tuple.Name, namespace)
if err != nil || deployment == nil {
return errors.Wrapf(err, "failed to fetch %v/%v", tuple.Resource, tuple.Name)
4 changes: 2 additions & 2 deletions cmd/clusterctl/client/alpha/rollout_restarter_test.go
@@ -56,7 +56,7 @@ func Test_ObjectRestarter(t *testing.T) {
},
},
tuple: util.ResourceTuple{
-Resource: "machinedeployment",
+Resource: MachineDeployment,
Name: "md-1",
},
namespace: "default",
@@ -83,7 +83,7 @@ func Test_ObjectRestarter(t *testing.T) {
},
},
tuple: util.ResourceTuple{
-Resource: "machinedeployment",
+Resource: MachineDeployment,
Name: "md-1",
},
namespace: "default",
2 changes: 1 addition & 1 deletion cmd/clusterctl/client/alpha/rollout_resumer.go
@@ -29,7 +29,7 @@ import (
// ObjectResumer will issue a resume on the specified cluster-api resource.
func (r *rollout) ObjectResumer(proxy cluster.Proxy, tuple util.ResourceTuple, namespace string) error {
switch tuple.Resource {
-case "machinedeployment":
+case MachineDeployment:
deployment, err := getMachineDeployment(proxy, tuple.Name, namespace)
if err != nil || deployment == nil {
return errors.Wrapf(err, "failed to fetch %v/%v", tuple.Resource, tuple.Name)
4 changes: 2 additions & 2 deletions cmd/clusterctl/client/alpha/rollout_resumer_test.go
@@ -58,7 +58,7 @@ func Test_ObjectResumer(t *testing.T) {
},
},
tuple: util.ResourceTuple{
-Resource: "machinedeployment",
+Resource: MachineDeployment,
Name: "md-1",
},
namespace: "default",
@@ -84,7 +84,7 @@ func Test_ObjectResumer(t *testing.T) {
},
},
tuple: util.ResourceTuple{
-Resource: "machinedeployment",
+Resource: MachineDeployment,
Name: "md-1",
},
namespace: "default",
168 changes: 168 additions & 0 deletions cmd/clusterctl/client/alpha/rollout_rollbacker.go
@@ -0,0 +1,168 @@
/*
Copyright 2020 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package alpha

import (
"context"

"github.com/pkg/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha4"
"sigs.k8s.io/cluster-api/cmd/clusterctl/client/cluster"
"sigs.k8s.io/cluster-api/cmd/clusterctl/internal/util"
logf "sigs.k8s.io/cluster-api/cmd/clusterctl/log"
"sigs.k8s.io/cluster-api/controllers/mdutil"
"sigs.k8s.io/cluster-api/util/patch"
"sigs.k8s.io/controller-runtime/pkg/client"
)

// ObjectRollbacker will issue a rollback on the specified cluster-api resource.
func (r *rollout) ObjectRollbacker(proxy cluster.Proxy, tuple util.ResourceTuple, namespace string, toRevision int64) error {
switch tuple.Resource {
case MachineDeployment:
deployment, err := getMachineDeployment(proxy, tuple.Name, namespace)
if err != nil || deployment == nil {
return errors.Wrapf(err, "failed to get %v/%v", tuple.Resource, tuple.Name)
}
if deployment.Spec.Paused {
return errors.Errorf("can't rollback a paused MachineDeployment: please run 'clusterctl rollout resume %v/%v' first", tuple.Resource, tuple.Name)
}
if err := rollbackMachineDeployment(proxy, deployment, toRevision); err != nil {
return err
}
default:
return errors.Errorf("invalid resource type %q, valid values are %v", tuple.Resource, validResourceTypes)
}
return nil
}

// rollbackMachineDeployment will rollback to a previous MachineSet revision used by this MachineDeployment.
func rollbackMachineDeployment(proxy cluster.Proxy, d *clusterv1.MachineDeployment, toRevision int64) error {
log := logf.Log
c, err := proxy.NewClient()
if err != nil {
return err
}

if toRevision < 0 {
return errors.Errorf("revision number cannot be negative: %v", toRevision)
}
msList, err := getMachineSetsForDeployment(proxy, d)
if err != nil {
return err
}
log.V(7).Info("Found MachineSets", "count", len(msList))
msForRevision, err := findMachineDeploymentRevision(toRevision, msList)
if err != nil {
return err
}
log.V(7).Info("Found revision", "revision", msForRevision)
patchHelper, err := patch.NewHelper(d, c)
if err != nil {
return err
}
// Copy template into the machinedeployment (excluding the hash)
revMSTemplate := *msForRevision.Spec.Template.DeepCopy()
delete(revMSTemplate.Labels, mdutil.DefaultMachineDeploymentUniqueLabelKey)

d.Spec.Template = revMSTemplate
return patchHelper.Patch(context.TODO(), d)
}
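
Note on the template copy above: a minimal sketch of the "copy, then drop the hash label" step, using plain maps instead of the real template type. The label key value and the `templateLabelsForRollback` helper are assumptions made for illustration; the point is only that the copy is modified, never the MachineSet's own labels.

```go
package main

import "fmt"

// uniqueLabelKey is an assumed value standing in for
// mdutil.DefaultMachineDeploymentUniqueLabelKey.
const uniqueLabelKey = "machine-template-hash"

// templateLabelsForRollback is a hypothetical helper: it copies the labels
// of the chosen revision and removes the per-MachineSet hash label, so the
// controller can compute a fresh hash for the restored template.
func templateLabelsForRollback(msLabels map[string]string) map[string]string {
	out := make(map[string]string, len(msLabels))
	for k, v := range msLabels {
		out[k] = v
	}
	delete(out, uniqueLabelKey)
	return out
}

func main() {
	src := map[string]string{"cluster": "c1", uniqueLabelKey: "12345"}
	fmt.Println(templateLabelsForRollback(src))
	fmt.Println(len(src)) // the source map is left untouched
}
```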

// findMachineDeploymentRevision finds the specific revision in the machine sets
func findMachineDeploymentRevision(toRevision int64, allMSs []*clusterv1.MachineSet) (*clusterv1.MachineSet, error) {
var (
latestMachineSet *clusterv1.MachineSet
latestRevision = int64(-1)
previousMachineSet *clusterv1.MachineSet
previousRevision = int64(-1)
)
for _, ms := range allMSs {
if v, err := mdutil.Revision(ms); err == nil {
if toRevision == 0 {
if latestRevision < v {
// newest one we've seen so far
previousRevision = latestRevision
previousMachineSet = latestMachineSet
latestRevision = v
latestMachineSet = ms
} else if previousRevision < v {
// second newest one we've seen so far
previousRevision = v
previousMachineSet = ms
}
} else if toRevision == v {
return ms, nil
}
}
}

if toRevision > 0 {
return nil, errors.Errorf("unable to find specified revision: %v", toRevision)
}
Comment on lines +115 to +117
Contributor

I'm guessing this is added to provide a better error msg because I see that the next check would also result in returning an error.

Contributor Author

Yes, the specific revision desired was not found. It's different from the next error which is that there is no previous MS associated with this deployment.


if previousMachineSet == nil {
return nil, errors.Errorf("no rollout history found")
}
return previousMachineSet, nil

}
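
Note on the revision lookup above: a self-contained sketch of the same selection rule over a simplified type, to make the two error cases from the review thread easy to see. `revisioned` and `pickRevision` are hypothetical names, and the sketch assumes negative revisions were already rejected by the caller, as `rollbackMachineDeployment` does.

```go
package main

import (
	"errors"
	"fmt"
)

// revisioned is a hypothetical stand-in for a MachineSet: a name plus its revision.
type revisioned struct {
	Name     string
	Revision int64
}

// pickRevision returns the item with exactly toRevision, or, when
// toRevision is 0, the item with the second-highest revision.
func pickRevision(toRevision int64, all []revisioned) (*revisioned, error) {
	var latest, previous *revisioned
	for i := range all {
		item := &all[i]
		if toRevision != 0 {
			if item.Revision == toRevision {
				return item, nil
			}
			continue
		}
		switch {
		case latest == nil || item.Revision > latest.Revision:
			// Newest seen so far; the old newest becomes the previous one.
			previous, latest = latest, item
		case previous == nil || item.Revision > previous.Revision:
			// Second newest seen so far.
			previous = item
		}
	}
	if toRevision > 0 {
		// A specific revision was requested and no item carries it.
		return nil, fmt.Errorf("unable to find specified revision: %d", toRevision)
	}
	if previous == nil {
		// Fewer than two revisions exist, so there is nothing to roll back to.
		return nil, errors.New("no rollout history found")
	}
	return previous, nil
}

func main() {
	sets := []revisioned{{"ms-a", 1}, {"ms-c", 3}, {"ms-b", 2}}
	prev, err := pickRevision(0, sets)
	fmt.Println(prev.Name, err)
}
```

With revisions 1, 3 and 2, a request for revision 0 selects the item at revision 2, while a request for a revision that does not exist hits the first error rather than the "no rollout history" one.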

// getMachineSetsForDeployment returns a list of MachineSets associated with a MachineDeployment.
func getMachineSetsForDeployment(proxy cluster.Proxy, d *clusterv1.MachineDeployment) ([]*clusterv1.MachineSet, error) {
log := logf.Log
c, err := proxy.NewClient()
if err != nil {
return nil, err
}
// List all MachineSets to find those we own but that no longer match our selector.
machineSets := &clusterv1.MachineSetList{}
if err := c.List(context.TODO(), machineSets, client.InNamespace(d.Namespace)); err != nil {
return nil, err
}

filtered := make([]*clusterv1.MachineSet, 0, len(machineSets.Items))
for idx := range machineSets.Items {
ms := &machineSets.Items[idx]

// Skip this MachineSet if its controller ref is not pointing to this MachineDeployment
if !metav1.IsControlledBy(ms, d) {
log.V(5).Info("Skipping MachineSet, controller ref does not match MachineDeployment", "machineset", ms.Name)
continue
}

selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)
Member

Move out of the for loop?

Contributor Author

See discussion with Warren below. :)

if err != nil {
log.V(5).Info("Skipping MachineSet, failed to get label selector from spec selector", "machineset", ms.Name)
Contributor

Since this is getting the label selector from the deployment, I'm assuming that the error is meant to reflect just that.

Suggested change
-log.V(5).Info("Skipping MachineSet, failed to get label selector from spec selector", "machineset", ms.Name)
+log.V(5).Info("Skipping MachineSet, failed to get label selector from MachineDeployment.Spec.Selector", "MachineDeployment", d.Name)

Contributor

On second thought, can this and the following selector.Empty() check be moved outside the loop? The MachineDeployment is fixed, so we should iterate over the MachineSet items only if we have a non-empty selector.

WDYT?

Contributor Author

I actually leveraged existing code here:

selector, err := metav1.LabelSelectorAsSelector(&d.Spec.Selector)

But I see your point; I don't see why the selector should be computed inside the loop. I'll change it unless you see any reason otherwise.

Contributor

I don't see any reason TBH. Maybe someone else may know why.
@vincepri Any thoughts?

continue
}
// If a MachineDeployment with a nil or empty selector creeps in, it should match nothing, not everything.
if selector.Empty() {
log.V(5).Info("Skipping MachineSet as the selector is empty", "machineset", ms.Name)
continue
}
// Skip this MachineSet if selector does not match
if !selector.Matches(labels.Set(ms.Labels)) {
log.V(5).Info("Skipping MachineSet, label mismatch", "machineset", ms.Name)
continue
}
filtered = append(filtered, ms)
}

return filtered, nil
}
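
Note on the selector discussion above: a rough sketch of what hoisting the selector out of the loop could look like, using plain label maps rather than the apimachinery selector types. `owned`, `matches` and `filterOwned` are hypothetical names, and only equality-based matching is modelled; this is not the code that was merged.

```go
package main

import "fmt"

// owned is a hypothetical stand-in for a MachineSet: its labels and whether
// its controller reference points at the deployment in question.
type owned struct {
	Name       string
	Labels     map[string]string
	Controlled bool
}

// matches reports whether every key/value pair in selector is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// filterOwned validates the selector once, before the loop, because the
// selector belongs to the deployment and does not change per item.
func filterOwned(selector map[string]string, all []owned) []owned {
	// An empty selector should match nothing, not everything.
	if len(selector) == 0 {
		return nil
	}
	filtered := make([]owned, 0, len(all))
	for _, item := range all {
		if !item.Controlled || !matches(selector, item.Labels) {
			continue
		}
		filtered = append(filtered, item)
	}
	return filtered
}

func main() {
	all := []owned{
		{Name: "ms-1", Labels: map[string]string{"app": "web"}, Controlled: true},
		{Name: "ms-2", Labels: map[string]string{"app": "db"}, Controlled: true},
		{Name: "ms-3", Labels: map[string]string{"app": "web"}, Controlled: false},
	}
	for _, item := range filterOwned(map[string]string{"app": "web"}, all) {
		fmt.Println(item.Name)
	}
}
```

Checking the selector up front also turns an invalid or empty selector into a single early exit instead of one skipped-item log line per MachineSet.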