-
Notifications
You must be signed in to change notification settings - Fork 235
fix!: properly shutdown controller manager and dynamic controller #616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix!: properly shutdown controller manager and dynamic controller #616
Conversation
…hutdown correctly previously the informer queue of the dynamic controller and the manager shut-down immediately (killed). This is problematic because it means that a reconcile loop can potentially be interrupted in the middle of a run even though we have a normally graceful shutdown. This fixes this by adding the dynamic controller into the manager runnables instead of leaking a goroutine as well as fixing how the manager is actually stopped (previously the manager was run in a goroutine). Additionally we now properly shutdown the queue of the dynamic controller and wait for the workers to finish processing. Note that if this process still takes longer than the graceful shutdown period, we forcefully kill the manager and the binary. Because the shutdown timeout is now respected globally, I have replaced `--dynamic-controller-default-shutdown-timeout` and `KRO_DYNAMIC_CONTROLLER_DEFAULT_SHUTDOWN_TIMEOUT` with `--graceful-shutdown-timeout` and `KRO_GRACEFUL_SHUTDOWN_TIMEOUT` because the shutdown timeout is now handled globally within the manager and not individually just for the dynamic controller. Signed-off-by: Jakob Möller <jakob.moeller@sap.com>
"interval at which the controller will re list resources even with no changes, in seconds") | ||
flag.IntVar(&queueMaxRetries, "dynamic-controller-default-queue-max-retries", 20, | ||
"maximum number of retries for an item in the queue will be retried before being dropped") | ||
flag.IntVar(&shutdownTimeout, "dynamic-controller-default-shutdown-timeout", 60, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably call this out in the release notes since it's breaking, especially given we've also changed the name of the parameter in the Helm chart.
@a-hilaly do we have a good process to flag something that should be highlighted in the release notes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we would use something like labels and conventional commits, we could make it so that something like fix!: myfix
would mark the breaking change with a !
(see https://www.conventionalcommits.org/en/v1.0.0/#summary). but idk if thats too restrictive. apart from that it would be good to mark it somehow manually with a label and then be able to generate release notes from that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-hilaly do we have a good process to flag something that should be highlighted in the release notes?
Not really, so far we manually flagged things. We do use https://github.com/softprops/action-gh-release for github releases and at first glance looks like they parse conventional commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I marked it right now and will pay attention on the next release, if the notes get generated correctly, all good, then it should just become part of the contributor guide
As far as I am concerned this is good to merge. @a-hilaly if you want to take another look that's fine by me, but otherwise my vote is merge? I'm out of office all next week but when I'm back if no more comments I'll merge this. Otherwise feel free to merge while I am out as well. |
…fecycle # Conflicts: # cmd/controller/main.go
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jakobmoellerdev, matthchr The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
previously the informer queue of the dynamic controller and the manager shut-down immediately (got PID killed).
This is problematic because it means that a reconcile loop can potentially be interrupted in the middle of a run even though we have a normally graceful shutdown.
Fortunately controller runtime provides a nice way to fix this: by adding the dynamic controller into the manager runnables instead of leaking a goroutine as well as fixing how the manager is actually stopped (previously the manager was run in a goroutine).
Additionally we now properly shutdown the queue of the dynamic controller and wait for the workers to finish processing.
Note that if this process still takes longer than the graceful shutdown period, we forcefully kill the manager and the binary.
Because the shutdown timeout is now respected globally, I have replaced
--dynamic-controller-default-shutdown-timeout
with--graceful-shutdown-timeout
, andKRO_DYNAMIC_CONTROLLER_DEFAULT_SHUTDOWN_TIMEOUT
withKRO_GRACEFUL_SHUTDOWN_TIMEOUT
This is because the shutdown timeout is now handled globally within the manager and not individually just for the dynamic controller. A nice side-effect is that this shutdown timeout now applies globally to other controllers as well and not just the dynamic controller
An example log now when the process receives a shutdown signal: