refactor updatequeuejob thread logic in informer#624
refactor updatequeuejob thread logic in informer#624openshift-merge-robot merged 11 commits intoproject-codeflare:mainfrom
Conversation
| requeueInterval := 30 * time.Second | ||
| key, err := cache.MetaNamespaceKeyFunc(qj) | ||
| if err == nil { | ||
| go func() { |
There was a problem hiding this comment.
time.AfterFunc seems a better construct to use here rather than go func() + time.Sleep.
On a side node, it seems it'll be valuable to move to used a delaying queue.
There was a problem hiding this comment.
We need to wake the thread up every so often, from what I understand time.AfterFunc would wake up only once.
There was a problem hiding this comment.
Right, maybe requeuing within the function passed to time.AfterFunc (possibly calling time.AfterFunc)?
| if latestObj != nil { | ||
| latestAw = latestObj.(*arbv1.AppWrapper) | ||
| } else { | ||
| latestAw = qj |
There was a problem hiding this comment.
I wonder what happens here to qj when we leave the scope of this function. This seems really dangerous. This for loop is going to keep retrying qj outside of addQueueJob since it is a thread. Does it retain a copy of qj? I'm not saying we should change it, I'm just concerned.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: metalcycling The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Issue link
#602
What changes have been made
Removed independent updatequeuejob thread that looped through all the AWs in etcd to determine completion status and update pod count. the fix adds updatequeuejob logic to the informer's machinery which has promise to improve MCADs updatestatus performance at scale.
Verification steps
I have verified that the fix works manually.
Checks