more flexible job manager end state #763
Comments
I'm not sure I understand what you mean. The "internal queuing" feature is an internal backend thing by design, so I don't think anything is required client-side to support it. Something that might be possible, however, is a standard API to discover and leverage job submission limits, as discussed at
Will create a minimal example to reproduce the issue.
Narrowed down the issue: it comes from the try/except block introduced in PR #736, inside `def execute(self) -> _TaskResult:`. The log shows:
`Failed to start job j-2504220752104722b90406957695f315: [429] Too Many Requests`
We need to avoid labeling "Too Many Requests" errors as `start_failed` and instead keep handling those jobs as `created`, so they can be started again later.
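A minimal sketch of that distinction, not the actual code from PR #736: `JobStartError` and `status_after_failed_start` are hypothetical names, and the sketch assumes the error raised on a failed start exposes the HTTP status code.

```python
class JobStartError(Exception):
    """Hypothetical stand-in for the error raised when starting a job fails."""

    def __init__(self, message: str, http_status_code: int):
        super().__init__(message)
        self.http_status_code = http_status_code


def status_after_failed_start(error: JobStartError) -> str:
    """Pick the status to record for a job whose start request failed."""
    # 429 means the backend is throttling us; the job itself is not broken,
    # so keep it as "created" and let a later pass try to start it again.
    if error.http_status_code == 429:
        return "created"
    # Any other failure is treated as a genuine start failure.
    return "start_failed"
```

With this, the 429 from the log above would leave the job in `created`, while for example a 500 would still end up as `start_failed`.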
With the new internal queue, jobs are automatically retried in case more jobs are created than the number of allowed parallel jobs.
Since the job manager runs until all jobs end in finalized, start_failed or error, it does not support the internal queueing.
Ideally we would build in some flexibility that allows the user to submit and track more parallel jobs than those supported with their standard account.
Can we make the 'end condition' on start_failed more flexible while not risking an endless loop?
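One possible direction, as a rough sketch rather than a proposal for the actual job manager code: give every job a limited number of start attempts, keep it as `created` while it is only being throttled, and mark it `start_failed` once the budget is used up. The function `run_until_done`, the `try_start` callback and the status names are assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable

# Assumed set of end states; the names are illustrative.
TERMINAL_STATUSES = {"finished", "start_failed", "error"}


def run_until_done(
    job_ids: Iterable[str],
    try_start: Callable[[str], int],
    max_start_attempts: int = 3,
) -> Dict[str, str]:
    """Loop until every job is in an end state, retrying throttled starts.

    `try_start` is a hypothetical callback returning the HTTP status code
    of the start request for a job.
    """
    statuses = {job_id: "created" for job_id in job_ids}
    attempts = {job_id: 0 for job_id in statuses}

    while not all(s in TERMINAL_STATUSES for s in statuses.values()):
        # A real loop would sleep between passes; omitted here for brevity.
        for job_id, status in list(statuses.items()):
            if status != "created":
                continue
            attempts[job_id] += 1
            http_status = try_start(job_id)
            if http_status < 400:
                # Simplification: skip queued/running and jump to the end state.
                statuses[job_id] = "finished"
            elif http_status == 429 and attempts[job_id] < max_start_attempts:
                # Throttled: stay "created" and try again on the next pass.
                continue
            else:
                # Non-429 error, or retry budget exhausted.
                statuses[job_id] = "start_failed"
    return statuses
```

Because each job can stay in `created` for at most `max_start_attempts` passes, the loop always terminates, even if the backend answers 429 forever.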