Recommended limits for created/running jobs #559
I don't think this is a good idea. (But I'm also not a fan of the client-side approach of creating hundreds of jobs in the job manager. Shouldn't that be one job? It seems like a back-end limitation that is exposed to the user.)

For example, an implementation in the Web Editor that blocks a submission due to capacity limits would often not be up to date because of the request interval. So you could in theory already submit something; the UI just hasn't received up-to-date data yet. Additionally, if pagination is active, the Editor may not even know how many jobs are active (assuming we just iterate through the jobs). Otherwise, you probably need separate statistics of active jobs as part of …

I think the trial-and-error approach here is okay. To ensure up-to-date limits you need to make a request anyway; we'd just move it to another endpoint, so I'm not sure what we gain. Users could also be informed about limits in other ways, e.g. the backend description, and then configure the job manager manually with those limits.

Generally, we tried to avoid defining limits too specifically because backends could have limits in so many different ways that we probably can't think of all of them, and in the end it could be an endless list of options. For example, someone may combine limits for sync and batch jobs: yet another property to add...
We handle a lot of use cases where multi-job management is an important requirement. These users don't want a single giant job that would take weeks or months to finish; they want multiple, more manageable jobs that finish within a reasonable time and whose results can easily be inspected on the go. They want to scale their load up or down to manage credit consumption, re-run where necessary, and so on. It's true that it would be nice if this kind of functionality were provided by openEO, but that is not the case yet at all. We already tried to experiment with "large area" processing and automatic job splitting at the level of the aggregator, but there are so many aspects and details to that, that it is just easier and more flexible to do the whole management from the client side. In the long term these ideas could/should certainly be ported to a backend component, but it's just too early as we are still exploring this space.
It's true that you can get into race-condition trouble when client and server are a bit out of sync, but that doesn't mean this information is worthless. That's like saying an email client should not report the number of unread messages because it could be off from time to time. And part of the proposal is also about numbers that almost never change, like the "maximum number of concurrently running batch/sync jobs".
I'm not sure what you mean here, because this proposal is about the backend stating numbers/limits, not about clients guessing them.
It's true that you would still have to try and catch errors in the end. But users can be pretty panicky when errors pop up (even when shown as a warning). And likewise, as a backend, we also monitor 400/500 HTTP responses to get an idea about our service health. It's not ideal that these error stats would be polluted by clients that are just pushing their luck because there is no other way to detect limits.
Ok, that makes sense, and I understand that we don't want to predefine all possible limits at the level of the openEO API. But I still think it's valuable to at least standardize the place/endpoint where they can be found and consumed programmatically.
To make something useful with the limit, you'd also need to know the number of active operations. Where do you get this information from?
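For context: the only generic way to get that number today is to page through `GET /jobs` and count. A minimal sketch, assuming standard openEO-style pagination via `rel="next"` links (an authenticated `requests.Session` is assumed):

```python
import requests

def count_active_jobs(api_root: str, session: requests.Session) -> int:
    """Count queued/running jobs by paging through GET /jobs.

    Assumes each response lists jobs under "jobs" and points to the
    next page (if any) with a rel="next" link.
    """
    active = 0
    url = f"{api_root}/jobs"
    while url:
        resp = session.get(url)
        resp.raise_for_status()
        data = resp.json()
        active += sum(
            1 for job in data.get("jobs", [])
            if job.get("status") in {"queued", "running"}
        )
        # Follow the rel="next" pagination link, if present.
        url = next(
            (link["href"] for link in data.get("links", []) if link.get("rel") == "next"),
            None,
        )
    return active
```

Note that this requires a full pass over all of the user's jobs on every check, which is exactly the cost that a dedicated statistics/limits endpoint would avoid.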
This is something that pops up regularly while working on client-side job managers: how many jobs can a user create, how many jobs can run in parallel, ... ?
At the moment, we have in VITO projects some adhoc and per-user configs in the backend and user scripts to steer job managers that create and start tens/hundreds of jobs, but that involves poorly documented and non-standard aligning of various tools.
I think it makes sense to add something to the openEO API that allows backends to expose global or per-user capacity/limits for the number of created jobs, number of concurrently running jobs, etc. That would allow clients to handle this in a cleaner and more transparent way. With the current API, the only official "UI" is basically: just try starting jobs until you get an error, and make sure to back off and retry properly.
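To make that status quo concrete, a minimal sketch of the "try until it fails" pattern. The retry policy and status-code handling here are illustrative, not prescribed by the API; which response actually signals "capacity reached" is backend-specific:

```python
import time
import requests

def start_with_backoff(session: requests.Session, api_root: str, job_id: str,
                       max_attempts: int = 5) -> bool:
    """Start a batch job (POST /jobs/{job_id}/results), backing off and
    retrying when the backend refuses. Since there is no standard way to
    detect capacity limits, this retries on any non-success response:
    pure trial and error.
    """
    delay = 30  # seconds; doubled after each failed attempt
    for _ in range(max_attempts):
        resp = session.post(f"{api_root}/jobs/{job_id}/results")
        if resp.status_code == 202:  # job accepted for processing
            return True
        time.sleep(delay)
        delay *= 2
    return False
```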
To give a bit of an idea about what I think could be covered here, a non-exhaustive list of things that could be included:
- maximum number of created jobs (per user)
- maximum number of concurrently running batch jobs
- maximum number of concurrent synchronous processing requests

These numbers would just be recommendations for clients/tools that support them to follow. Going over the limits would just trigger the errors we already have.
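As a strawman for the shape such recommendations could take (every field name below is hypothetical and not part of the openEO API; shown as a Python literal mirroring a JSON response):

```python
# Hypothetical per-user limits object, purely to illustrate the shape.
job_limits = {
    "created_jobs": {"max": 1000},           # total jobs a user may have created
    "running_batch_jobs": {"max": 20},       # concurrently running batch jobs
    "concurrent_sync_requests": {"max": 2},  # parallel synchronous requests
}
```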
I'm not sure yet what would be a good place to expose this:
- GET /
- GET /jobs
- and related?

Note that this would also be interesting in a federation context, to steer job distribution.
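If such limits were exposed in, say, the `GET /` capabilities document, a client-side job manager could derive its parallelism from them instead of being configured by hand. A rough sketch, reusing the hypothetical `job_limits` field from above:

```python
import requests

def configure_parallelism(api_root: str, default: int = 2) -> int:
    """Derive the job manager's parallel-job setting from backend
    capabilities. "job_limits" is a hypothetical field used purely for
    illustration; it does not exist in the openEO API today.
    """
    capabilities = requests.get(f"{api_root}/").json()
    limits = capabilities.get("job_limits", {})
    return limits.get("running_batch_jobs", {}).get("max", default)
```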