Hv issue719 job manager threaded job start #736

Open: HansVRP wants to merge 58 commits into base: master

HansVRP
Contributor

@HansVRP HansVRP commented Feb 19, 2025

No description provided.

@HansVRP HansVRP requested a review from soxofaan February 19, 2025 15:55
@HansVRP
Contributor Author

HansVRP commented Feb 19, 2025

@soxofaan maybe good to rediscuss from this point (local unit tests are passing)

Member

@soxofaan soxofaan left a comment

some quick notes

@soxofaan
Member

soxofaan commented Feb 20, 2025

by the way, I don't think you had to create a new PR, you just could have continued on #730 by pushing to its feature branch (we now have three open PRs for this feature I think, which gets a bit messy)

@HansVRP HansVRP requested a review from soxofaan February 21, 2025 14:38
@HansVRP
Contributor Author

HansVRP commented Feb 21, 2025

worked on your initial comments @soxofaan

currently checking why some unit tests are not passing

@soxofaan
Member

> @soxofaan I do not think the failing unit tests in 3.11 are due to my changes?

indeed, that problem has been fixed on master by now

@HansVRP
Contributor Author

HansVRP commented Feb 25, 2025

One issue I still see and want to resolve is that launching jobs (queuing them for start) is still tied to the backend load.

This means that when running at full capacity, you will not already 'precreate jobs' that can instantly start running.

@soxofaan
Member

> when running at full capacity you will not already 'precreate jobs' which can instantly start running

I'm not sure that pre-creating jobs, instead of creating them on the fly like we currently do, will make that big of a difference, as the time to create a job is usually negligible compared to the time required to start a job. Or do you experience different timings?

@HansVRP
Contributor Author

HansVRP commented Mar 20, 2025

@soxofaan not sure why these tests are failing, but I think the code is at a good point to reevaluate.

Uncertain whether we need to post-process after shutting down the worker pool, given the

    while (
        sum(
            job_db.count_by_status(
                statuses=["not_started", "created", "queued", "queued_for_start", "running"]
            ).values()

loop.

As long as there are jobs in the not_started or queued_for_start states (which the post-processing would touch), we remain in the loop.
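To make the loop condition discussed above concrete, here is a minimal, self-contained sketch. `InMemoryJobDb` and `has_active_jobs` are hypothetical names used only for illustration, not the real job database class, and the `> 0` comparison is an assumption about the part of the condition that the snippet leaves out.

```python
from collections import Counter

# Statuses that keep the manager loop alive (copied from the snippet above).
ACTIVE_STATUSES = ["not_started", "created", "queued", "queued_for_start", "running"]


class InMemoryJobDb:
    """Hypothetical stand-in for the job database, for illustration only."""

    def __init__(self, statuses):
        self.statuses = list(statuses)

    def count_by_status(self, statuses):
        # Count only the jobs whose status is in the requested list.
        return dict(Counter(s for s in self.statuses if s in statuses))


def has_active_jobs(job_db):
    # Assumption: the loop continues while at least one job is still active.
    return sum(job_db.count_by_status(statuses=ACTIVE_STATUSES).values()) > 0


db = InMemoryJobDb(["finished", "queued_for_start", "error"])
print(has_active_jobs(db))  # one job is still queued_for_start
db.statuses = ["finished", "finished", "error"]
print(has_active_jobs(db))  # no active statuses left
```

Under these assumptions, the loop only exits once no job is left in `not_started` or `queued_for_start`, which is the argument for not needing an extra post-processing pass after the worker pool shuts down.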

@soxofaan
Member

soxofaan commented Mar 21, 2025

> Uncertain whether we need to post-process after shutting down the worker pool, given the

good point. However, that probably only works out now because only the job start is done in a side thread. Once we add threaded result downloading or other features, that while(sum(...)) is not going to guarantee that all the (side) work is done yet.
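One possible way to address this concern, sketched here with the standard library only, is to keep track of the submitted futures and wait on them explicitly instead of relying on job statuses. The function names (`run_side_tasks`, `download_results`) and the use of `ThreadPoolExecutor` are assumptions for illustration; the PR's actual worker pool may look different.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_side_tasks(job_ids, max_workers=4):
    """Run hypothetical side work per job and return only when all of it is done."""
    downloaded = []
    lock = threading.Lock()

    def download_results(job_id):
        time.sleep(0.01)  # placeholder for slow side work, e.g. a result download
        with lock:
            downloaded.append(job_id)

    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(download_results, job_id) for job_id in job_ids]

    # A status-based loop could already exit at this point: every job may be
    # "finished" on the backend while the side threads are still busy.
    for future in futures:
        future.result()  # blocks until done and re-raises any worker exception

    pool.shutdown(wait=True)
    return sorted(downloaded)


print(run_side_tasks(["job-2", "job-1", "job-3"]))
```

Waiting on the futures makes completion of the side work independent of which job statuses the main loop happens to poll.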

Member

@soxofaan soxofaan left a comment

some quick notes

Member

@soxofaan soxofaan left a comment

some more notes

@HansVRP
Contributor Author

HansVRP commented Apr 18, 2025

@soxofaan ready for review

@HansVRP
Contributor Author

HansVRP commented Apr 18, 2025

Ran a small stress test with 30 short-lived jobs (10 parallel jobs).

Total time (standard): 2728.00 seconds
Total time (threaded): 2147.00 seconds
Total time gain: 581.00 seconds (21.30% faster)

So the time between creating a job and running it became roughly 20% shorter. This does need to be put in perspective: these gains are small compared to the actual duration of entire openEO jobs...
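For reference, the reported gain can be reproduced from the two totals above. This is just the arithmetic, with the percentage taken relative to the standard run; `time_gain` is a helper name introduced here for illustration.

```python
def time_gain(standard_s, threaded_s):
    """Return the absolute gain in seconds and the gain relative to the standard run."""
    gain_s = standard_s - threaded_s
    gain_pct = 100.0 * gain_s / standard_s
    return gain_s, gain_pct


gain_s, gain_pct = time_gain(standard_s=2728.0, threaded_s=2147.0)
print(f"Total time gain: {gain_s:.2f} seconds ({gain_pct:.2f}% faster)")
# prints: Total time gain: 581.00 seconds (21.30% faster)
```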


2 participants