Add more threading configs #9076
Conversation
Unit Test Results: see the test report for an extended history of previous test failures; this is useful for diagnosing flaky tests. 27 files ±0, 27 suites ±0, 11h 23m 43s ⏱️ +15m 23s. For more details on these failures and errors, see this check. Results for commit ce38c35. ± Comparison against base commit 39f91e3.
What does `threads_per_worker` in `LocalCluster` mean?
Each worker is a separate process. Each worker process can have multiple threads. In use cases where the task releases the GIL it can be beneficial to use many threads, because the inter-thread communication cost is lower. In cases where the tasks aren't thread-safe or don't release the GIL, it can be better to have 1 thread with many processes.
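To make that trade-off concrete, here is a minimal sketch of the two extremes with `LocalCluster` (the worker/thread counts are illustrative assumptions, not recommendations from this thread):

```python
from dask.distributed import Client, LocalCluster

# GIL-releasing workloads (e.g. NumPy-heavy tasks): few processes,
# several threads each, so tasks share memory instead of serializing data.
thread_heavy = LocalCluster(n_workers=2, threads_per_worker=6)

# GIL-bound or non-thread-safe workloads (pure-Python tasks): one thread
# per process and many processes, so tasks don't contend for the GIL.
process_heavy = LocalCluster(n_workers=12, threads_per_worker=1)

client = Client(process_heavy)
```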
By default we use a balanced profile where we look at how many CPU cores you have and then create a few processes, each with a few threads, so that processes × threads == CPU cores. This doesn't favour any workflow type in particular.
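As an illustration of that default (this is not Dask's actual heuristic, just a toy version of the idea that the product of processes and threads should equal the core count):

```python
import os

def balanced_split(cores=None):
    """Return an illustrative (processes, threads_per_process) pair
    whose product equals the CPU core count."""
    cores = cores or os.cpu_count() or 1
    best = (cores, 1)  # fall back to one thread per process
    # Prefer the most "square" factorisation with processes >= threads.
    for threads in range(1, int(cores ** 0.5) + 1):
        if cores % threads == 0:
            best = (cores // threads, threads)
    return best

print(balanced_split(12))  # (4, 3): 4 processes x 3 threads = 12 cores
```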
We set these variables to 1 because we assume Dask will be running one task per thread/process, and the product of that configuration will be the total CPU cores in your system. So if you have 12 cores, Dask will run 12 tasks in parallel, and we don't want the libraries those tasks call, like NumPy, to try and use more than 1 core each, otherwise we will have too much CPU contention.
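One way to see what the workers actually inherit (a sketch; whether these variables are pre-set for you depends on your distributed version and configuration):

```python
import os
from dask.distributed import Client, LocalCluster

client = Client(LocalCluster(n_workers=2, threads_per_worker=2))

def thread_env():
    # The thread-pool knobs discussed in this thread; a missing key means
    # nothing is pinning that library's thread count on the worker.
    keys = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")
    return {k: os.environ.get(k) for k in keys}

# client.run executes the function once on every worker process and
# returns a {worker_address: result} mapping.
print(client.run(thread_env))
```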
Ultimately these things come down to tuning for your specific workflows. You tweak these numbers when you are debugging or trying to squeeze out more performance. If you are finding a benefit from setting the variables you mention here for your workflow, that's great. I'm not sure I see the value in setting this default for all Dask users, though, unless there is a clear problem that many people are running into.
We've also seen a bit of fallout from these settings for users who are intentionally running Dask with one thread per worker, assuming that the library underneath parallelizes, only to realize that this isn't happening by default because of these settings. Because of this, I'm generally not enthusiastic about these settings. While some users may benefit from them, it is a surprising side effect when running things inside of a Dask worker. (I'm not blocking, just adding context)
So that means Python threads that don’t actually have any benefit apart from I/O?
Absolutely! In this case you want to set the number of threads and processes per machine to … This isn't something we should be setting as the default for all users though, so I'm going to close this out.
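A sketch of the kind of configuration that comment is pointing at (the exact values are elided above, so these numbers are my illustrative assumption):

```python
import os
from dask.distributed import Client, LocalCluster

# Assumption: for GIL-bound, pure-Python tasks, use one worker process per
# core with a single thread each, and leave the library thread pools at 1
# so the machine's cores are not oversubscribed.
cluster = LocalCluster(n_workers=os.cpu_count(), threads_per_worker=1)
client = Client(cluster)
```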
That’s not what I meant. Setting it to 1 is vastly preferable to having it implicitly be … If I run a Numba function in … I think this should be re-opened, but maybe I missed something. I don’t have my head wrapped around this fully, but I’m getting there.
Big question: is this even correct? Dask’s docs mention the three env variables discussed here, but why “1”?
What does `threads_per_worker` in `LocalCluster` mean?

Ideally I’d start my workers, run 1 Python thread in each of them, and configure each of them so that all these parallelization engines use a certain number of threads.
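A sketch of that ideal, under one assumption: the per-task thread counts are controlled with the third-party threadpoolctl package rather than the environment variables this PR documents.

```python
import numpy as np
from dask.distributed import Client, LocalCluster
from threadpoolctl import threadpool_limits  # third-party package

# One Python thread per worker process; parallelism inside each task
# comes from the BLAS/OpenMP thread pool instead.
client = Client(LocalCluster(n_workers=2, threads_per_worker=1))

def blas_heavy(n):
    # Let the underlying BLAS use 4 threads for this call only.
    with threadpool_limits(limits=4):
        a = np.random.default_rng(0).standard_normal((n, n))
        return float((a @ a).sum())

print(client.submit(blas_heavy, 2_000).result())
```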
Closes #9075
`pre-commit run --all-files`
No tests, since you also don’t test `OMP_NUM_THREADS` and the other documented env variables.