[core] Fix "Fatal Python error: PyGILState_Release: auto-releasing thread-state, but no thread-state for this thread" #52575
base: master
Conversation
What will the behavior be if a user does try to access thread-local state currently? There are really only two acceptable options here IMO unless there is a significant:
Is there a reason why we can't fix the underlying problem and ensure that the initializer is run once per thread and the same thread runs the corresponding release call? Naively it seems like this should only require keeping a map of |
As I mentioned above, |
Got it, in that case it sounds like we'll need to use the lower-level APIs to manage our own basic thread pool and implement the init/release logic. This should not be too challenging given how simple the usage in |
I think it's fine not to support thread-local state for concurrency groups with more than one thread. I remember discussing this behavior with @stephanie-wang several months ago when we were trying to move the RayCG execution loop to the main thread. However, executing the initializer is not only for thread-local state; it also aligns Ray more closely with the Python interpreter's assumptions. That is, once a thread with a given thread ID exits, it cannot be restarted. If we want to run the initializer/releaser on each thread, we may need to get rid of the |
I just saw #52575 (comment) after I submitted #52575 (comment). Implementing our own thread pool makes sense. I want to confirm with you that the goal of implementing our own thread pool to initialize and release Python threads is not to support thread-local state; rather, it is to fulfill the Python interpreter's assumptions, as I mentioned in the previous comment. Users should still not use thread-local state for a concurrency group with multiple threads because of the user interface issue mentioned in #52575 (comment).
Yes, exactly. I agree we should not encourage users to do this, but we should fulfill the Python interpreter's assumptions. This will also avoid undefined behavior and/or scary stack traces like the one in this ticket. As an example, there might be library code that uses thread-local storage that users aren't even aware of. We would want to make sure that the code at least runs correctly and doesn't fail in unexpected & confusing ways.
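To make the "init and release on the same thread" requirement concrete, here is a minimal Python sketch of a fixed-size pool in which every worker thread runs its own initializer on startup and its own releaser just before it exits. It only illustrates the invariant discussed above and is not Ray's implementation (which would live in the C++ core). The names `InitReleaseThreadPool`, `post`, `shutdown`, and `record_into` are hypothetical.

```python
import queue
import threading


class InitReleaseThreadPool:
    """Hypothetical pool: each worker runs init and release on itself."""

    _STOP = object()  # sentinel that tells one worker to exit

    def __init__(self, num_threads, initializer, releaser):
        self._tasks = queue.Queue()
        self._initializer = initializer
        self._releaser = releaser
        self._threads = [
            threading.Thread(target=self._worker) for _ in range(num_threads)
        ]
        for thread in self._threads:
            thread.start()

    def _worker(self):
        # Runs on the worker thread itself, exactly once per thread.
        self._initializer()
        try:
            while True:
                task = self._tasks.get()
                if task is self._STOP:
                    break
                task()
        finally:
            # Same thread as the initializer call above.
            self._releaser()

    def post(self, task):
        # Any idle worker may pick the task up; callers cannot choose one.
        self._tasks.put(task)

    def shutdown(self):
        # One sentinel per worker, then wait for all workers to exit.
        for _ in self._threads:
            self._tasks.put(self._STOP)
        for thread in self._threads:
            thread.join()


init_ids, release_ids, done = [], [], []
ids_lock = threading.Lock()


def record_into(target):
    """Return a callback that records the calling thread's id."""
    def callback():
        with ids_lock:
            target.append(threading.get_ident())
    return callback


pool = InitReleaseThreadPool(3, record_into(init_ids), record_into(release_ids))
for i in range(4):
    pool.post(lambda i=i: done.append(i * i))
pool.shutdown()
```

After `shutdown()` returns, the set of thread ids that ran the initializer matches the set that ran the releaser, which is the property a shared pool with "run this once somewhere" semantics cannot guarantee. The sketch lets an exception in a task end that worker (the releaser still runs); a real pool would need a policy for that case.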
@edoakes Is it okay to implement our own thread pool using a naive round-robin approach? If not, I’d prefer to merge this PR first, and then I can follow up with another PR to implement it after on-call. I took a look at the source code of the post function in boost::asio::thread_pool. It's not trivial if we plan to implement the scheduler by ourselves.
The use case here is quite simple and the work is coarse-grained (task executions). We should be able to use an Pseudocode:
Why are these changes needed?
We see the following error message from the CI runs of `test_threaded_actor.py` (example1, example2). The message "Fatal Python error: PyGILState_Release: auto-releasing thread-state, but no thread-state for this thread" is very scary, but it will not cause any tests to fail.
The root cause is that `PyGILState_Release` is called on a thread that has never called `PyGILState_Ensure`. See the CPython source code for more details. The reason is that we can't control which thread in the thread pool will run the initializer/releaser. Hence, if a concurrency group has more than one thread, the error message above may be printed when we gracefully shut down an actor (i.e., `ray.actor.exit_actor()`).
In this PR, we only execute the initializer and releaser when the executor has only one thread, to ensure that both run on the same thread. This means that users cannot access thread-local state when a concurrency group has more than one thread. I think this behavior is acceptable because users cannot control which thread executes a task, so they should not rely on thread-local state when a concurrency group has more than one thread.
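As an illustration of why thread-local state is unreliable once more than one thread can execute tasks, the following sketch uses Python's standard `concurrent.futures.ThreadPoolExecutor` rather than Ray's executor. The names `local_state`, `both_running`, `tag_thread`, and `run_demo` are made up for this example. The barrier only forces the two tasks onto two distinct pool threads so the result is deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local_state = threading.local()
both_running = threading.Barrier(2)


def tag_thread(tag):
    # Block until both tasks are running, so they occupy two different threads.
    both_running.wait(timeout=5)
    # This attribute is only visible on the thread that sets it.
    local_state.tag = tag
    return threading.get_ident(), local_state.tag


def run_demo():
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(tag_thread, tag) for tag in ("a", "b")]
        results = [future.result() for future in futures]
    # The submitting thread never set the attribute, so it sees nothing.
    main_view = getattr(local_state, "tag", None)
    return results, main_view


results, main_view = run_demo()
```

Each task sees only the value written on its own thread, and the submitting thread sees none. Because the caller cannot pick which pool thread runs the next task, a value stored by one task is not guaranteed to be visible to a later one.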
Related issue number
Closes #51071
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.