Common segfault in restore_og when running SWOOLE_PROCESS server.
#5761
Comments
Please try to trace it with Valgrind: USE_ZEND_ALLOC=0 valgrind php your_code.php
Yes, I would love to be able to do this, but it only happens in production. I am thinking of doing a custom build with very targeted logging to minimize the performance impact; if there is anything specific you would like me to track or enable, I can do that. At a high level it appears that the I'll try to create a minimal repo of this issue. Do you know of any way we could see which PHP functions the coroutines are based on? That might help me do that. On a side note, I am running all our test instances with Valgrind as you suggested above, and I'll see what I can do to replicate the issue.
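One low-overhead way to do the kind of targeted production logging mentioned above is an in-memory ring buffer: each event costs a few stores, nothing is written to disk, and the most recent events are simply read back from the core dump after a crash. The sketch below is a hedged illustration in C, not part of Swoole. It assumes each worker process runs all of its coroutines on a single OS thread (so no locking is used), and every name in it (trace_record, trace_lookup, trace_ring, TRACE_CAPACITY, the TRACE_* event codes) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Must be a power of two so the index can wrap with a mask. */
#define TRACE_CAPACITY 1024u
static_assert((TRACE_CAPACITY & (TRACE_CAPACITY - 1)) == 0,
              "TRACE_CAPACITY must be a power of two");

/* Hypothetical event codes; choose whichever points you want to trace. */
enum { TRACE_SAVE = 1, TRACE_RESTORE = 2, TRACE_FREE = 3 };

typedef struct {
    uint64_t seq;     /* global sequence number of this event */
    int64_t cid;      /* coroutine id that produced the event */
    uint32_t event;   /* one of the TRACE_* codes */
    const void *ptr;  /* pointer involved, e.g. a saved output pointer */
} trace_entry;

/* Plain globals (not static) so gdb can find them by name in a core dump. */
trace_entry trace_ring[TRACE_CAPACITY];
uint64_t trace_next_seq = 0;

/* Record one event: a few stores, no locking, no I/O.
 * Assumes all coroutines of a worker run on one OS thread. */
static inline void trace_record(int64_t cid, uint32_t event, const void *ptr) {
    uint64_t seq = trace_next_seq++;
    trace_entry *e = &trace_ring[seq & (TRACE_CAPACITY - 1)];
    e->seq = seq;
    e->cid = cid;
    e->event = event;
    e->ptr = ptr;
}

/* Entry recorded `back` events ago (0 = newest), or NULL if it never
 * existed or has already been overwritten by newer events. */
const trace_entry *trace_lookup(uint64_t back) {
    if (back >= trace_next_seq || back >= TRACE_CAPACITY) return NULL;
    uint64_t seq = trace_next_seq - 1 - back;
    return &trace_ring[seq & (TRACE_CAPACITY - 1)];
}
```

In a debug-symbol build, trace_next_seq and trace_ring can be printed directly in gdb from a core dump (for example print trace_next_seq, then print trace_ring[i] for the slots of interest), so the last 1024 events before the crash are available without any logging I/O on the hot path.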
I have been able to confirm the issue to be a use-after-free from I did this by explicitly overwriting the struct with I am still open to ideas on how to trace this in a minimally invasive way, as the issue has still only occurred in our production systems. Edit: Some more context: most likely the issue occurs when a coroutine (A) points to an
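To make the use-after-free hypothesis concrete, here is a simplified C model of per-coroutine output-state save/restore combined with the struct-poisoning trick described above. It is a sketch under stated assumptions, not Swoole's actual restore_og or PHPContext code: output_state, saved_output, co_context, save_output, restore_output, the 0xA5 poison byte and the 64-slot quarantine are all hypothetical (the pattern used in the comment above is not shown, so 0xA5 is an arbitrary choice). It shows two ways a saved pointer can go wrong, being shared with another coroutine or being restored twice, and how poisoning plus delayed freeing turns the second case into a recognizable value instead of silent corruption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PHP's output globals (real fields differ). */
typedef struct { int active_handlers; int flags; } output_state;

/* Saved copy plus the id of the coroutine that saved it. */
typedef struct { output_state state; int64_t owner_cid; } saved_output;

/* Hypothetical stand-in for a per-coroutine context such as PHPContext. */
typedef struct {
    int64_t cid;
    saved_output *output_ptr; /* non-NULL only while suspended with output */
} co_context;

output_state current_output; /* state of whichever coroutine is running */

#define POISON_BYTE 0xA5     /* hypothetical, easy to spot in gdb */
#define QUARANTINE_SLOTS 64
static void *quarantine[QUARANTINE_SLOTS];
static size_t quarantine_next = 0;

/* Poison instead of freeing right away: the block stays allocated in a
 * FIFO quarantine, so a stale pointer reads 0xA5 bytes rather than memory
 * the allocator has already handed out again. */
static void poison_and_release(void *obj, size_t size) {
    memset(obj, POISON_BYTE, size);
    size_t slot = quarantine_next++ % QUARANTINE_SLOTS;
    free(quarantine[slot]); /* free(NULL) is a no-op on the first pass */
    quarantine[slot] = obj;
}

bool is_poisoned(const void *obj, size_t size) {
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++)
        if (p[i] != POISON_BYTE) return false;
    return size > 0;
}

/* On suspend: park the running coroutine's output state, if any. */
void save_output(co_context *ctx) {
    assert(ctx != NULL);
    if (current_output.active_handlers == 0) {
        ctx->output_ptr = NULL; /* nothing buffered, nothing to save */
        return;
    }
    saved_output *s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->state = current_output;
    s->owner_cid = ctx->cid;
    ctx->output_ptr = s;
    memset(&current_output, 0, sizeof current_output); /* fresh state */
}

typedef enum { RESTORE_OK, RESTORE_FOREIGN, RESTORE_STALE } restore_result;

/* On resume: put the saved state back. The stale check relies on released
 * copies staying poisoned in the quarantine; once a block leaves the
 * quarantine it is truly freed and reading it is undefined behavior. */
restore_result restore_output(co_context *ctx) {
    saved_output *s = ctx->output_ptr;
    if (s == NULL) return RESTORE_OK;
    if (is_poisoned(s, sizeof *s)) return RESTORE_STALE;  /* already restored */
    if (s->owner_cid != ctx->cid) return RESTORE_FOREIGN; /* shared pointer */
    current_output = s->state;
    ctx->output_ptr = NULL; /* never leave a dangling pointer behind */
    poison_and_release(s, sizeof *s);
    return RESTORE_OK;
}
```

The quarantine is the key design choice: freeing immediately would make the is_poisoned check itself a use-after-free, while holding the last 64 released blocks keeps it well-defined at a small memory cost. AddressSanitizer uses a similar quarantine for freed memory, with far more rigorous tracking than this sketch.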
Hi, over the last couple of days I've seen a large increase in worker processes dying from the same segmentation fault. Sadly, this has only happened in our production systems and not yet in testing, but I've managed to capture 2 core dumps within 1 hour using Swoole compiled with debug symbols. We are running Swoole 6.0.2 with PHP 8.4.7 on Linux 6.8.0-58-generic, under Docker on Alpine 3.21, and I've provided a Dockerfile that matches the application's exact environment. Here is an excerpt of the issue:
It appears that the PHPContext output_ptr gets corrupted somehow. Since this is production, I am not able to run more intrusive debugging tools such as Xdebug or Valgrind, but any ideas are welcome. This happens consistently: we are not using any native PHP modules that should affect this, and at times we have seen up to 20 of these faults in a single day.

To inspect the core dumps interactively, use the following Dockerfile as a base environment:
With GDB inside this container, open the core dump with gdb -c [COREDUMPFILE] and use info sharedlibrary to get the start offset of swoole.so (it will be something like 0x000073bf19013900). With this offset, load the provided 20240924-debug-swoole.so file with add-symbol-file 20240924-debug-swoole.so [OFFSET] and answer y when asked to read the symbol file. You now have the correct debug symbols to inspect the call stack, etc.

A quick set of commands to get testing fast:
I am hosting the associated .so and core dump files:
https://my-files.javad.sh/dumps/20240924-debug-swoole.so.gz
The core dump files have been redacted of all sensitive environment variables etc., but I would prefer to share them privately or via GPG. For convenience, I've GPG-encrypted them for the Swoole maintainers who have public GPG keys; otherwise, please provide an email address or a GPG key you'd like to receive them with.
info.gpg.txt