Skip to content

Conversation

jhaigh0
Copy link
Contributor

@jhaigh0 jhaigh0 commented Jan 14, 2025

Description of work

Adds the recovery of C++ stack trace information in the event of a crash on linux.

  • Launching workbench has been changed to use subprocess.Popen (rather than multiprocessing) to make use of the pre_exec_func parameter. This is passed a function which turns on core dumps for the workbench process.
  • The errorreports.core_dumps will need to be set for this feature to work. This is a path to the directory in which your linux system places core dump files. I couldn't find a reliable method to determine this automatically, an alternative would be to set sysctl -w kernel.core_pattern=/tmp/mantid_core_dumps in the pre-exec function. Couldn't find a python exposed way of doing this though (other than calling subprocess.run)
  • The core dump file is analysed using the pystack library which can recover trace backs from all active threads.
  • Since pystack produces a lot of output, the data is sent compressed to the error reporter.
  • On IDAaaS, the core dump files produced are lz4 compressed, so that case has been considered.

Fixes #38405

Further detail of work

The error reporter won't know about this change until mantidproject/errorreports#60 is merged since the C++ traces are assigned to a new variable. Either can be merged first safely.

On a built version, pystack won't recover the file names of out C++ code, but it will recover the function names and call stacks.

To test:

https://builds.mantidproject.org/job/build_packages_from_branch/1068/artifact/

conda install jhaigh0/label/core_dump_test::mantidworkbench (make sure it is 6.12.20250325.1517-py310hd83431c_0)

  • In your user properties file, set errorreports.core_dumps to be where core dumps end up on your system. If it's ubuntu then it's probably /var/lib/apport/coredump (On IDAaaS /var/lib/systemd/coredump and you want to set it in /etc/mantid.local.properties probably)
  • Launch workbench, then run Segfault to crash.
  • There should be some logging output that makes sense.
  • Once the error reporter appears, clicking on more information you should be able to see the output from pystack. There will be a lot of different threads, but one of them will have Segfault::exec()

Reviewer

Please comment on the points listed below (full description).
Your comments will be used as part of the gatekeeper process, so please comment clearly on what you have checked during your review. If changes are made to the PR during the review process then your final comment will be the most important for gatekeepers. In this comment you should make it clear why any earlier review is still valid, or confirm that all requested changes have been addressed.

Code Review

  • Is the code of an acceptable quality?
  • Does the code conform to the coding standards?
  • Are the unit tests small and test the class in isolation?
  • If there is GUI work does it follow the GUI standards?
  • If there are changes in the release notes then do they describe the changes appropriately?
  • Do the release notes conform to the release notes guide?

Functional Tests

  • Do changes function as described? Add comments below that describe the tests performed?
  • Do the changes handle unexpected situations, e.g. bad input?
  • Has the relevant (user and developer) documentation been added/updated?

Does everything look good? Mark the review as Approve. A member of @mantidproject/gatekeepers will take care of it.

Gatekeeper

If you need to request changes to a PR then please add a comment and set the review status to "Request changes". This will stop the PR from showing up in the list for other gatekeepers.

@jhaigh0 jhaigh0 force-pushed the 38405_capture_cpp_stacktraces_from_core_dumps branch from e3ba601 to 8569ea0 Compare January 14, 2025 16:28
@jhaigh0 jhaigh0 added this to the Release 6.12 milestone Jan 16, 2025
@jhaigh0 jhaigh0 force-pushed the 38405_capture_cpp_stacktraces_from_core_dumps branch 2 times, most recently from b0238c0 to 83de376 Compare January 16, 2025 13:40
@sf1919
Copy link
Contributor

sf1919 commented Jan 17, 2025

The release note structure for v6.13 has been created. You can now rebase your branch and move the release note in this PR.

@jhaigh0 jhaigh0 force-pushed the 38405_capture_cpp_stacktraces_from_core_dumps branch 3 times, most recently from 272df8c to cbc3bdd Compare January 24, 2025 09:45
@jhaigh0 jhaigh0 force-pushed the 38405_capture_cpp_stacktraces_from_core_dumps branch 5 times, most recently from 8b6f084 to 2bbc225 Compare February 17, 2025 12:31
@jhaigh0 jhaigh0 added ISIS: Core Issue and pull requests at ISIS that relate to Core features and functions Epic Used for issues and PRs that are managed under the ISIS Epic Workstream labels Feb 17, 2025
@github-actions github-actions bot added the Has Conflicts Used by the bot to label pull requests that have conflicts label Feb 18, 2025
Copy link

👋 Hi, @jhaigh0,

Conflicts have been detected against the base branch. Please rebase your branch against the base branch.

@github-actions github-actions bot removed the Has Conflicts Used by the bot to label pull requests that have conflicts label Feb 19, 2025
@jhaigh0 jhaigh0 marked this pull request as ready for review February 21, 2025 15:22
@sf1919 sf1919 linked an issue Mar 3, 2025 that may be closed by this pull request
@github-actions github-actions bot added the Has Conflicts Used by the bot to label pull requests that have conflicts label Mar 5, 2025
Copy link

github-actions bot commented Mar 5, 2025

👋 Hi, @jhaigh0,

Conflicts have been detected against the base branch. Please rebase your branch against the base branch.

@jhaigh0 jhaigh0 force-pushed the 38405_capture_cpp_stacktraces_from_core_dumps branch from 885c47e to e5d9b18 Compare March 6, 2025 09:04
@github-actions github-actions bot removed the Has Conflicts Used by the bot to label pull requests that have conflicts label Mar 6, 2025
@thomashampson thomashampson added the Extra Attention Testers and Gate keepers should pay extra attention as this affects core aspects. label Mar 14, 2025
Copy link
Contributor

@thomashampson thomashampson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following test instruction number three here should give me a python stacktrace, but it seems to be lost now.

@jhaigh0 jhaigh0 force-pushed the 38405_capture_cpp_stacktraces_from_core_dumps branch from e5d9b18 to 65df9e8 Compare March 24, 2025 13:58
@jhaigh0 jhaigh0 requested a review from thomashampson March 26, 2025 11:26
Copy link
Contributor

@thomashampson thomashampson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've verified that C++ stack traces are picked up on Linux, and the Python stack traces are still working.

Copy link
Contributor

@jclarkeSTFC jclarkeSTFC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, but could it work on macOS as well?

@jhaigh0
Copy link
Contributor Author

jhaigh0 commented Apr 10, 2025

This is great, but could it work on macOS as well?

I think it could work on mac too. I was focused on linux for the first version because it's what I had to test on + the majority of our users.

If you think it's as simple as replacing the if is _linux with if is_linux or is_macos then I'm happy to throw it in, we'll need to do a round of testing on macs though. Maybe best left for a second pr after we know it's working?

@jclarkeSTFC
Copy link
Contributor

This is great, but could it work on macOS as well?

I think it could work on mac too. I was focused on linux for the first version because it's what I had to test on + the majority of our users.

If you think it's as simple as replacing the if is _linux with if is_linux or is_macos then I'm happy to throw it in, we'll need to do a round of testing on macs though. Maybe best left for a second pr after we know it's working?

Yes I think it'll work just by changing the if statements, but it can wait for a separate PR

@jclarkeSTFC jclarkeSTFC merged commit c579679 into main Apr 10, 2025
10 checks passed
@jclarkeSTFC jclarkeSTFC deleted the 38405_capture_cpp_stacktraces_from_core_dumps branch April 10, 2025 13:52
peterfpeterson pushed a commit to peterfpeterson/mantid that referenced this pull request Apr 15, 2025
…#38600)

* add pystack to workbench and developer environments

* Turn on core dumps on linux for the inner workbench process

Has to switch from multiprocessing to subprocess in order to do this (needed the preexec option).
Through some testing, this should preserve the desired spawn behaviour.
A new module 'workbench_process.py' has been created to use with subprocess.run (can't pass it a function)

* add module to get the compressed output from pystack on the latest worbench core file

* add tests for new run pystack module

* add cpp traces to the report sender

* base64 encode data to store as a string

* show pystack output in the show more details window

* add release note

* update existing tests

* fix pystack test

* add support for lz4 compressed core files, such as those on idaaas

* only import lz4 if on linux

* return Path type as intended

* use process id to identify core dump file

* move release note to 6.13

* pass missing parameter to errorreporter constructor

* only launch with preexec_func when not on windows to avoid crash

* add property to user properties default file

* Update comment in run_pystack

---------

Co-authored-by: thomashampson <thomas.hampson@stfc.ac.uk>
peterfpeterson added a commit that referenced this pull request Apr 15, 2025
* Fix some cppcheck warnings

* Capture c++ stack traces from core dump files on linux (#38600)

* add pystack to workbench and developer environments

* Turn on core dumps on linux for the inner workbench process

Has to switch from multiprocessing to subprocess in order to do this (needed the preexec option).
Through some testing, this should preserve the desired spawn behaviour.
A new module 'workbench_process.py' has been created to use with subprocess.run (can't pass it a function)

* add module to get the compressed output from pystack on the latest worbench core file

* add tests for new run pystack module

* add cpp traces to the report sender

* base64 encode data to store as a string

* show pystack output in the show more details window

* add release note

* update existing tests

* fix pystack test

* add support for lz4 compressed core files, such as those on idaaas

* only import lz4 if on linux

* return Path type as intended

* use process id to identify core dump file

* move release note to 6.13

* pass missing parameter to errorreporter constructor

* only launch with preexec_func when not on windows to avoid crash

* add property to user properties default file

* Update comment in run_pystack

---------

Co-authored-by: thomashampson <thomas.hampson@stfc.ac.uk>

---------

Co-authored-by: James Clarke <james.clarke@stfc.ac.uk>
Co-authored-by: Jonathan Haigh <35813666+jhaigh0@users.noreply.github.com>
Co-authored-by: thomashampson <thomas.hampson@stfc.ac.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic Used for issues and PRs that are managed under the ISIS Epic Workstream Extra Attention Testers and Gate keepers should pay extra attention as this affects core aspects. ISIS: Core Issue and pull requests at ISIS that relate to Core features and functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancement to Mantid Error Reporting Analyse core dump files to capture c++ stack traces
4 participants