-
Notifications
You must be signed in to change notification settings - Fork 435
Fix the abort routines in mpas-framework to use shr_sys_abort when coupled #7457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@matthewhoffman -- I changed the log_abort routine to call mpas_dmpar_global_abort instead of having its own abort logic, but that is not necessary if there's a reason it had been calling mpi routines |
@jonbob , thanks for taking care of this. The changes you implemented are what I was anticipating. I do not remember if there was a reason the log module does its own abort rather than use the pre-existing mpas_dmpar_global_abort routine, but I cannot think of an important one now. One change in behavior is that previously, a critical error sent to the log module would not result in |
Thanks @matthewhoffman. I'm happy to change the behavior of how critical errors get handled in the log module, if you can think of a better approach. Or at least a more meaningful error message in the corresponding log.abort files? |
@jonbob , I don't have a strong opinion on if and how that should be adjusted. Let's see what @xylar says, and I'll add @mark-petersen to the discussion. |
Hmm, I think we almost certainly were not calling My suggestion would be to make the |
@jonbob, that looks good to me. Let me test on the run that was crashing for me to make sure the job ends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the 20250628.GMPAS-JRA1p5-DIB-PISMF.TL319_SOwISC12to30E3r4.chrysalis
that crashed for me as I reported in #7447 (comment).
This time, the jobs seems to have aborted as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonbob , this final adjustment looks like what I expected. In testing with standalone MALI, I got a build error due to leading whitespace in the preprocessor directives that were indented, so I pushed a commit to fix that. After that, I confirmed that MALI compass tests pass and that a situation that leads to an abort from the log module behaves as expected. So everything looks good from the standalone MALI perspective.
(Note that I ran into a compass/scorpio issue in compiling standalone MALI with master using the compass env on compass/main, so I tested this on MALI-Dev/develop after cherry-picking the commits from this PR. Xylar is aware of the compass env issue, but we determined it is not urgent to address.)
Thanks for catching that, @matthewhoffman |
Fix the abort routines in mpas-framework to use shr_sys_abort when coupled Fix the abort routines in mpas-framework so that log_abort calls mpas_dmpar_global_abort and mpas_dmpar_global_abort itself uses shr_sys_abort when running as a coupled model. Fixes #7447 [BFB]
Passes:
merged to next |
merged to master |
Fix the abort routines in mpas-framework so that log_abort calls mpas_dmpar_global_abort and mpas_dmpar_global_abort itself uses shr_sys_abort when running as a coupled model.
Fixes #7447
[BFB]