-
Notifications
You must be signed in to change notification settings - Fork 54
Call MPI_Abort in CIME runs when an exception is thrown inside EAMxx #2984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… EAMxx This ensures that the whole job is terminated. The MPI standard does not guarantee that an uncaught exception will cause the whole MPI job to terminate.
// Execute wrapped function | ||
try { | ||
f(); | ||
} catch (std::exception &e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to change this to catch (...)
for greater generality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but I want to print the exception message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do a rethrow, then add That didn't work, so I'm duplicating a bit the code to cover both needs.catch(...)
underneath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that work? Or do you need to nest try blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this, but you might consider nesting try-catch blocks to avoid redundant code:
try {
try {
throw std::exception();
} catch (const std::exception& e) {
printf("hi\n");
throw;
}
} catch (...) {
printf("hello\n");
}
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: bartgol |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5813 FAILED (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # 6039 PASSED (click to see last 100 lines of console output)
|
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: bartgol |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5815 FAILED (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # -1 SKIPPED |
The AT is failing b/c of FS issues on mappy. I'm not going to add RETEST, since the issue will likely linger for a while. We can retest/merge next week, it's not urgent. |
@bartgol , is this just waiting for the AT to get back up to full functionality? Could we manually test and then manually merge so that the PR doesn't linger? |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Using Repos:
Pull Request Author: bartgol |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
|
This ensures that the whole job is terminated. The MPI standard does not guarantee that an uncaught exception will cause the whole MPI job to terminate.
Note: this may fix the very long hangs we were sometimes getting when the code crashed, and yet the job did not terminate.
Fixes #2964