-
Notifications
You must be signed in to change notification settings - Fork 435
Add SCREAMv1 test to e3sm_gpucxx suite #5495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested on Summit using both cpu and gpu, and it works fine, thanks.
with latest master on Summit runs into bld errors:
On Crusher with
Does this look familiar in runs of this case with SCREAM repo? |
Thanks @amametjanov this actually passes for me on the SCREAM repo, and confirmed it fails with a local master merge into E3SM on summit. I'll try to figure out what's going on with the master merge. |
Okay, I see what happened. It looks like #5481 changed some things in hommexx that haven't been fixed upstream in E3SM yet. The fixes are on the SCREAM repo, so I will need to open another PR to pull those into E3SM I think, or pull those into this PR (looks like the diffs only affect sources in the eamxx directory, plus shoc). |
979b436
to
e2caf84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should not be 356 commits for such a simple addition.
@rljacob I had to pull in a scream->e3sm merge because there were conflicts between the two repos since this PR was opened that prevented SCREAMv1 from building on the E3SM repo. I can move this to a separate PR, or rename this one to reflect this. Was waiting on my tests to update this PR (which just passed). |
Please make a separate PR. |
If you need to resync SCREAM with E3SM and its 300+ commits, that should be its own PR. |
9d34608
to
5111a4f
Compare
This requires #5582 to fix SCREAMv1 build errors. |
Done. My reasoning here was to consider this PR the upstream merge, and as a side effect add the test to make sure it doesn’t break again, rather than considering the upstream merge the effect. In any event, #5582 is opened to bring EAMxx up to date, then this PR will just add the test. |
(PR #5495) Add SCREAMv1 test to e3sm_gpucxx suite Add F2010-SCREAMv1 test to e3sm_gpucxx suite to get test coverage on GPU for EAMxx codebase in main E3SM repo. [BFB]
Do we need to add machine-specific cmake in
Similar error on Crusher: https://my.cdash.org/viewTest.php?buildid=2341352 Works on Ascent: https://my.cdash.org/viewTest.php?buildid=2341401 |
notes: needs some syncing and restesting. @brhillman will do. |
5111a4f
to
13d1962
Compare
@brhillman is this ready now? |
If I copy
|
Just FYI. Essentially, looking for non-existent macro files (Linux.cmake, crusher.cmake and crayclanggpu_Linux.cmake).
|
@rljacob , there are a few ways. This new test in the suite |
Lets do the second option but wait until pm comes back online. |
Note that when I try this same test on muller-gpu (generally same as pm-gpu), I get an error and a hang. We would not expect any ne4 test to take much time, so maybe don't increase the walltime for the test.
Checking out scream instead (again on muller-gpu) and these tests work fine. So must be something we are doing differently. |
Should I merge to master or hold off? |
@ndkeen is the one SCREAM test in this suite still running out time? |
If I recall, I think we found some issue with the test hanging. When I tried it just now, I get a build error:
|
@bartgol since Jim is on vacation, could you take a look at the above build error on pm-gpu? |
It seems like the ekat submodule is not recent enough. However, it is recent enough in e3sm/master. Perhaps the branch is old, and is forcing a roll back on the ekat submodule? I'll try merging into next locally now, to see what happens to the submodule. |
Oh, wait, this has the new ekat, but an old EAMxx. EAMxx is still using code from an old ekat. We need to update eamxx first. Perhaps the cleanest thing is to remove the branch from next, hard reset the branch to track an up-to-date eamxx, and then remerge to next. |
I'd rather have an eamxx update in a separate PR. Then when that is on next, this should work. |
That's fine with me. And now, let's draw straws! :) Jokes aside, how soon do we want this to be integrated? |
I'm really sick of seeing this open PR so an eamxx downstream merge soon would be good. But I think there is an upstream still pending over in scream. @jgfouca can you push that along? |
@rljacob the downstream merge in the scream repo should move along today. There was a failing test due to a bad auto-resolution of a conflict. Should be fixed now, and tests should be passing on the next AT round (mappy is a bit backed up due to it being offline for a few days). |
@zyuying can you look at the build error from your COSP changes when building scream on pm-gpu? |
13d1962
to
ec98339
Compare
|
@rljacob this should now fix the SCREAMv1 build error associated with the COSP updates. |
It passed! on pm-gpu |
Add F2010-SCREAMv1 test to e3sm_gpucxx suite to get test coverage on GPU for
EAMxx codebase in main E3SM repo.
Fixes #6540.
[BFB]