-
Notifications
You must be signed in to change notification settings - Fork 435
Port gwd_compute_tendencies_from_stress_divergence to CXX #7535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port gwd_compute_tendencies_from_stress_divergence to CXX #7535
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know anything about GW, so I will let other do a more menaingful review.
As for the question of whether a fcn should expect 2d vs 1d views, and whether it should expect Real or Pack data:
- isn't GW a single-col model? If so, why not implementing ALL fcn with 1d args, and then we can wrap the col loop from the eamxx process level (or from another function), leaving the door open for a SK approach like in p3/shoc?
- I don't know how possible this is, but I would encourage using agnostic scalar type as much as possible. I saw a few k+1 deps, so in certain places that would be more delicate, but there are a few tools to compute, e.g., consecutive levs differences, which work with both Real and Pack types.
@bartgol ,
Yes, I think that's what we want to do, but I wanted to confirm it.
That would be nice, but we were not able to do that for p3 or shoc. |
components/eamxx/src/physics/gw/impl/gw_gw_common_init_impl.hpp
Outdated
Show resolved
Hide resolved
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Show resolved
Hide resolved
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the comments from Luca and Andrew. (My answers to your discussion questions are: Yes.)
I think this looks neat!
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
Update: I fixed all the suggestions from earlier and I got it working on GPU. There were some complexities that came up:
The only thing worth re-reviewing is the impl.hpp file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides potentially adding a comment on what stuff like -pgwv:pgwv
means (some wave number thing), all's good. I don't know if it I would bother adding comments or just deferring to the physics people specialized in this to add meaningful comments. Doesn't matter either way.
if the ||ism is different from how things were done in F90, which I assume is the case, then I would also mention that in the comments on those pieces of code...
But again, no problem just punting on this for others to do later.
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
components/eamxx/src/physics/gw/impl/gw_gwd_compute_tendencies_from_stress_divergence_impl.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, pending the pgwd/ngwd investigation to see if we can get rid of one.
Based on inspection and feedback from Andrew and Walter, I'm confident we can remove the ngwv concept. I will remove it now. |
Baseline diffs in the gw tests are expected since I re-worked the init data for all the tests. |
int pl_idx = nl_idx + (pgwv - ngwv); // 0-based idx -pgwv:pgwv arrays | ||
int pl_idx = l + pgwv; // 0-based idx for -pgwv:pgwv arrays | ||
|
||
// Force tau at the top of the model to zero, if requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will cause compute-sanitizer to issue an error for the racecheck condition. I'm not sure, I don't know how it handles concurrent writes with the same compite-time number, maybe it recognizes that it's benign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each thread has its own pl_idx, so it should be ok, right?
This conflicts a bit with the EKAT update stuff, so I will wait until that's on master. |
Should hopefully happen on Monday. 🤞 |
The EKAT update PR has been merged. |
Change list: 1) Main change: Fill out CXX impl of gwd_compute_tendencies_from_stress_divergence 2) gw_common_init: Allow it to be called multiple times be deallocating cref and alpha if they had been previously allocated. 3) Begin to fill in constants needed by GW 4) Create struct for containing common init data in C++ 5) Add C++ version of gw_common_init and gw_common_finalize 6) Don't just assume all C++ data is packed. Just have it all unpacked for now 7) Fix randomized test data for gwd_compute_tendencies_from_stress_divergence so that some actual interesting computations happen. 8) For now, have a f90 and cxx bridge available for this function and use the f90 one for baseline generation. Once we are sure we have things working, the f90 calls can be removed.
0f4314a
to
5a2f235
Compare
Change list:
they had been previously allocated.
that some actual interesting computations happen.
one for baseline generation. Once we are sure we have things working, the f90
calls can be removed.
Discussion items:
The BFB unit test is currently passing on CPU, so the f90 and CXX are 100% bfb with each other for now.
The first one of these conversions is always the hardest, so the subsequent ones will hopefully be a lot faster.
Some moderately tricky things we are hitting here that (as far as I can remember) we have not hit before in our conversion efforts.
real(r8), intent(inout) :: tau(ncol,-pgwv:pgwv,0:pver)
where
statements, which become a fair bit less elegant in C++