Conversation

@jgfouca (Member) commented Jul 18, 2025

Change list:

  1. Main change: Fill out CXX impl of gwd_compute_tendencies_from_stress_divergence
  2. gw_common_init: Allow it to be called multiple times by deallocating cref and alpha if
    they had been previously allocated.
  3. Begin to fill in constants needed by GW
  4. Create struct for containing common init data in C++
  5. Add C++ version of gw_common_init and gw_common_finalize (see the sketch after this list)
  6. Don't just assume all C++ data is packed. Just have it all unpacked for now
  7. Fix randomized test data for gwd_compute_tendencies_from_stress_divergence so
    that some actual interesting computations happen.
  8. For now, have a f90 and cxx bridge available for this function and use the f90
    one for baseline generation. Once we are sure we have things working, the f90
    calls can be removed.
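For items 2, 4, and 5, here is a minimal sketch of what the C++ init-data struct and the init/finalize pair could look like (assuming Kokkos; the struct name, members, and extents are illustrative, not the actual EAMxx declarations):

```cpp
#include <Kokkos_Core.hpp>

using Real = double;

// Hypothetical sketch only; the real EAMxx struct and members may differ.
struct GwCommonInit {
  int pver = 0;              // number of vertical levels
  int pgwv = 0;              // wave spectrum spans -pgwv:pgwv
  Kokkos::View<Real*> cref;  // reference phase speeds, 2*pgwv+1 entries
  Kokkos::View<Real*> alpha; // background damping, pver+1 entries
};

inline void gw_common_init(GwCommonInit& data, int pver, int pgwv) {
  data.pver = pver;
  data.pgwv = pgwv;
  // Reassigning a Kokkos::View releases any prior allocation, so calling
  // init repeatedly is safe: the C++ analogue of the f90 deallocate fix (2).
  data.cref  = Kokkos::View<Real*>("cref",  2*pgwv + 1);
  data.alpha = Kokkos::View<Real*>("alpha", pver + 1);
}

inline void gw_common_finalize(GwCommonInit& data) {
  data.cref  = Kokkos::View<Real*>(); // drop references so memory is freed
  data.alpha = Kokkos::View<Real*>();
}
```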

Discussion items:

  • The current CXX is an exact 1:1 port of the fortran, which is not what we want long term
  • Like in p3, should we assume these GW functions are called from a kernel that is already parallelizing over columns? If so, any fortran array like a 3d (ncol, X, Y) becomes a CXX 2d view (X, Y), because we will be passing in subviews for specific columns (see the sketch after this list).
  • Should we assume all vertical column data is packed?
  • Since I wasn't 100% sure where these GW functions will sit in the parallel hierarchy, the C++ impl is fully serial right now, which will obviously need to be changed.
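To make the subview idea concrete, a minimal sketch of the calling convention (assuming Kokkos; the function and variable names are illustrative):

```cpp
#include <Kokkos_Core.hpp>

using Real = double;

// Per-column worker: only ever sees rank-2 (X, Y) data. Templating on the
// view type keeps it agnostic to the layout the subview ends up with.
template <typename View2D>
void gw_per_column(const View2D& tau_col) {
  // ... per-column GW computations, indexed (wave, level) ...
}

// The caller owns the column loop; the f90 (ncol, X, Y) array becomes a
// C++ (X, Y) subview for each column.
void gw_dispatch(const Kokkos::View<Real***>& tau_full) {
  const int ncol = tau_full.extent(0);
  for (int icol = 0; icol < ncol; ++icol) {
    auto tau_col = Kokkos::subview(tau_full, icol, Kokkos::ALL(), Kokkos::ALL());
    gw_per_column(tau_col);
  }
}
```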

The BFB unit test is currently passing on CPU, so the f90 and CXX are 100% bfb with each other for now.

The first one of these conversions is always the hardest, so the subsequent ones will hopefully be a lot faster.

Some moderately tricky things we are hitting here that (as far as I can remember) we have not hit before in our conversion efforts:

  • Fancier f90 array indexing, e.g. real(r8), intent(inout) :: tau(ncol,-pgwv:pgwv,0:pver)
  • Heavy use of f90's where statements, which become a fair bit less elegant in C++ (both sketched below)
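A quick sketch of how both patterns translate (assuming Kokkos views; ubi and taumin are illustrative names, not the actual GW variables):

```cpp
#include <algorithm>
#include <Kokkos_Core.hpp>

using Real = double;

// f90 declares tau(ncol,-pgwv:pgwv,0:pver), i.e. negative and zero-based
// bounds. The 0-based C++ per-column view has extents (2*pgwv+1, pver+1),
// so the f90 access tau(icol, l, k) becomes tau(l + pgwv, k) in C++.
//
// An f90 "where" masks elementwise:
//   where (ubi > 0.0_r8)
//     tau_l = max(tau_l, taumin)
//   end where
// In C++ the mask becomes an explicit loop with a conditional:
void tau_floor_where(const Kokkos::View<Real**>& tau, // (2*pgwv+1, pver+1)
                     const Kokkos::View<Real*>& ubi,  // (pver+1)
                     const int pgwv, const int pver,
                     const int l, const Real taumin) {
  for (int k = 0; k <= pver; ++k) {
    if (ubi(k) > 0.0) {
      tau(l + pgwv, k) = std::max(tau(l + pgwv, k), taumin);
    }
  }
}
```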

@jgfouca self-assigned this Jul 18, 2025
@jgfouca added the EAMxx (Issues related to EAMxx) and wip (work in progress) labels Jul 18, 2025
@bartgol (Contributor) left a comment

I don't know anything about GW, so I will let others do a more meaningful review.

As for the question of whether a fcn should expect 2d vs 1d views, and whether it should expect Real or Pack data:

  • isn't GW a single-col model? If so, why not implement ALL fcns with 1d args, and then we can wrap the col loop from the eamxx process level (or from another function), leaving the door open for an SK approach like in p3/shoc?
  • I don't know how possible this is, but I would encourage using an agnostic scalar type as much as possible. I saw a few k+1 deps, so in certain places that would be more delicate, but there are a few tools to compute, e.g., consecutive-lev differences, which work with both Real and Pack types (sketched below).
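For illustration, the kind of scalar-agnostic helper being suggested (a hypothetical function, not an actual EKAT API); templating on the scalar type lets the same code compile for plain Real and for ekat::Pack:

```cpp
// Hypothetical sketch; not an actual EKAT API. ScalarT can be Real or an
// ekat::Pack<Real, N>, since both support elementwise arithmetic.
template <typename ScalarT>
ScalarT lev_difference(const ScalarT& upper, const ScalarT& lower) {
  return lower - upper; // elementwise for Pack, plain subtraction for Real
}
```

The delicate spots are the k+1 dependencies mentioned above: with packed vertical data, reading level k+1 can straddle a pack boundary, which is where dedicated shift utilities come in.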

@jgfouca commented Jul 21, 2025

@bartgol,

> If so, why not implement ALL fcns with 1d args, and then we can wrap the col loop from the eamxx process level (or from another function), leaving the door open for an SK approach like in p3/shoc?

Yes, I think that's what we want to do, but I wanted to confirm it.

> I don't know how possible this is, but I would encourage using an agnostic scalar type as much as possible.

That would be nice, but we were not able to do that for p3 or shoc.

@mahf708 (Contributor) left a comment

I agree with the comments from Luca and Andrew. (My answers to your discussion questions are: Yes.)

I think this looks neat!

@jgfouca commented Jul 30, 2025

Update: I addressed all the suggestions from earlier and got it working on GPU. Some complexities came up:

  1. The main k-lev loop has to be serialized because the k+1 computations depend on k.
  2. Due to (1), I changed it so that the parallel loop is over -ngwv:ngwv, since each wave computation is independent of the others.
  3. (2) made it more complicated to get the ubt sum in a repeatable fashion. I ended up storing all the contributions in a work view work(ngwv*2 + 1, plev) and then doing the sum in serial (see the sketch after this list).
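Roughly, the structure described in (1)-(3) could look like the sketch below (assuming Kokkos team parallelism; the function and view names are illustrative, not the actual impl.hpp code):

```cpp
#include <Kokkos_Core.hpp>

using Real       = double;
using TeamMember = Kokkos::TeamPolicy<>::member_type;

// Waves are independent, so they are spread across the team's threads (2).
// The k loop stays serial inside each wave because k+1 depends on k (1).
// Each wave writes its contributions into a work view, and the final ubt
// sum runs serially in a fixed order so the result is repeatable (3).
KOKKOS_INLINE_FUNCTION
void waves_then_serial_sum(const TeamMember& team, int ngwv, int plev,
                           const Kokkos::View<Real**>& work, // (ngwv*2+1, plev)
                           const Kokkos::View<Real*>& ubt) { // (plev)
  const int nwav = ngwv*2 + 1;
  Kokkos::parallel_for(Kokkos::TeamThreadRange(team, nwav),
                       [&](const int l_idx) {
    for (int k = 1; k < plev; ++k) {       // serial in k: depends on k-1
      work(l_idx, k) = work(l_idx, k - 1); // placeholder for the real update
    }
  });
  team.team_barrier();
  Kokkos::single(Kokkos::PerTeam(team), [&]() {
    for (int k = 0; k < plev; ++k) {
      Real sum = 0;
      for (int l = 0; l < nwav; ++l) sum += work(l, k);
      ubt(k) = sum; // fixed-order accumulation keeps ubt bit-for-bit stable
    }
  });
}
```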

The only thing worth re-reviewing is the impl.hpp file.

@mahf708 (Contributor) left a comment

Besides potentially adding a comment on what stuff like -pgwv:pgwv means (some wave number thing), all's good. I don't know if I would bother adding comments myself or just defer to the physics people specialized in this to add meaningful comments. Doesn't matter either way.

If the parallelism is different from how things were done in F90, which I assume is the case, then I would also mention that in the comments on those pieces of code...

But again, no problem just punting on this for others to do later.

@bartgol (Contributor) left a comment

Approving, pending the pgwv/ngwv investigation to see if we can get rid of one.

@jgfouca commented Aug 1, 2025

Based on inspection and feedback from Andrew and Walter, I'm confident we can remove the ngwv concept. I will remove it now.

@jgfouca commented Aug 1, 2025

Baseline diffs in the gw tests are expected since I re-worked the init data for all the tests.

```cpp
int pl_idx = nl_idx + (pgwv - ngwv); // 0-based idx -pgwv:pgwv arrays
int pl_idx = l + pgwv; // 0-based idx for -pgwv:pgwv arrays

// Force tau at the top of the model to zero, if requested.
```
A Contributor left a comment:

I think this will cause compute-sanitizer to issue an error for the racecheck condition. I'm not sure; I don't know how it handles concurrent writes of the same compile-time value. Maybe it recognizes that it's benign?

@jgfouca (Member Author) replied:

Each thread has its own pl_idx, so it should be ok, right?
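For context, a minimal sketch of the pattern under discussion (illustrative, not the actual impl.hpp code): pl_idx is computed locally by each thread, and each thread writes a distinct tau element, so the writes are disjoint.

```cpp
// Illustrative fragment: assumes team, pgwv, and tau are in scope as in the
// surrounding kernel.
Kokkos::parallel_for(Kokkos::TeamThreadRange(team, 2*pgwv + 1),
                     [&](const int l_idx) {
  const int pl_idx = l_idx; // stack-local: one copy per thread, never shared
  tau(pl_idx, 0) = 0.0;     // each thread writes a distinct tau element
});
```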

@jgfouca commented Aug 1, 2025

This conflicts a bit with the EKAT update stuff, so I will wait until that's on master.

@bartgol (Contributor) commented Aug 1, 2025

> This conflicts a bit with the EKAT update stuff, so I will wait until that's on master.

Should hopefully happen on Monday. 🤞

@bartgol commented Aug 4, 2025

> > This conflicts a bit with the EKAT update stuff, so I will wait until that's on master.
>
> Should hopefully happen on Monday. 🤞

The EKAT update PR has been merged.

jgfouca added 15 commits August 4, 2025 13:01
@jgfouca force-pushed the jgfouca/port_gwd_compute_tendencies_from_stress_divergence branch from 0f4314a to 5a2f235 on August 4, 2025 19:06
@jgfouca removed the wip (work in progress) label Aug 4, 2025
@jgfouca merged commit 6fff2bb into master Aug 4, 2025
8 of 18 checks passed
@jgfouca deleted the jgfouca/port_gwd_compute_tendencies_from_stress_divergence branch August 4, 2025 20:53