Skip to content

Conversation

tcclevenger
Copy link
Contributor

@tcclevenger tcclevenger commented Dec 2, 2024

Processes can choose for a tracer to be advected only by dynamics (not by SHOC) by passing turbulence_advected=false to add_tracer(). Guards set up to ensure all tracers requests are consistent among processes.

Moved over from E3SM-Project/scream#3085. Currently working on issue of tracer ordering between PG2 and GLL grids.

[non-BFB for MAMxx]

@tcclevenger tcclevenger added non-BFB PR makes roundoff changes to answers. EAMxx Issues related to EAMxx labels Dec 2, 2024
@tcclevenger tcclevenger self-assigned this Dec 2, 2024
@bartgol
Copy link
Contributor

bartgol commented Dec 2, 2024

@tcclevenger Just to clarify: you said you're still working on ordering of tracers for PG2 and DYN grids, so is this PR not yet ready?

@tcclevenger
Copy link
Contributor Author

@tcclevenger Just to clarify: you said you're still working on ordering of tracers for PG2 and DYN grids, so is this PR not yet ready?

Yes. Do we have a WIP equiv over here so I don't let the AT run while I work on getting a fix in?

@bartgol
Copy link
Contributor

bartgol commented Dec 2, 2024

@tcclevenger Just to clarify: you said you're still working on ordering of tracers for PG2 and DYN grids, so is this PR not yet ready?

Yes. Do we have a WIP equiv over here so I don't let the AT run while I work on getting a fix in?

No we don't. But as soon as you push commits, you can click on the checks tab, and cancel all the jobs (one workflow at a time, unfortunately), if you know you're not done anyways...

Edit: But we could modify our workflows jobs so that, for pull_request events, they only run if the PR is not in draft mode...

@rljacob
Copy link
Member

rljacob commented Dec 2, 2024

What if you change it to draft PR ?

@bartgol
Copy link
Contributor

bartgol commented Dec 2, 2024

What if you change it to draft PR ?

I saw your reply just as I was editing my answer... Unfortunately, gh actions pull_request event still triggers, even if the PR is in draft mode (a debatable design choice, imho). The only way to skip them is to do something like

jobs:
  my-job:
    if: ${{ github.event_name != "pull_request" || github.event.pull_request.draft == "false" }}
    ...

@bartgol
Copy link
Contributor

bartgol commented Dec 2, 2024

@tcclevenger i removed the reviewer request. When you're ready, add them back, so gh pings me. If you'd rather us review right now, even before you fix the PG2-DYN issue, feel free to re-add me/us (I know balwinder is out for a month though).

@rljacob
Copy link
Member

rljacob commented Dec 19, 2024

@tcclevenger what is the status?

@tcclevenger
Copy link
Contributor Author

Still working on a fix for mismatch tracer indices between PG2 and GLL grid

@rljacob
Copy link
Member

rljacob commented Jan 9, 2025

@tcclevenger any progress on fix?

@tcclevenger
Copy link
Contributor Author

@tcclevenger any progress on fix?

Yes, this is what I'm currently working on. Will update as I go.

@rljacob
Copy link
Member

rljacob commented Jan 28, 2025

@tcclevenger please try to make this old PR a priority.

@rljacob rljacob requested a review from ambrad January 28, 2025 19:08
@ambrad
Copy link
Member

ambrad commented Jan 28, 2025

@rljacob is it OK if we switch this to draft?

@rljacob
Copy link
Member

rljacob commented Jan 28, 2025

Sure thats fine.

@rljacob rljacob marked this pull request as draft January 28, 2025 19:26
@ambrad
Copy link
Member

ambrad commented Jan 28, 2025

@tcclevenger would you switch this to Draft if it's not ready? We'll try out that workflow.

@tcclevenger
Copy link
Contributor Author

@tcclevenger please try to make this old PR a priority.

This is what I am currently working on, but the work is in a different branch (it contains a fundamental change to allocating groups of fields that doesn't make sense to throw onto this branch). Yes, I agree draft makes more sense for this PR.

@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/add_tracer_with_turbulence_advection branch 3 times, most recently from 785340d to 8d5f18b Compare March 6, 2025 17:50
@bartgol
Copy link
Contributor

bartgol commented Mar 10, 2025

Seems that some of the other CIME cases are erroring out trying to start the container? Have we seen that elsewhere?

I'm guessing the error is this one?

  toomanyrequests: {"errors":[{"code":"TOOMANYREQUESTS","message":"retry-after: 51.451683ms"}]}

I have never seen it. I suppose we had too many concurrent PRs that needed to be tested, and all the gh-hosted runners tried to access the container registry at the same time? Should be safe to re-run the failed tests manually.

Copy link
Contributor

@singhbalwinder singhbalwinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great from the MAM4xx side! Thanks, Conrad for adding this very important feature!

@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/add_tracer_with_turbulence_advection branch from cb70a4c to 4269e18 Compare March 12, 2025 19:20
Copy link
Member

@ambrad ambrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve as long as the "[non-BFB for MAMxx]" status remains true (I think it is, but just want to be sure).

Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there wasn't much to do in FieldManager? That's nice. With this branch, did you try to print the tracer group content, and verify that the non-turb-advected tracers are all at the end?

I have some thoughts on the design. Not really requests, just discussions sparks.

if (not ekat::contains(req.groups, "tracers")) return;

std::string fname = req.fid.name();
if (tracer_requests.find(fname) == tracer_requests.end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is pointless. The else branch should work regardless, no?


// Go through the map entry for each tracer and check that every one
// has the same request for turbulence advection.
for (auto fr : tracer_requests) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say MAM requests field A, specifying that it should not be turb advected. Another process uses field A, so it requests it, but it has no knowledge of what turb advection is, or at least it doesn't care (the choice may be relevant for MAM, but the rest of the atm doesn't care). In this case, wouldn't the two request conflict?

Maybe we should turn this into an enum, like

enum TracerAdvection {
  DynOnly,
  DynAndTurb,
  Default,
};

(the names are up for debate, ofc). In this case, the default would be Default. If all procs use "default", we do both dyn and turb advection, as before. If all procs use "default", but some ask for "DynOnly", we do dyn only. The error only arises if some procs expicitly ask for "DynOnly" and others explicitly ask for "DynAndTurb".

Maybe I'm over thinking it, but there is no way, with a bool, to check if some values for turb_advected_tracers are true b/c the customer asked for true, or simply b/c they didn't care, and relied on the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum may make things complicated, in which case we may also consider whether this check is needed. That is, as long as there is ONE request that says "don't do turb advection", then we don't add it to the turb advected tracers.

Maybe I need to better understand the case we are trying to guard ourselves from...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, using enums in the add_tracer calls make the code more self-explanatory...so maybe switching to an enum is a bit clearer?

add_tracer<Required>("blah",units, grid,1,false);

vs

add_tracer<Required>("blah",units, grid,1,NoTurbAdvection);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be leaning towards getting rid of this check. I think (but will verify) that the changes that evolved in the FM since I wrote this check actually make it a bit redundant? It should error out with inconsistent requests, we just may not get the output about which processes conflict.

With regards to the enums, I may be having a change of heart. I've always avoided in the case where a bool was all that's needed, but these signatures definitely are less clear and add_tracer and add_group are certainly part of the code that users will interface with if they write a process. Will change them back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: I often though "why adding a new type when a bool flag does it?". But code clarity with enums is on a whole other level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up: We need this check given how group requests do group logic, namely, if a single processes says "I want A in group G" then A gets added to G, even if some processes intentionally did not add A to G.

I think this option is more of "I want to explicitly state that A should not be in group G" and then these checks ensure no one added it.

@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/add_tracer_with_turbulence_advection branch from b690428 to 5a4eb82 Compare March 13, 2025 14:26
- AtmosphereProcess::add_tracer() now takes in bool turbulence_advected
  which dictates if shoc advects tracer or ignores
- SHOC requests group "turbulence_advected_tracers", not full "tracers"
  array
- FieldRequest includes a "calling_process" var for error output
This reverts commit 6237c87.

Adds back the enum for bundling
@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/add_tracer_with_turbulence_advection branch from 5a4eb82 to f957603 Compare March 13, 2025 14:32
@tcclevenger
Copy link
Contributor Author

With this branch, did you try to print the tracer group content, and verify that the non-turb-advected tracers are all at the end?

Yes, for PG2, I verified GLL<->PG2 indices matched, and that non-turb-advected tracers came last.

@tcclevenger tcclevenger requested a review from bartgol March 13, 2025 15:37

add_tracer<Updated >("qv", m_phys_grid, kg/kg, N);
add_group<Updated>("tracers",pgn,N, true);
add_group<Updated>("tracers",pgn,N, Bundling::Required);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're bringing the enum back, I may as well ask: do you think the name Bundling is clear? Maybe we could change it into something else. Some ideas:

enum class GroupAllocation {
  Contiguous,
  NotContiguous
};
enum class MonolithicGroup {
  Required,
  NotRequired
};
enum class ContiguousGroup {
  Requred,
  NotRequired
};

I don't know. I wonder if the word "bundle" is a bit obscure, and we can find something more self-explanatory? If you have other name ideas, feel free to bounce them off us. This is the right time to discuss this, before we roll it out again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user potentially needs to know what group.m_bundle is, maybe we would need to think about changing that name too?

I like Monolithic the best. To me non-contiguous implies that "I can access the entire group, it just won't be contiguous" where the only way to access a view of the entire group is bundled case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we decide on a name change, we should change it everywhere...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we used

enum class MonolithicAlloc {
	Required,
	NotRequired
};

? I agree that the word "monolithic" carries a clear meaning. But I also like to have the word "allocation" (or something reminescent of it) somwhere.

What would you then call the bundle? group.m_monolithic? group.m_monolithic_field? group.m_whole? Something else? I like m_monolithic_field, and maybe rename m_fields into m_individual_fields?

Feel free to propose others...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also merge as is, and follow up with a better name convention right after...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like m_monolithic_field and m_individual_fields. I think this is much clearer than "bundling". I've updated all the comments that refer to bundling as well.

@tcclevenger
Copy link
Contributor Author

Confirmed #6981 did not introduce non-BFB fails on PM-GPU, so we can merge this once we settle on a "bundling" name

- Bundling enum -> MonolithicAlloc
- m_bundle field -> m_monolithic_field
- m_fields (in field group) -> m_individual_fields
- All comments updated
@tcclevenger
Copy link
Contributor Author

I decided to update bundling name instead of merging since I'm off work tomorrow and so don't want to merge tonight. Assuming there are no fails/change requests with the naming, I'll merge first thing next week (Sunday night, even, if I remember...).

@singhbalwinder Thanks for your patience with this PR! Ended up taking much more time than anticipated..

@singhbalwinder
Copy link
Contributor

No worries at all—I completely understand that these things can take much longer than expected. I appreciate the time and effort you’ve put into this. Merging next week is fine.

@rljacob
Copy link
Member

rljacob commented Mar 17, 2025

Is this ready?

@tcclevenger
Copy link
Contributor Author

Is this ready?

Yes, going to merge today.

@rljacob
Copy link
Member

rljacob commented Mar 17, 2025

Hang on. I think we want to do the Kokkos branch today.

@rljacob
Copy link
Member

rljacob commented Mar 17, 2025

Kokkos branch delayed so go ahead.

tcclevenger added a commit that referenced this pull request Mar 17, 2025
… into next (PR #6789)

Processes can choose for a tracer to be advected only by dynamics (not by SHOC) by passing
turbulence_advected=false to add_tracer(). Guards set up to ensure all tracers requests are
consistent among processes.

[non-BFB for MAMxx]
@tcclevenger tcclevenger merged commit 1e40ce2 into E3SM-Project:master Mar 17, 2025
13 of 29 checks passed
@tcclevenger tcclevenger deleted the tcclevenger/eamxx/add_tracer_with_turbulence_advection branch March 17, 2025 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EAMxx Issues related to EAMxx non-BFB PR makes roundoff changes to answers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants