-
Notifications
You must be signed in to change notification settings - Fork 435
EAMxx: Adds diagnostic output for *_sfgaex1, *_sfgaex2, *_sfnnuc, *_sfcoag #7492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
add_field<Computed>("mam4_microphys_tendency_renaming", vector3d_num_gas_aerosol_constituents, pow(mol, -1), grid_name); | ||
add_field<Computed>("mam4_microphys_tendency_nucleation", vector3d_num_gas_aerosol_constituents, pow(mol, -1), grid_name); | ||
add_field<Computed>("mam4_microphys_tendency_coagulation", vector3d_num_gas_aerosol_constituents, pow(mol, -1), grid_name); | ||
add_field<Computed>("mam4_microphys_tendency_renaming_cw", vector3d_num_gas_aerosol_constituents, pow(mol, -1), grid_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need mam4_microphys_tendency_renaming_cw
? I think we only need the interstitial tendencies. Could you also please verify the units? I think the units should be mol/mol
or kmol/kmol
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea on whether we need the _cw
quantity. The confluence page indicates that *_sfgaex2
is desired, and it's my understanding that this is calculated for the cloudwater quantities, as in the variable qqcwgcm_tendaa
in mam4_amicphys.hpp
. That was based on this bit I found in the fortran code
modal_aero/modal_aero_amicphys.F90:223
character(len=8) :: suffix_q_coltendaa(nqtendaa) = &
(/ '_sfgaex1', '_sfgaex2', '_sfnnuc1', '_sfcoag1' /)
character(len=8) :: suffix_qqcw_coltendaa(nqqcwtendaa) = '_sfgaex2'
That said, I wanted to include it, just in case, so I could delete rather than have to add code later. Happy to get rid of it if you want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the units, it was unclear to me whether the units vary based on which proccess provides the quantity--e.g., one unit for coagulation and another for rename OR units vary based on species, in which case there's not much we can do unless we want to split the diagnostic arrays into even smaller species-level variables.
And this is just based on modal_aero_amicphys.F90:423
// qgcmN and qqcwgcmN (N=1:4) are grid-cell mean tracer mixing ratios
// (TMRs, mol/mol or #/kmol)
// N=1 - before gas-phase chemistry
// N=2 - before cloud chemistry
// N=3 - incoming values (before gas-aerosol exchange, newnuc, coag)
// N=4 - outgoing values (after gas-aerosol exchange, newnuc, coag)
// qgcm (ncnst) : grid cell mean, interstitial constituents (unit does not
// matter)
// qqcwgcm(ncnst): grid cell mean, cloud-borne constituents (unit
// does not matter)
In chatting with @mahf708 and @TaufiqHassan, it seemed like the best option would be to got with mol^-1
and account for the numerator and/or factor of 1e3 in post-processing.
I'm certainly open to other solutions, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right on both points. Let's keep the _cw
species.
Units depend on the species (# mixing ratio vs. volume mixing ratio). We can keep the units as mol^-1. This array is intended to be used to extract individual species, and there we can be more specific about the units.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it. That makes sense to me, then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field is a "collection" of subfields with different units (much like a FieldGroup), you can also set units to nondimensional and attach an "invalid" string to it. That way, in the output file you will clearly see that the units of this field should NOT be considered.
Once you get the field from the AD, you can also attach additional (and arbitrary) meta data to the field that is intended for IO, like a string explaining this. E.g., in initialize_impl:
using stratts_t = std::map<std::string,std::string>;
auto& f = get_field_out("mam4_microphys_tendency_renaming_cw");
auto& io_stratts = f.get_header().get_extra_data<stratts_t>("io: string attributes");
io_stratts["doc"] = "This is a 'collection' of eterogeneous fields. Each species has different units";
IO will take care to transfer this to the file (even if there are layers of remapping in the IO sequence).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! @mjschmidt271: Could you please add this for the units of these diagnostics variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Mike!
I am not sure why |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have spelled out "cw" more verbosely, but that's very minor. Looks good!
We can replace "cw" with "cloud_borne". |
yeah, see, to me, "cloud-water" (I think that's what cw is for, right?) and "cloud-borne" aren't interchangeable, so I am often confused when I see "cloud water" --- I think of cloud water as an actual concrete thing that's wholly separate. Anything can be cloud borne (or in-cloud), but only one thing can be "cloud water" ... |
components/eamxx/tests/single-process/mam/aero_microphys/output.yaml
Outdated
Show resolved
Hide resolved
8394394
to
ceafe76
Compare
I looked into the lines that are causing the diffs in the CUDA-optimized build. If we comment out this block, the code is BFB w.r.t the baselines. Although this block is never actually executed (all the if conditions evaluate to false), its presence seems to affect results, likely due to compiler optimizations (e.g., -O3). The RMS differences are very small. I plan to revisit this code in the future to replace the raw arrays with Kokkos views. Given that the optimized build for CPU/OpenMP is BFB and all other tests (optimized and release) are BFB, I think we should go ahead and merge it. We have seen similar behavior in the past, where optimized builds for some compilers are NBFB even for CPU builds, while the debug builds are BFB. |
What happens if you lower optimization to -O2 and leave the block in? |
I was also curious about it, but it will require new baselines, as baselines should also be at -O2. I can run this experiment and report back. |
@bartgol @tcclevenger any advice on above? |
@singhbalwinder did you try to put some print statements to verify that those code blocks are indeed never executed? |
Yes, I put the prints in and outside of that block to verify that the code in that block is never executed in this test. If we uncomment just a single if condition, I see the diffs in
If I then comment out the assignment line ( I see these diffs with O3, O2, EDIT: My O1 tests were using O3, so I am redoing my O1 test. |
@singhbalwinder , This seems like a bug that Oscar and then I worked on a few months ago. In that case it was the array allocation itself that caused problems. When the use of the array was commented out the diff went away because the CUDA compiler would not allocate an array when it was not used in a significant way. I could see the allocation because the size of the executable changed by the byte size of the array. Never figured it out. Oscar changed some code for another reason and the diff went away. When you replace the fixed sized arrays with Kokkos views in a later merge request, it will fix itself. |
That's quite mysterious. Perhaps when you comment the code, the compiler is able to detect that the static arrays are not needed, and does not create them. When uncommented, the compiler cannot do such assumption, and allocates them. These arrays may end up causing a large kernel memory usage, as they are allocated per thread. This may somehow hint at some race condition on some scratch memory, something that may be hard to track down. Out of curiosity, are the results BFB across consecutive runs? I think in general we should avoid to allocate non-trivial arrays on a per-thread basis, as it's done here. I don't know what gas_pcnst is, but I assume it's ~40? Having 5 of these arrays means ~200 doubles per thread, which is 1.6KB. This can contribute to limit the size of a block on CUDA, as the amount of shared memory is limited (e.g., A100 has 164KB / SM, so it could only accommodate ~100 threads/block, 10x less than the max allowed, and that's only counting these 5 arrays). Edit: sorry, I mixed up local and shared memory. But the bottom line still holds: we need to be careful with how much stack memory we request on a per-thread basis. |
I repeated this test many times, and the results are BFB reproducible. So, this change is not causing any nondeterminism. |
void get_field_set_io_docstring(const std::string &fname) { | ||
using str_atts_t = std::map<std::string,std::string>; | ||
auto &f = get_field_out(fname); | ||
auto &io_str_atts = f.get_header().get_extra_data<str_atts_t>("io: string attributes"); | ||
io_str_atts["doc"] = "Collection of heterogeneous quantities--species have different units"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add the docs of these as long_name attributes following https://docs.e3sm.org/E3SM/EAMxx/user/io_metadata/ instead of the manual string attires.
@bartgol I think the long name is more appropriate for this, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the io_str_atts["doc"]
be the long name for these diagnostics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think adding these docs is important, but if you are going to do it, you might as well do it properly following the link above. The long name is meant to be descriptive of the quantity, including caveats about it. So, I would add a a few-word description of each quantity, then add the caveat you're adding here.
If Luca says otherwise, that's fine. The worry is, without a central location, these docs strings may become unwiedly... and that's precisely why we made the io_metadata workflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can modify this to include the "long name". I will wait for @bartgol to chime in before making this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought long name was only supposed to be a descriptive name, so I thought adding an extra metadata attribute for "notes" was ok. If you guys think long name is the place to include also caveats/notes and other miscellanea info, that's fine with me.
3393b59
to
6340b01
Compare
components/eamxx/src/physics/mam/eamxx_mam_microphysics_process_interface.cpp
Outdated
Show resolved
Hide resolved
// Diagnostics: tendencies due to gas phase chemistry [mixed units: kg/kg/s or #/kg/s] | ||
add_field<Computed>("mam4_microphysics_tendency_gas_phase_chemistry", vector3d_num_gas_aerosol_constituents, nondim, grid_name); | ||
|
||
// Diagnostics: tendencies due to aqueous chemistry [mixed units: kg/kg/s or #/kg/s] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is basically perfect for a long name (you just need to add "mam4 microphys" or something to the start and it will be 🍑
We have addressed all the reviewers' comments, and this PR is ready to merge. We have one test failing. I think we can still merge as the diff is very small and it is just in the optimized build. All other optimized and debug tests are BFB. |
void add_io_docstring_to_fields_with_mixed_units(const std::map<std::string, std::string> &flds) { | ||
using str_atts_t = std::map<std::string,std::string>; | ||
for (const auto &pair : flds) { | ||
// Get the field, and add a docstring to its string attributes | ||
// This is used to document that the field contains heterogeneous | ||
// quantities, i.e., species have different units. | ||
auto &f = get_field_out(pair.first); | ||
auto &io_str_atts = f.get_header().get_extra_data<str_atts_t>("io: string attributes"); | ||
io_str_atts["doc"] = pair.second; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this is not a good idea. This will be forgotten and having a non-standard "doc" entry in the netcdf files isn't going to be all that illuminating or helpful, frankly. I suggest two things:
- For now: Add these strings (you can keep them exactly as they are) to the io_metadata csv file in the link I shared under long_name column
- For later: Add a src/diagnostic impl to parse out these fields into the more regular fields (col, lev) instead of the non-obvious fields (col, something_no_user_will_be_ever_able_to_figure_out_readily, lev)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, these fields are there only to expose these quantities. Ideally, we should have fields with names that make sense (e.g., so4_mam4_gas_phase_tendency
) and extract specific tracers/species from these fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But up to you if you wanna do this right away, or merge and do later. I am not picky about these doc strings (so I am happy to pretend they are not there ;))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to worry about the roundoff diff, I'd say merge and monitor for a few days to ensure we are not hitting non-determinism.
EAMxx: Adds diagnostic output for *_sfgaex1, *_sfgaex2, *_sfnnuc, *_sfcoag Adds diagnostic output for more tendencies coming out of Microphysics. This adds some tendencies related to gas-aerosol exchange to the MicrophysDiagnosticArrays struct. NOTE Each of these quantities is a 3D field and is not column-integrated (as we see here) or scaled for unit changes (as is done here). If the user wishes to integrate the quantities over a column, they can request the relevant field (e.g., in output.yaml for a standalone test) with the suffix _vert_sum_dp_weighted appended (see EAMxx docs for more details). That is, to get the 3d field, request mam4_microphys_tendency_nucleation, and if you prefer a column-integrated version, weighted by pseudo_density, then request mam4_microphys_tendency_nucleation_vert_sum_dp_weighted. [NBFB] for only one optimized standalone test on CUDA, BFB otherwise * mjs/eamxx/gas-tend-diags: Adds long name and message about mixed units Update MAM4xx submodule to include view2d definition Clang for MAM4xx codes only Use 2D to fetch diagnostics rather than subviewing Adds mam4xx updates A working version for only one diag-needs cleanup microphys -> microphysics Working version of io docstring Code may not build - Moves the io doc string call to initialize impl add metadata strings to diag fields and units Update MAM4xx: changes for to parallel_for Removes arrays and use views directly in MAM4xx submodule disable output of new diagnostics switch _cw to _cloud_borne rework diagnostic outputs
Adds diagnostic output for more tendencies coming out of Microphysics.
This adds some tendencies related to gas-aerosol exchange to the
MicrophysDiagnosticArrays
struct.NOTE
Each of these quantities is a 3D field and is not column-integrated (as we see here) or scaled for unit changes (as is done here).
If the user wishes to integrate the quantities over a column, they can request the relevant field (e.g., in
output.yaml
for a standalone test) with the suffix_vert_sum_dp_weighted
appended (see EAMxx docs for more details). That is, to get the 3d field, requestmam4_microphys_tendency_nucleation
, and if you prefer a column-integrated version, weighted bypseudo_density
, then requestmam4_microphys_tendency_nucleation_vert_sum_dp_weighted
.[NBFB] for only one optimized standalone test on CUDA, BFB otherwise