-
Notifications
You must be signed in to change notification settings - Fork 435
EAMxx: enforce a single value for "fill_value" across all eamxx #7537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💌
I am worried about the tracking aspect (but this is unrelated to this PR). My worry is thus: It will be burdensome to declare these things (and what if something is calculated from something else? it gets tricky). Instead, is there a way to scan all entries to calculate the field_mask on-demand instead of manually assigning it? Is there any danger that fill_value will come from something other than a fill op? |
I think if a fill value for field X may show up as part of operation Y (e.g., vremap, field @ p lev, radiation, etc), then I don't see it as a big ask to demand that the developer(s) of Y also put in the correct setup for X that makes downstream customers aware that fill values may be possible. I don't think there are TONS of places where this can happen (at least not now). Of course, the alternative is to always check if the rhs contains fillValue when doing any sort of update for a field, but it seems silly not to use the simpler logic for the vast majority of operations... That said, it is true that if, say, field A is updated by process X using field B, but without doing any fillValue-generating op, A may still contain fillValue entries, b/c of B. Uhm, this may open a big can of worm, which indeed may only be possible by always using a "fill-aware" implementation of the field update/manipulation methods. Ugh, gotta convert to draft and think about this. |
Well your PR is a stepping stone for that. No rush to sort this whole thing out right away; it's better if we take our time thinking through it. Right now, I am attracted to the idea of an on-demand "have_fill_vals" method that dispatches a kernel and returns true/false and a separate "get_field_mask" method that returns the mask. I actually am not super sure we need to get the field_mask entity sorted altogether, because there we can DEMAND that developers impl their arithmetics in fill-aware fashion such that ops must contend with the filled entries by smartly discarding them when/if needed (in IO, we can easily live with another kernel to discard in track-counting, etc., as well). But we can offer get_field_mask as a crutch for developers to use instead of thinking hard about ops |
Ok, for now we can then assume that if a field may contain fill_val, someone should have taken care of marking its meta data attribute accordingly.
I suppose that feature may be useful if one has to to a lot of field update operations, in which case the cost of the extra kernel (to get "have_fill_vals") may get amortized by the lower cost of subsequent update kernels. As for your mask consideration, we can certainly force all operations to be fill-aware, and always check if the rhs contains fill values. If we did that, we'd get rid of quite a bit of code, but at the cost of performing checks at every single entry during Field::update calls. |
d93b6f6
to
abb6047
Compare
xref #7548 |
@bartgol, as part of this PR, could you please ensure we have consistent support for fill val in terms of precision? I think the way to do this would be to edit the underlying DefaultFillValue struct to support Real directly (currently it only supports "float") // Universal fill value for variables
// TODO: When we switch to supporting C++17 we can use a simple `inline constexpr` rather than a struct
template<typename T>
struct DefaultFillValue {
static constexpr bool is_float = std::is_floating_point<T>::value;
static constexpr bool is_int = std::is_integral<T>::value;
static constexpr T value = is_int ? std::numeric_limits<int>::max() / 2 :
is_float ? std::numeric_limits<float>::max() / 1e5 : std::numeric_limits<char>::max();
}; Inside of EAMxx, the DefaultFillValue shouldn't be promoted/demoted in preicision at all, and instead we should have its value of interest throughout. Once we reach the point of the IO op where we write to file, we should then cast to the precision of the netcdf files (usually single, or float). In other words, I do NOT think this is desirable: // Use float, so that if output fp_precision=float, this is a representable value.
// Otherwise, you would get an error from Netcdf, like
// NetCDF: Numeric conversion not representable
// Also, by default, don't pick max float, to avoid any overflow if the value
// is used inside other calculation and/or remap.
float m_fill_value = constants::DefaultFillValue<float>().value; the casting should happen at the point of write, not before it. |
I think the cast should happen in the class, not at the point of write. You need to be able to recognize the same value throughout the library, so it should be generated (in the class) with the same numerical value regardless of precision. If you do that, you are free to promote/demote as many times as you want, and recover always the same value (provided that you don't use it inside numerical expressions, of course). |
60a06e7
to
5e39229
Compare
8ec3584
to
60b6e07
Compare
60b6e07
to
f4a9a47
Compare
ed186eb
to
fd74424
Compare
33fb240
to
1aa6d3e
Compare
if (f_tgt.get_header().has_extra_data("mask_data")) { | ||
auto f_tgt_mask = f_tgt.get_header().get_extra_data<Field>("mask_data"); | ||
auto f_src_mask = f_src.get_header().get_extra_data<Field>("mask_data"); | ||
if (f_tgt.get_header().has_extra_data("mask_field")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this, perhaps, we should transfer ALL extra data from src to target during remap_fwd
...
void transfer_io_str_atts (const scream::Field& src, scream::Field& tgt) { | ||
// Helper lambda, to copy extra data (io string attributes plus filled settings). | ||
// This will be used if any remapper is created, to ensure atts set by atm_procs are not lost | ||
void transfer_extra_data (const scream::Field& src, scream::Field& tgt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: consider moving this to AbstractRemapper::remap_fwd
instead.
I'm not sure it's the right thing to do, as some extra data may be grid-specific. But I feel like there's a few places where we do this, so maybe we can find a way to "structure" extra data. E.g., specify whether extra-data is grid-specific or not? Not sure. We have to think about this.
More expressive of what the mask field represents
@mahf708 this needs a re-review, since more stuff ended up being changed. I think overall this simplifies a few places (e.g., vert remap), but I think it's just the first step toward a more structured handling of mask fields. Perhaps we can chat offline about the big picture, but I think this is a good first step. So to recap, the changes are:
|
Quick question: since we have one value everywhere now, do you think we can get away with the mask field (ones and zeros) altogether? If we really want to, we can easily produce a mask field on the fly knowing that it is a single value that's exceedingly unlikely to be set by some other proc. But maybe you have other explicit uses for the mask in your mind? I think for most use cases, we can just get rid of it... but it may require work in the IO layer to facilitate things. |
The biggest issue is horiz remap, where we remap the mask field to be able to correctly rescale the mapped values later (the horiz interp weights sum to 1, but if you discard entries, you need to rescale by the used weights sum later). It is possible to just do this calculation without the need of a "source" mask field, but it would require some work. We can discuss it, perhaps as a follow up work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skimmed edits and they look good... wanna iterate more on this or are you ready to integrate? wanna hit the next route or just go straight to main?
Do you think we need to go through next? I'm not sure it covers more than CI in this regard, as CI already has eamxx-prod... I think I want to work (maybe chat first) on more in-depth treatment of mask fields. |
We know the fail is unrelated. Merging. |
The user will no longer be able to specify the value for fillValue in the output yaml files. This enables a cascade of simplifications, since we no longer need to pass the value around, as the code is now capable of retrieving fill_value from a central place. Also, add a special extra meta-data interface to FieldHeader to simplify querying whether a field may contain fill value entries. [BFB]
The user will no longer be able to specify the value for fillValue in the output yaml files.
This enables a cascade of simplifications, since we no longer need to pass the value
around, as the code is now capable of retrieving fill_value from a central place.
Also, add a special extra meta-data interface to FieldHeader to simplify querying
whether a field may contain fill value entries.
[BFB]
Hopefully simplifies the path toward addressing #6803. The next step will be to move the creation of mask fields into the diagnostics classes, and have IO simply query whether a field has a
mask_field
metadata set or not (or maybe just querymay_be_filled
), without assuming anything regarding the origin of the field. This will assume that the provider(s) of the field will set themay_be_filled
metadata during init (rather than at runtime), so that IO can figure out whether it needs to enable tracking fill value or not.Note: I need to verify that all the mask logic is correctly transfered during remap ops in IO. I will manually run an ne30 eamxx-prod test to verify things work as expected.