Skip to content

Conversation

bartgol
Copy link
Contributor

@bartgol bartgol commented Jul 21, 2025

The user will no longer be able to specify the value for fillValue in the output yaml files.
This enables a cascade of simplifications, since we no longer need to pass the value
around, as the code is now capable of retrieving fill_value from a central place.

Also, add a special extra meta-data interface to FieldHeader to simplify querying
whether a field may contain fill value entries.

[BFB]


Hopefully simplifies the path toward addressing #6803. The next step will be to move the creation of mask fields into the diagnostics classes, and have IO simply query whether a field has a mask_field metadata set or not (or maybe just query may_be_filled), without assuming anything regarding the origin of the field. This will assume that the provider(s) of the field will set the may_be_filled metadata during init (rather than at runtime), so that IO can figure out whether it needs to enable tracking fill value or not.

Note: I need to verify that all the mask logic is correctly transfered during remap ops in IO. I will manually run an ne30 eamxx-prod test to verify things work as expected.

@bartgol bartgol requested a review from mahf708 July 21, 2025 17:34
@bartgol bartgol self-assigned this Jul 21, 2025
@bartgol bartgol added BFB PR leaves answers BFB EAMxx Issues related to EAMxx labels Jul 21, 2025
Copy link
Contributor

@mahf708 mahf708 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💌

@mahf708
Copy link
Contributor

mahf708 commented Jul 21, 2025

I am worried about the tracking aspect (but this is unrelated to this PR). My worry is thus:

It will be burdensome to declare these things (and what if something is calculated from something else? it gets tricky). Instead, is there a way to scan all entries to calculate the field_mask on-demand instead of manually assigning it? Is there any danger that fill_value will come from something other than a fill op?

@bartgol
Copy link
Contributor Author

bartgol commented Jul 21, 2025

I am worried about the tracking aspect (but this is unrelated to this PR). My worry is thus:

It will be burdensome to declare these things (and what if something is calculated from something else? it gets tricky). Instead, is there a way to scan all entries to calculate the field_mask on-demand instead of manually assigning it? Is there any danger that fill_value will come from something other than a fill op?

I think if a fill value for field X may show up as part of operation Y (e.g., vremap, field @ p lev, radiation, etc), then I don't see it as a big ask to demand that the developer(s) of Y also put in the correct setup for X that makes downstream customers aware that fill values may be possible. I don't think there are TONS of places where this can happen (at least not now). Of course, the alternative is to always check if the rhs contains fillValue when doing any sort of update for a field, but it seems silly not to use the simpler logic for the vast majority of operations...

That said, it is true that if, say, field A is updated by process X using field B, but without doing any fillValue-generating op, A may still contain fillValue entries, b/c of B. Uhm, this may open a big can of worm, which indeed may only be possible by always using a "fill-aware" implementation of the field update/manipulation methods. Ugh, gotta convert to draft and think about this.

@bartgol bartgol marked this pull request as draft July 21, 2025 20:44
@mahf708
Copy link
Contributor

mahf708 commented Jul 21, 2025

Well your PR is a stepping stone for that.

No rush to sort this whole thing out right away; it's better if we take our time thinking through it.

Right now, I am attracted to the idea of an on-demand "have_fill_vals" method that dispatches a kernel and returns true/false and a separate "get_field_mask" method that returns the mask. I actually am not super sure we need to get the field_mask entity sorted altogether, because there we can DEMAND that developers impl their arithmetics in fill-aware fashion such that ops must contend with the filled entries by smartly discarding them when/if needed (in IO, we can easily live with another kernel to discard in track-counting, etc., as well). But we can offer get_field_mask as a crutch for developers to use instead of thinking hard about ops

@bartgol
Copy link
Contributor Author

bartgol commented Jul 21, 2025

Well your PR is a stepping stone for that.

No rush to sort this whole thing out right away; it's better if we take our time thinking through it.

Ok, for now we can then assume that if a field may contain fill_val, someone should have taken care of marking its meta data attribute accordingly.

Right now, I am attracted to the idea of an on-demand "have_fill_vals" method that dispatches a kernel and returns true/false and a separate "get_field_mask" method that returns the mask. I actually am not super sure we need to get the field_mask entity sorted altogether, because there we can DEMAND that developers impl their arithmetics in fill-aware fashion such that ops must contend with the filled entries by smartly discarding them when/if needed (in IO, we can easily live with another kernel to discard in track-counting, etc., as well). But we can offer get_field_mask as a crutch for developers to use instead of thinking hard about ops

I suppose that feature may be useful if one has to to a lot of field update operations, in which case the cost of the extra kernel (to get "have_fill_vals") may get amortized by the lower cost of subsequent update kernels. As for your mask consideration, we can certainly force all operations to be fill-aware, and always check if the rhs contains fill values. If we did that, we'd get rid of quite a bit of code, but at the cost of performing checks at every single entry during Field::update calls.

@bartgol bartgol force-pushed the bartgol/eamxx/fill-val-handling branch from d93b6f6 to abb6047 Compare July 21, 2025 22:57
@mahf708
Copy link
Contributor

mahf708 commented Jul 25, 2025

xref #7548

@mahf708
Copy link
Contributor

mahf708 commented Jul 26, 2025

@bartgol, as part of this PR, could you please ensure we have consistent support for fill val in terms of precision? I think the way to do this would be to edit the underlying DefaultFillValue struct to support Real directly (currently it only supports "float")

// Universal fill value for variables
// TODO: When we switch to supporting C++17 we can use a simple `inline constexpr` rather than a struct
template<typename T>
struct DefaultFillValue {
  static constexpr bool is_float = std::is_floating_point<T>::value;
  static constexpr bool is_int   = std::is_integral<T>::value;
  static constexpr T value = is_int ? std::numeric_limits<int>::max() / 2 :
	  is_float ? std::numeric_limits<float>::max() / 1e5 : std::numeric_limits<char>::max();

};

Inside of EAMxx, the DefaultFillValue shouldn't be promoted/demoted in preicision at all, and instead we should have its value of interest throughout. Once we reach the point of the IO op where we write to file, we should then cast to the precision of the netcdf files (usually single, or float). In other words, I do NOT think this is desirable:

  // Use float, so that if output fp_precision=float, this is a representable value.
  // Otherwise, you would get an error from Netcdf, like
  //   NetCDF: Numeric conversion not representable
  // Also, by default, don't pick max float, to avoid any overflow if the value
  // is used inside other calculation and/or remap.
  float m_fill_value = constants::DefaultFillValue<float>().value;

the casting should happen at the point of write, not before it.

@bartgol
Copy link
Contributor Author

bartgol commented Jul 28, 2025

@bartgol, as part of this PR, could you please ensure we have consistent support for fill val in terms of precision? I think the way to do this would be to edit the underlying DefaultFillValue struct to support Real directly (currently it only supports "float")

// Universal fill value for variables
// TODO: When we switch to supporting C++17 we can use a simple `inline constexpr` rather than a struct
template<typename T>
struct DefaultFillValue {
  static constexpr bool is_float = std::is_floating_point<T>::value;
  static constexpr bool is_int   = std::is_integral<T>::value;
  static constexpr T value = is_int ? std::numeric_limits<int>::max() / 2 :
	  is_float ? std::numeric_limits<float>::max() / 1e5 : std::numeric_limits<char>::max();

};

DefaultFIllValue supports both double and float. The class was designed to always use a float number, so that we can detect fill value entries (e.g. in an input file) regardless of what precision we are using in the current run and what precision we were using in the run that generated the file. The type of value is whatever T is (int, char, double, or float). Its numerical value for fp types will always be the same (since it fits inside a float`).

Inside of EAMxx, the DefaultFillValue shouldn't be promoted/demoted in preicision at all, and instead we should have its value of interest throughout. Once we reach the point of the IO op where we write to file, we should then cast to the precision of the netcdf files (usually single, or float). In other words, I do NOT think this is desirable:

  // Use float, so that if output fp_precision=float, this is a representable value.
  // Otherwise, you would get an error from Netcdf, like
  //   NetCDF: Numeric conversion not representable
  // Also, by default, don't pick max float, to avoid any overflow if the value
  // is used inside other calculation and/or remap.
  float m_fill_value = constants::DefaultFillValue<float>().value;

the casting should happen at the point of write, not before it.

I think the cast should happen in the class, not at the point of write. You need to be able to recognize the same value throughout the library, so it should be generated (in the class) with the same numerical value regardless of precision. If you do that, you are free to promote/demote as many times as you want, and recover always the same value (provided that you don't use it inside numerical expressions, of course).

@bartgol bartgol force-pushed the bartgol/eamxx/fill-val-handling branch 2 times, most recently from 60a06e7 to 5e39229 Compare July 30, 2025 23:11
@bartgol bartgol force-pushed the bartgol/eamxx/fill-val-handling branch 2 times, most recently from 8ec3584 to 60b6e07 Compare August 11, 2025 22:39
@bartgol bartgol force-pushed the bartgol/eamxx/fill-val-handling branch from 60b6e07 to f4a9a47 Compare August 21, 2025 00:57
@bartgol bartgol force-pushed the bartgol/eamxx/fill-val-handling branch from ed186eb to fd74424 Compare August 21, 2025 19:28
@bartgol bartgol force-pushed the bartgol/eamxx/fill-val-handling branch from 33fb240 to 1aa6d3e Compare August 21, 2025 20:15
@bartgol bartgol requested a review from mahf708 August 23, 2025 02:19
if (f_tgt.get_header().has_extra_data("mask_data")) {
auto f_tgt_mask = f_tgt.get_header().get_extra_data<Field>("mask_data");
auto f_src_mask = f_src.get_header().get_extra_data<Field>("mask_data");
if (f_tgt.get_header().has_extra_data("mask_field")) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, perhaps, we should transfer ALL extra data from src to target during remap_fwd...

void transfer_io_str_atts (const scream::Field& src, scream::Field& tgt) {
// Helper lambda, to copy extra data (io string attributes plus filled settings).
// This will be used if any remapper is created, to ensure atts set by atm_procs are not lost
void transfer_extra_data (const scream::Field& src, scream::Field& tgt) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: consider moving this to AbstractRemapper::remap_fwd instead.

I'm not sure it's the right thing to do, as some extra data may be grid-specific. But I feel like there's a few places where we do this, so maybe we can find a way to "structure" extra data. E.g., specify whether extra-data is grid-specific or not? Not sure. We have to think about this.

More expressive of what the mask field represents
@bartgol
Copy link
Contributor Author

bartgol commented Aug 23, 2025

@mahf708 this needs a re-review, since more stuff ended up being changed.

I think overall this simplifies a few places (e.g., vert remap), but I think it's just the first step toward a more structured handling of mask fields. Perhaps we can chat offline about the big picture, but I think this is a good first step.

So to recap, the changes are:

  • Always use a compile-time value for the fill value (no configurability, but down the road we can think about making this a runtime value, which can be set at initialization).
  • Unify combine and fill_aware_combine into one function, with an extra template arg. While this doesn't change the substance, it makes calling the function easier (e.g., see CombineViewsHelper implementation)
  • Remove all code that set mask_value in a Field extra data. Since there is ONE possible value, this is pointless now.
  • Since we don't have mask_value to check if a field can have fill values, added may_be_filled() method to FieldHeader to check if we foresee the field containing fill values.
  • Simplified VerticalRemapper. Instead of remapping the mask field (using 0 as a mask value), simply set mask=1 at the start of remap_fwd, and set entries to 0 as we extrapolate. No need to run remap on masks.

@mahf708
Copy link
Contributor

mahf708 commented Aug 23, 2025

  • Since we don't have mask_value to check if a field can have fill values, added may_be_filled() method to FieldHeader to check if we foresee the field containing fill values.

Quick question: since we have one value everywhere now, do you think we can get away with the mask field (ones and zeros) altogether? If we really want to, we can easily produce a mask field on the fly knowing that it is a single value that's exceedingly unlikely to be set by some other proc. But maybe you have other explicit uses for the mask in your mind? I think for most use cases, we can just get rid of it... but it may require work in the IO layer to facilitate things.

@bartgol
Copy link
Contributor Author

bartgol commented Aug 26, 2025

  • Since we don't have mask_value to check if a field can have fill values, added may_be_filled() method to FieldHeader to check if we foresee the field containing fill values.

Quick question: since we have one value everywhere now, do you think we can get away with the mask field (ones and zeros) altogether? If we really want to, we can easily produce a mask field on the fly knowing that it is a single value that's exceedingly unlikely to be set by some other proc. But maybe you have other explicit uses for the mask in your mind? I think for most use cases, we can just get rid of it... but it may require work in the IO layer to facilitate things.

The biggest issue is horiz remap, where we remap the mask field to be able to correctly rescale the mapped values later (the horiz interp weights sum to 1, but if you discard entries, you need to rescale by the used weights sum later). It is possible to just do this calculation without the need of a "source" mask field, but it would require some work. We can discuss it, perhaps as a follow up work.

Copy link
Contributor

@mahf708 mahf708 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skimmed edits and they look good... wanna iterate more on this or are you ready to integrate? wanna hit the next route or just go straight to main?

@bartgol
Copy link
Contributor Author

bartgol commented Aug 27, 2025

Do you think we need to go through next? I'm not sure it covers more than CI in this regard, as CI already has eamxx-prod...

I think I want to work (maybe chat first) on more in-depth treatment of mask fields.

@bartgol bartgol marked this pull request as ready for review August 27, 2025 13:33
@bartgol
Copy link
Contributor Author

bartgol commented Aug 27, 2025

We know the fail is unrelated. Merging.

bartgol added a commit that referenced this pull request Aug 27, 2025
The user will no longer be able to specify the value for fillValue in the output yaml files.
This enables a cascade of simplifications, since we no longer need to pass the value
around, as the code is now capable of retrieving fill_value from a central place.

Also, add a special extra meta-data interface to FieldHeader to simplify querying
whether a field may contain fill value entries.

[BFB]
@bartgol bartgol merged commit f5e25e8 into master Aug 27, 2025
30 of 32 checks passed
@bartgol bartgol deleted the bartgol/eamxx/fill-val-handling branch August 27, 2025 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFB PR leaves answers BFB EAMxx Issues related to EAMxx
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants