Skip to content

Conversation

jrvb
Copy link

@jrvb jrvb commented Jul 21, 2025

This PR fixes a number of errors in the input interpolation routines under src/chemistry/utils. The details of these along with test output comparing the old routines and the replacements are included here (formatting is a bit wonky, since this is a PDF dump from a colab notebook):

CAM interpolation analysis - Colab.pdf

High level, here is a description of the bugs identified and fixed in this PR:

Horizontal Interpolation

The input routines are designed to perform conservative regridding to maintain the total input levels when translating from the input grid to the model grid. This consists of two main features:

  1. The interpolation accounts for the relative areas of input and model grid cells to put the correct portion of the input data into each model grid cell.
  2. The interpolation routines distinguish between inputs which represent concentrations (eg, H2O mixing ratios) and those which are totals (eg, flight kilometers) and handles these differently to preserve the total input values.

Unfortunately, there were several bugs in the CAM implementation of these features:

  • The flag which determines if an input should be handled as a concentration or a total is set when opening the files, but then overwritten with a default value before the file is processed. This causes all inputs to be treated as concentrations.
  • The MPI code which copies interpolation values to other processing units contains a bug which causes CAM to SEGV when the "treat data as totals" path is enabled.
  • The area weighting code contains an error where it multiplies by a degree difference rather than dividing by it. This has no effect when the input is on a 1-degree grid (it multiplies by 1 rather than dividing by 1) but is drastically changes the input values on other grids (eg, the large grid cell synthetic data in the tests below)
  • The area weighting code misses some data at the boundaries of input grid cells.
  • The area weighting incorrectly handles data which overlaps the prime meridian, leading to concentrated values at lon=0

Vertical Interpolation

The vertical interpolation routines are also designed to preserve the total input data. This works correctly for most parts of the data range, but input data which are beyond the lowest model pressure level are discarded.

This pull request (not intended for submission) includes comments on the original code and ad-hoc unit tests which may be useful in reading through the current PR: #1348

Closes #1348

@cacraigucar
Copy link
Collaborator

cacraigucar commented Jul 22, 2025

Summary of a side discussion:

Question (from @cacraigucar ):
The question was asked if this bug impacts any of the mainline CAM code. The conclusion was that it most likely only impacts the aircraft simulations, is that correct?

Answer (from @andrewgettelman):
Yes. Only contrails (I think). We need to check if anything else was interpolated the same way: I’m not sure because the aircraft emissions were a bit unique.

Answer (from @jrvb):
FWIW, when I originally noticed this issue, I spent some time grepping through the rest of the code and trying to track down use cases. From what I could tell, it looked like the "treat input as a distance" code likely only triggers for aviation inputs, since the file%dist flag is only set to true if the file contains ac_SLANT_DIST or ac_TRACK_DIST data:

https://github.com/ESCOMP/CAM/blob/cam_development/src/chemistry/utils/aircraft_emit.F90#L130

So I would imagine any issues specific to file%dist = .true. should be fairly limited in impact. The more general use of the xy_interp routines (interpreting input as a mixing ratio) go through aircraft_emit.F90 as well. I didn't notice any worrisome calls into this code from elsewhere, so the impact probably depends on how many people are running models configured with aviation inputs of some sort?

Reply (from @andrewgettelman):
Thanks Rob: good news, so doesn’t affect anything other than aviation emissions of water vapor (only used by A few people).

@cacraigucar cacraigucar requested review from chihchen24 and removed request for fvitt and tilmes August 19, 2025 19:47
@cacraigucar cacraigucar self-assigned this Aug 19, 2025
@tilmes
Copy link
Collaborator

tilmes commented Aug 22, 2025

Hi all, I talked to Jack yesterday, and my understanding is that the implementation (if correct) could be easily applied to all external forcings and not just to some that are usually not used. This would be a very nice addition to have, so we don't have to regrid these emissions for new horizontal resolutions. Is there an easy way to explore if this could be applied at least to all external forcings, if the code is corrected? Simone

@cacraigucar
Copy link
Collaborator

Hi all, I talked to Jack yesterday, and my understanding is that the implementation (if correct) could be easily applied to all external forcings and not just to some that are usually not used. This would be a very nice addition to have, so we don't have to regrid these emissions for new horizontal resolutions. Is there an easy way to explore if this could be applied at least to all external forcings, if the code is corrected? Simone

I would encourage us to bring this PR in as-is once it passes review. It is ready to go once someone who is familiar with the code verifies that the changes are appropriate.

I would suggest that a separate issue be opened to explore adding this as a more general regridding and assigned to whoever would be making the code changes to expand it's use in other places in the code. I know that ESMF is used as a general regridder, but that is as far as my knowledge on that capability goes. @nusbaume do you have thoughts?

@nusbaume
Copy link
Collaborator

Hi @cacraigucar @tilmes, I agree that we should bring this PR in as-is once the PR reviews/testing is done, as the sooner we get in these bug fixes the better. Otherwise I am certainly happy to create a new issue to see if someone can expand these fixes to all of the external forcings.

In terms of ESMF, it is a longer-term (e.g. SIMA) goal to have as much regridding or interpolation as possible handled by that library. However I don't think there is any major push for that in CAM7 right now, so if it simpler to just replicate this code for the other external forcing data then that is probably fine. Hope that helps!

@andrewgettelman
Copy link
Collaborator

@chihchen24 did 2 simulations, with the modified code and without.

Attached are a bunch of plots of the ‘ac_FUELBURN’ variable, which should be the input data interpolated to the CAM grid. Great idea to do it this way Jack: so meteorology should not factor into this.

First: zonal means. Slight differences at 2 latitudes, mostly near the surface, there is LOWER fuel burn at cruise with the bug fix (assuming ‘test2’ is the modified code).

zonal_mean zm_cruise

Then, maps: total FUELBURN from 400-150hPa (sum). Global, USA and Europe. Again, the cruise emissions are LOWER with the modified code at two latitudes, mostly over the US and Europe.

global_map europe_map usa_map

If I total up the monthly mean cruise emissions globally over the 400-150hPa altitude range, I get about a -1% difference (-0.66%) for the MODIFIED code.

So I’m a bit confused @jrvb about your 14% increase number. What did we do differently?

@jrvb
Copy link
Author

jrvb commented Sep 5, 2025

Thanks for doing this additional sanity checking @chihchen24 and @andrewgettelman!

In my end-to-end checks, I looked at the ac_SLANT_DIST, which was impacted by both the various geography-based skews and the distances-are-treated-like-mixing-ratios bug. It looks like the output @chihchen24 captured would not capture the effect of the later bug, which probably explains the discrepancy.

For my end-to-end checking, I instrumented ssatcontrail.F90 to output the values it gets for ac_SLANT_DIST, so this should capture all of the transformations done in the input layer plus anything else that might happen along the way to the contrail code. I looked just at a single timestamp from Jan 1, and found that the old code undercounted that particular ac_SLANT_DIST input by ~14% while the updated code got the right values to within floating point roundoff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

5 participants