-
Notifications
You must be signed in to change notification settings - Fork 435
EAMxx: Update linoz and oxid files in XML #7638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates MAM4xx chemistry input files by replacing multi-year linoz and oxid data files with single-year versions to resolve NBFB restart issues in MAM4xx simulations.
- Replace multi-year linoz files (1850-2015) with single-year 2010 versions
- Replace multi-year oxid files (1850-2015) with single-year 2015 versions
- Update file naming conventions to reflect the new temporal scope
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
<mam4_linoz_file_name type="file" doc="LINOZ chemistry file"> ${DIN_LOC_ROOT}/atm/scream/mam4xx/linoz/ne30pg2/linoz2010_2010JPL_CMIP6_10deg_58km_ne30pg2_c20250826.nc</mam4_linoz_file_name> | ||
<mam4_linoz_file_name hgrid="ne4np4.pg2" type="file" doc="LINOZ chemistry file"> ${DIN_LOC_ROOT}/atm/scream/mam4xx/linoz/ne4pg2/linoz2010_2010JPL_CMIP6_10deg_58km_ne4pg2_c20250826.nc</mam4_linoz_file_name> | ||
<!--Invariants--> | ||
<mam4_oxid_ymd type="integer" > 20150101 </mam4_oxid_ymd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oxid year (2015) and linoz year (2010) are now inconsistent. Consider documenting why different years are used for these related chemistry datasets or using the same year for consistency.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have year 2010 for oxid, so use year 2015 instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks!
The test is failing due to the missing newly added data files. The file download issue will be resolved when we merge PR #7636. |
@singhbalwinder The failing check is because my account is not approved to run on snl machine. Could you please try it again? Since the automatic download PR is already merged, checks should all pass this time. Thanks! |
We need to rebase this branch for the failing REP test to pass. I will rebase it. |
6f1e797
to
54972e4
Compare
@meng630: I thought this PR should be BFB. Do you think changing these data files (that have essentially the same data) can still make it NBFB? |
@singhbalwinder Yeah, it should be BFB as we are using the same input data as before. However, the linoz file might be the reason for the DIFF in the check. The original linoz file has two time indices: So, if the model interpolates linoz data using |
To tag @odiazib as he should be aware of the linoz time issue. |
<mam4_oxid_file_name type="file" doc="File containing oxidants data">${DIN_LOC_ROOT}/atm/scream/mam4xx/invariants/ne30pg2/oxid_1.9x2.5_L26_1850-2015_ne30pg2_c20241009.nc</mam4_oxid_file_name> | ||
<mam4_oxid_file_name hgrid="ne4np4.pg2" type="file" doc="File containing oxidants data">${DIN_LOC_ROOT}/atm/scream/mam4xx/invariants/ne4pg2/oxid_1.9x2.5_L26_1850-2015_ne4pg2_c20241009.nc</mam4_oxid_file_name> | ||
<mam4_oxid_file_name type="file" doc="File containing oxidants data">${DIN_LOC_ROOT}/atm/scream/mam4xx/invariants/ne30pg2/oxid_1.9x2.5_L26_2015_ne30pg2_c20250813.nc</mam4_oxid_file_name> | ||
<mam4_oxid_file_name hgrid="ne4np4.pg2" type="file" doc="File containing oxidants data">${DIN_LOC_ROOT}/atm/scream/mam4xx/invariants/ne4pg2/oxid_1.9x2.5_L26_2015_ne4pg2_c20250813.nc</mam4_oxid_file_name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a different date, 20250813, for these files. In the Linoz files, we have 20250826. I am just checking if we have fixed the PIO_IOTYPE_PNETCDF
warning in these files.
<mam4_linoz_file_name type="file" doc="LINOZ chemistry file"> ${DIN_LOC_ROOT}/atm/scream/mam4xx/linoz/ne30pg2/linoz1850-2015_2010JPL_CMIP6_10deg_58km_ne30pg2_c20240724.nc</mam4_linoz_file_name> | ||
<mam4_linoz_file_name hgrid="ne4np4.pg2" type="file" doc="LINOZ chemistry file"> ${DIN_LOC_ROOT}/atm/scream/mam4xx/linoz/ne4pg2/linoz1850-2015_2010JPL_CMIP6_10deg_58km_ne4pg2_c20240724.nc</mam4_linoz_file_name> | ||
<mam4_linoz_file_name type="file" doc="LINOZ chemistry file"> ${DIN_LOC_ROOT}/atm/scream/mam4xx/linoz/ne30pg2/linoz2010_2010JPL_CMIP6_10deg_58km_ne30pg2_c20250826.nc</mam4_linoz_file_name> | ||
<mam4_linoz_file_name hgrid="ne4np4.pg2" type="file" doc="LINOZ chemistry file"> ${DIN_LOC_ROOT}/atm/scream/mam4xx/linoz/ne4pg2/linoz2010_2010JPL_CMIP6_10deg_58km_ne4pg2_c20250826.nc</mam4_linoz_file_name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this new version does not produce the PIO_IOTYPE_PNETCDF
warning. However, I am just checking to see if this is correct.
@meng630 @singhbalwinder Note that in the master branch, Linoz is not using the DataInterpolation class. Thus, the time variable is not used for interpolation. In addition, the standalone tests are BFB. If fixing the time variable in the Linoz file is producing DIFFs in the CIME case, I would expect that standalone tests would also produce DIFFs. I ran an NE30 case4 on Frontier using the master branch and new files for Linoz and oxidant. This case was crashing after a few months. @meng630, can you try running a case for more than 1 year? |
I can run ne256 for several months, but the model eventually crashed due to a very high |
Ah..It still got DIFF 🧐 |
That is a great point. The standalone tests are using |
@singhbalwinder You can Files were generated using the same script across resolutions. It's odd ne2 is BFB while ne4 not. Regarding the |
Thanks for the quick check! The ne2 files are not in the namelist XML file. They are in the YAML file for each test. That reminds me that we should change those as well. Perhaps that is why those tests are passing. I will change just the OXID files to see if the standalone tests also fail. |
Another thing I found is, this check failed often in recent PRs. For example, #7632 has nothing to do with aerosols, but with the same DIFF fail as here in the check runs https://github.com/E3SM-Project/E3SM/actions/runs/17189725178/job/48763699453 @singhbalwinder |
Yeah, this check was failing because of a CMAKE issue that is fixed now. I have a "dummy PR", which I rebased and ran tests again. All the tests are passing for that "dummy PR", so all the tests should work for this PR as well. I have now updated the oxid file for the standalone test. Let's see if it still passes. |
@meng630, could you please check if I used the right file for the OXIDs? The tests are failing for the standalone tests as well now. |
@singhbalwinder The oxid file is right. Can we get more information from test failures? I just saw a CMake Error, but I didn't fully understand what it meant. |
Here is the info:
It is a baseline fail comparison. |
There is a bug in the current master branch related to the nBFB issue during restart. The bug occurs during initialization at the following location: The routine To address this issue, the code should be updated as follows:
A similar fix is required for the oxidant reader. I recommend not addressing this bug in the current PR because we are using a one-year file, and this block of code will eventually be removed by the DataInterpolation PR. This bug also explains the nBFB issue during restart, as the initialization routine is invoked at every restart. I fixed this bug using @meng630's PR and compared the outputs of the microphysics standalone test. I observed that the outputs of this test, whether using multiple-year files or one-year files, are BFB. Therefore, I also recommend updating this PR to be nBFB. |
Thanks, @odiazib. If I am understanding correctly, the bug fix above will fix the restart NBFB issue for both single and multi-year files. Since we are using single-year files in this PR, this fix should not matter for this PR. We can tag this PR NBFB, but we need to first understand why this PR is NBFB. I think this PR should be BFB, as we are using the same data as before, and the model is running just for a few time steps/days. There must be a reason why we see these BFB fails, and we need to find that out before merging this PR. |
Yes, adding the offset index will resolve this NBFB issue for multi-year files.
This PR is NBFB because the wrong time index was being used during the initialization of the reader files for Linoz and oxidant. For example, in the standalone test, where the case starts in the 8th month, the time index is 7. Currently, this index in the multi-year file (oxidant file) corresponds to the 8th month of the year 1850. On the other hand, this 7th index in the single-year file corresponds to the 8th month of the year 2015. For interpolation, two-time sections are required. In the multi-year file, the first section is from the 8th month of the year 1850 (which is incorrect), and the second section is from the 9th month of the year 2015. In the single-year file, the first section is from the 8th month of the year 2015, and the second section is from the 9th month of the year 2015. To ensure correctness, we can also create a PR that addresses the bug in the initialization of the time section. This PR will be NBFB. Once this PR is merged, Meng’s PR should be BFB. |
For a multi-year NetCDF file, we also need to add the time offset index. This issue was producing NBFB results, as noted in PR #7638. Without this fix, the model was incorrectly reading the data from the very first year in a multi-year file. For example, for oxidants and LINOZ files, the model was incorrectly reading the data from the year 1850 as its initial time sample (instead of 2015). The second time sample (as we need two time samples to interpolate linearly) was correctly picked up from 2015. With this fix, both time samples are now correctly read from the intended year (2015 in this example), ensuring proper interpolation and model behavior. [NBFB for EAMxx]
For a multi-year NetCDF file, we also need to add the time offset index. This issue was producing NBFB results, as noted in PR #7638. Without this fix, the model was incorrectly reading the data from the very first year in a multi-year file. For example, for oxidants and LINOZ files, the model was incorrectly reading the data from the year 1850 as its initial time sample (instead of 2015). The second time sample (as we need two time samples to interpolate linearly) was correctly picked up from 2015. With this fix, both time samples are now correctly read from the intended year (2015 in this example), ensuring proper interpolation and model behavior. [NBFB for EAMxx]
Replace the multi-year linoz and oxid files with single-year to fix NBFB restart in MAM4xx
fca1dbc
to
7dc4668
Compare
) EAMxx: Update linoz and oxid files in XML Replace the multi-year linoz and oxid files with single-year to fix NBFB restart in MAM4xx [BFB] * meng630/mam4/fix_nbfb: Added the new single year LINOZ files back Update oxid file for the microphysics standalone test Revert LINOZ files to check BFBness Update linoz files to cdf format Update linoz and oxid files in XML
Replace the multi-year linoz and oxid files with single-year to fix NBFB restart in MAM4xx
[BFB]