Skip to content

Conversation

mjschmidt271
Copy link
Collaborator

@mjschmidt271 mjschmidt271 commented Aug 25, 2025

Brings in the new python script for validation test comparison. Employs relative error for pass/fail criteria and adds optional args for extra debug info.

The important details are:

  • There are quite a few tests that fail based on relative error, and I've included a list below.
    • The convention is Tol <old-abs-tol> -> <new-rel-tol-for-passing> - <test number> - <test name>
    • Some tests are showing very high levels of relative error, indicating there may be major issues with the associated code.
      • I have denoted those with ***asterisks***
  • This presents a tricky issue given the following, somewhat competing goals:
    1. Merging PRs should not have failing tests.
    2. We would like to maintain the previous results, based on absolute error, as a form of regression checking.
    3. The tests that fail based on relative error should give clear output indicating as such--something like a warning rather than an error.
    • My reasoning on this final point is that relative error contains strictly more information than absolute error, and we currently have tests passing based on absolute error tolerances that are not meaningful.
    • Given this, when the time is appropriate, we should adjust tolerances to align with the relative error values.
    • Additionally, there are some tests that exhibit relative error values of 1, indicating that the calculated quantity is in total error (the magnitude of the error is the same as the magnitude of the quantity).
      • I have not had the chance to look into these.
      • It is possible that this is not a huge issue for example $q_{\text{ref}} = 0,\ q_{\text{calculated}} = 10^{-14}, \mathcal{E}_{\text{rel}} = 1$, though I believe I have addressed this particular corner case.
  • My approach to addressing the 3 factors above is to use CTest's SKIP_REGULAR_EXPRESSION functionality.
    • CTest is not actually that flexible, and this seemed to be the best option.
    • This feature looks for a particular return string, and when it is found, denotes the test as skipped, and the list of skipped tests appears at the bottom of the make test output, and all of the generated output is still printed to log.
    • While this is a bit of a misnomer, I think it's the best option because seeing the output is unavoidable and will keep it front-of-mind for devs to look into while still maintaining the previous pass/fail behavior with the previous absolute tolerance levels.
Tests failing based on relative error, and tolerance adjustments

Tol 2e-11 -> 4e-11 - 48 - validate_mer07_veh02_nuc_mosaic_1box
Tol 1e-12 -> 2e-10 - 60 - validate_calcsize_compute_dry_volume
Tol 5e-11 -> 5e-8 - 62 - validate_stand_modal_aero_calcsize_sub
Tol 2e-6 -> 2e-1 - 74 - validate_ma_precpprod
Tol 2e-6 -> 1.5e0 - 90 - validate_compute_massflux_small
Tol 4e-4 -> 3e-2 - 134 - validate_pcarbon_aging_1subarea
Tol 5e-8 -> 2.5e-6 - 202 - validate_calc_1_impact_rate_ts_0
Tol 1e-11 -> 1e-9 - 204 - validate_modal_aero_bcscavcoef_get_ts_355
Tol 1e-9 -> 5e-7 - 210 - validate_stand_aero_model_calcsize_water_uptake_dr_ts_379
Tol 1e-13 -> 1e-9 - 212 - validate_baseline_aero_model_wetdep_ts_379
Tol 1e-6 -> 1e-5 - 214 - validate_clddiag
Tol 1e-6 -> 2e-1 - 222 - validate_wetdep_prevap_130
Tol 1e-6 -> 3e-6 - 224 - validate_wetdep_prevap_230
Tol 1e-6 -> 1.5e0 - 234 - validate_wetdep_scavenging_true
Tol 1e-6 -> 1.5e0 - 236 - validate_wetdep_scavenging_false
Tol 1e-6 -> 6e-6 - 240 - validate_rain_mix_ratio
Tol 1e-6 -> 4e-1 - 246 - validate_wetdep_resusp_130
Tol 1e-6 -> 4e-1 - 248 - validate_wetdep_resusp_230
Tol 1e-12 -> 3e-10 - 364 - validate_linmat_ts_355
Tol 1e-12 -> 4e-10 - 366 - validate_nlnmat_ts_355
Tol 1e-12 -> 3e-10 - 368 - validate_imp_prod_loss_ts_355
Tol 7e-11 -> 1e0 - 370 - validate_newton_raphson_iter_ts_355 (only max_delta)
Tol 1e-11 -> 3e-10 - 408 - validate_maxsattype1_merged
Tol 1e-11 -> 3e-10 - 410 - validate_maxsattype2_merged
Tol 6e-11 -> 1e-10 - 498 - validate_lin_strat_chem_solve_ts_1415
Tol 1e-13 -> 3e-10 - 500 - validate_lin_strat_sfcsink_ts_1415_multicol
Tol 6e-13 -> 5e-10 - 502 - validate_lin_strat_sfcsinkmulticol_merged
Tol 1.3e-5 -> 1.5e0 - 512 - validate_chm_diags_ts_355
Tol 9e-8 -> 5e-5 - 538 - validate_calc_het_rates_merged
Tol 1e-13 -> 1.5e-10 - 540 - validate_calc_precip_rescale_merged
Tol 9e-7 -> 1.5e-5 - 546 - validate_sethet_merged
Tol 1e-11 -> 1.5e-9 - 554 - validate_calc_sox_aqueous_ts_355_merged
Tol 1e-13 -> 1.5e-10 - 566 - validate_calc_diag_spec_ts_355
Tol 7e-11 -> 1.5e-10 - 580 - validate_modal_aero_lw_ts_355
Tol 1e-12 -> 3.5e-10 - 584 - validate_update_aod_spec_ts_355
Tol 7e-11 -> 1.5e-10 - 586 - validate_aer_rad_props_lw_ts_355
Tol 8e-11 -> 3e-10 - 588 - validate_aer_rad_props_sw_ts_355
Tol 8e-11 -> 3e-10 - 590 - validate_volcanic_cmip_sw_ts_355
Tol 2e-10 -> 5e-2 - 616 - validate_mam_soaexch_1subarea_ts_379
Tol 1e-13 -> 2e-11 - 618 - validate_gas_aer_uptkrates_1box1gas_ts_379
Tol 4e-10 -> 5e-4 - 620 - validate_mam_gasaerexch_1subarea_ts_379
Tol 1e-12 -> 5e-11 - 622 - validate_vert_interp_ts_300
Tol 1e-12 -> 1e-10 - 624 - validate_vert_interp_col_ts_300

Copy link

codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.43%. Comparing base (6ceda4b) to head (138028b).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #477   +/-   ##
=======================================
  Coverage   93.43%   93.43%           
=======================================
  Files         303      303           
  Lines       25177    25177           
  Branches     2766     2766           
=======================================
  Hits        23523    23523           
  Misses       1654     1654           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mjschmidt271
Copy link
Collaborator Author

Oooook!

I believe I've got this to where it needs to be. The important details are:

  • There are quite a few tests that fail based on relative error, and I've included a list below.
    • The convention is Tol <old-abs-tol> -> <new-rel-tol-for-passing> - <test number> - <test name>
    • Some tests are showing very high levels of relative error, indicating there may be major issues with the associated code.
      • I have denoted those with ***asterisks***
  • This presents a tricky issue given the following, somewhat competing goals:
    1. Merging PRs should not have failing tests.
    2. We would like to maintain the previous results, based on absolute error, as a form of regression checking.
    3. The tests that fail based on relative error should give clear output indicating as such--something like a warning rather than an error.
    • My reasoning on this final point is that relative error contains strictly more information than absolute error, and we currently have tests passing based on absolute error tolerances that are not meaningful.
    • Given this, when the time is appropriate, we should adjust tolerances to align with the relative error values.
    • Additionally, there are some tests that exhibit relative error values of 1, indicating that the calculated quantity is in total error (the magnitude of the error is the same as the magnitude of the quantity).
      • I have not had the chance to look into these.
      • It is possible that this is not a huge issue for example $q_{\text{ref}} = 0,\ q_{\text{calculated}} = 10^{-14}, \mathcal{E}_{\text{rel}} = 1$, though I believe I have addressed this particular corner case.
  • My approach to addressing the 3 factors above is to use CTest's SKIP_REGULAR_EXPRESSION functionality.
    • CTest is not actually that flexible, and this seemed to be the best option.
    • This feature looks for a particular return string, and when it is found, denotes the test as skipped, and the list of skipped tests appears at the bottom of the make test output, and all of the generated output is still printed to log.
    • While this is a bit of a misnomer, I think it's the best option because seeing the output is unavoidable and will keep it front-of-mind for devs to look into while still maintaining the previous pass/fail behavior with the previous absolute tolerance levels.
Tests failing based on relative error, and tolerance adjustments

Tol 2e-11 -> 4e-11 - 48 - validate_mer07_veh02_nuc_mosaic_1box
Tol 1e-12 -> 2e-10 - 60 - validate_calcsize_compute_dry_volume
Tol 5e-11 -> 5e-8 - 62 - validate_stand_modal_aero_calcsize_sub
Tol 2e-6 -> 2e-1 - 74 - validate_ma_precpprod
Tol 2e-6 -> 1.5e0 - 90 - validate_compute_massflux_small
Tol 4e-4 -> 3e-2 - 134 - validate_pcarbon_aging_1subarea
Tol 5e-8 -> 2.5e-6 - 202 - validate_calc_1_impact_rate_ts_0
Tol 1e-11 -> 1e-9 - 204 - validate_modal_aero_bcscavcoef_get_ts_355
Tol 1e-9 -> 5e-7 - 210 - validate_stand_aero_model_calcsize_water_uptake_dr_ts_379
Tol 1e-13 -> 1e-9 - 212 - validate_baseline_aero_model_wetdep_ts_379
Tol 1e-6 -> 1e-5 - 214 - validate_clddiag
Tol 1e-6 -> 2e-1 - 222 - validate_wetdep_prevap_130
Tol 1e-6 -> 3e-6 - 224 - validate_wetdep_prevap_230
Tol 1e-6 -> 1.5e0 - 234 - validate_wetdep_scavenging_true
Tol 1e-6 -> 1.5e0 - 236 - validate_wetdep_scavenging_false
Tol 1e-6 -> 6e-6 - 240 - validate_rain_mix_ratio
Tol 1e-6 -> 4e-1 - 246 - validate_wetdep_resusp_130
Tol 1e-6 -> 4e-1 - 248 - validate_wetdep_resusp_230
Tol 1e-12 -> 3e-10 - 364 - validate_linmat_ts_355
Tol 1e-12 -> 4e-10 - 366 - validate_nlnmat_ts_355
Tol 1e-12 -> 3e-10 - 368 - validate_imp_prod_loss_ts_355
Tol 7e-11 -> 1e0 - 370 - validate_newton_raphson_iter_ts_355 (only max_delta)
Tol 1e-11 -> 3e-10 - 408 - validate_maxsattype1_merged
Tol 1e-11 -> 3e-10 - 410 - validate_maxsattype2_merged
Tol 6e-11 -> 1e-10 - 498 - validate_lin_strat_chem_solve_ts_1415
Tol 1e-13 -> 3e-10 - 500 - validate_lin_strat_sfcsink_ts_1415_multicol
Tol 6e-13 -> 5e-10 - 502 - validate_lin_strat_sfcsinkmulticol_merged
Tol 1.3e-5 -> 1.5e0 - 512 - validate_chm_diags_ts_355
Tol 9e-8 -> 5e-5 - 538 - validate_calc_het_rates_merged
Tol 1e-13 -> 1.5e-10 - 540 - validate_calc_precip_rescale_merged
Tol 9e-7 -> 1.5e-5 - 546 - validate_sethet_merged
Tol 1e-11 -> 1.5e-9 - 554 - validate_calc_sox_aqueous_ts_355_merged
Tol 1e-13 -> 1.5e-10 - 566 - validate_calc_diag_spec_ts_355
Tol 7e-11 -> 1.5e-10 - 580 - validate_modal_aero_lw_ts_355
Tol 1e-12 -> 3.5e-10 - 584 - validate_update_aod_spec_ts_355
Tol 7e-11 -> 1.5e-10 - 586 - validate_aer_rad_props_lw_ts_355
Tol 8e-11 -> 3e-10 - 588 - validate_aer_rad_props_sw_ts_355
Tol 8e-11 -> 3e-10 - 590 - validate_volcanic_cmip_sw_ts_355
Tol 2e-10 -> 5e-2 - 616 - validate_mam_soaexch_1subarea_ts_379
Tol 1e-13 -> 2e-11 - 618 - validate_gas_aer_uptkrates_1box1gas_ts_379
Tol 4e-10 -> 5e-4 - 620 - validate_mam_gasaerexch_1subarea_ts_379
Tol 1e-12 -> 5e-11 - 622 - validate_vert_interp_ts_300
Tol 1e-12 -> 1e-10 - 624 - validate_vert_interp_col_ts_300

@mjschmidt271 mjschmidt271 requested review from odiazib, jaelynlitz, overfelt and singhbalwinder and removed request for odiazib and jaelynlitz September 26, 2025 23:29
@mjschmidt271 mjschmidt271 merged commit 33c8135 into main Oct 7, 2025
34 of 40 checks passed
@mjschmidt271 mjschmidt271 deleted the mjs/ff_mxv branch October 7, 2025 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants