Skip to content

Conversation

marshallward
Copy link
Collaborator

@marshallward marshallward commented Jul 21, 2025

Latest PR to main. Much of the content is devoted to the open boundary conditions, but there is the usual blend of fixes and features.

Contributors:

breichl and others added 30 commits April 24, 2025 16:13
* Fix bug in Kd_itides allocation check for VBF itides diagnostic

* Set VBF tidal mixing outputs to work with Polzin option in addition to Simmons option

---------

Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>
  Added additional logic to avoid segmentation faults from applying Flather open
boundary conditions near the opposite side of the computational domain (e.g.,
applying a Northern Flather boundary condition on the southern edge of the
domain).  This situation does not arise when open boundaries are only found on
the edges of a domain, but in parallel cases with sloped OBCs (such as the
ESMG-configs rotated seamount test case) this atypical situation can arise.  The
key point to the specific fix is to note that when such a case arises, the OBC
velocity in the outer halo point does not subsequently impact the solution on
that PE either because its effects are masked out by land or OBC points at the
neighboring orthogonal velocities, and then they are overwritten by a halo
update (perhaps several steps later).  All answers are bitwise identical in all
cases that have been tested and were working before, and the
ESMG/rotated_seamount test case is now running correctly (and giving identical
answers) for a variety of PE counts.
  Refactored the shelfwave OBC code to correct a number of minor problems.  It
corrects some dimensionally inconsistent expressions. It adds a rescaling factor
to ensure that the solutions are mathematically equivalent for various units
used to describe the Cartesian horizontal grid, and adds a fatal error if the
grid is not Cartesian.  It eliminates several redundant hard-coded dimensional
values used to initialize the control structure.  It eliminates elements from
the control structure for this type that are not reused after initialization.
Finally, it adds the new runtime parameter SHELFWAVE_CORRECT_AMPLITUDE that
causes the inflow velocity amplitude to actually match the value specified in
SHELFWAVE_AMPLITUDE.  When SHELFWAVE_CORRECT_AMPLITUDE is true, it alters the
default value of SHELFWAVE_AMPLITUDE to give roughly similar amplitudes of the
flow to those with the default value when it is false.  By default solutions are
unchanged, but there is a new runtime parameter.
  Keep 5 recently added area integrated surface mass or heat flux diagnostics in
forcing_diagnostics in rescaled units by replacing an unscale argument with a
tmp_scale argument to global_area_integral.  Also added the corresponding
conversion arguments to the register_scalar_field calls for each of these
diagnostics, so that the documented units and conversion factors can be used to
confirm the correctness of the documented units for these variables.  The
appropriate internal temporary variables for the integrated heat or mass fluxes
were also used for clarity about the units of these variables.  Benign
`conversion=1.0` arguments were added to another 8 register_diag_feild calls for
nondimensional forcing variables.  All answers and diagnostics are bitwise
identical.
  Corrected the descriptions of the units of two I_rho variables in
find_ustar_fluxes() and find_ustar_mech_forcing() and of the lensponge internal
variable in RGC_initialize_sponges().  Also corrected the comments describing
the locations of grid%geolatT and grid%geolonT.  Also refactored
dumbbell_initialize_thickness() to use geoLonT or geoLatT directly rather than
setting a temporary variable with ill-defined units that is only used once. Only
changes in comments or very simple code logic changes are included in this
commit and all answers are bitwise identical.
  Replaced 14 scale arguments in recently added checksum calls with the
preferred unscale argument, following the pattern used elsewhere throughout the
MOM6 code.  These two arguments are equivalent, but unscale makes it easier to
identify implausible combinations of scaling factors.  All answers are bitwise
identical.
* Try again at fix_obc_maskingdepth patch

 - "git rebase" made a conflicted mess

* Fix for even number of OBC turns.

 - not that turns other than 0, 1 is supported elsewhere for OBCs.

 This still has issue #5 from a comment in #752:

 Some experimentation with the rotate_OBC subroutines. Dr Neumann's test
 uses boundary data of the value=const type. Copying the buffer_dst from
 OBC_in to CS%OBC gets some of these across, also the tracer reservoir
 values. However, the tracer reservoir values get overwritten by an
 interior tracer value between the first call to step_MOM_dynamics and
 the second.

* Trying to fix internal OBC rotations, not sure

* Fixing next round of rotated OBC issues

* Fix up some logic for turns = 2 or 3.

 - Note that this is still not supported, as specificed in MOM.F90.

* Adding back in the deallocate on some OBC arrays
…p/salt

- Change fatal error to warning with Roquet EOS and linear TFreeze.

- Add flexibility for TFreeze and different definition of temp/salt from
  model

- If the model is absolute salinity, this allows one to convert the
  salinity to practical salinity before calling the freezing point
  subroutine.

- If the model is conservative temperature but the freezing point
  subroutine returns the potential temperature, this allows one to
  convert the freezing point to conservative temperature.

- The new flags are TFREEZE_S_IS_PRACS = True if TFreeze expects
  practical salinity (default is false for TEOS10 or TEOS_POLY
  TFREEZE_FORM, otherwise it is true) and TFREEZE_T_IS_POTT = True if
  TFreeze returns a potential temperature (default is flase for TEOS10
  or TEOS_POLY TFREEZE_FORM, otherwise it is true).

- The EOS type now stores the use_conT_absS flag to make these checks
  easier.

---------

Co-authored-by: brandon.reichl <brandon.reichl@noaa.gov>
This commit is part 1/2 of enabling user specified compressibility in
linear equation of state. An attribute drho_dp is added in extended type
linear_EOS and various calculations for density, specific volume and
their first order derivatives. But the full compressibility is not yet
added in density and specific volume integrals.

* Two recently added runtime parameters (TREF and SREF) for linear EOS
are renamed to T_REF_LINEAR_EOS and S_REF_LINEAR_EOS, respectively. The
old names could be confused with T_REF and S_REF, which are used to
initialize the model's temperature and salinity in certain methods.
RHO_TREF_SREF is also renamed to RHO_REF_LINEAR_EOS.
* Description of parameter RHO_T0_S0 is modified for clarification.
* Two new runtime parameters are added (DRHO_DP and P_REF_LINEAR_EOS).
* `Compressible` attribute of linear_EOS type is deterimined by whether
DRHO_DP is zero or not. (At the moment, the attribute is only used to
prevent user from using Montgomery PGF when EOS is compressible.)
This commit is part 2/2 of enabling user specified compressibility in
linear equation of state. Terms with DRHO_DP are added in density and
specific volume integrals. Answers with DRHO_DP=0 are not changed.
  Add an optional logunit argument to the 3 simple chksum() routines, or write
the checksums to error_unit if it is absent, following the pattern in all of the
other MOM6 checksum routines.  All solutions are bitwise identical, but there is
a new optional argument to a publicly visible interface.
  Added 4 new diagnostics (SSH_u_OBC, SSH_v_OBC, ubt_OBC and vbt_OBC) to
diagnose the outer barotropic forcing forcing fields at Flather open boundary
condition points.  Also added 4 paired checksum calls for 8 OBC-related 2d
arrays when OBCs are in use and DEBUG is true.   All answers are bitwise
identical but there are new diagnostics in the available_diags files.
Rather than explicitly list known unit tests, we now either wild
card or use "find" to create a list.
- Using wild cards to catch all the drivers/timing/* executables
- Run these tests on a push, but not to compare to target
- Enabled artifacts for target opt executables
- Had to re-clone target codebase in the compare-timings job
- Also had to use the target_codebase/.testing/Makefile directly
  to avoid rebuilding executables
  Fixed a bug that was causing some open boundary condition fields (e.g., those
related to the tides) not to be read in from files before they are used unless
there are some PEs that do NOT include any input date based OBC points.  This
bug causes certain configurations with not to reproduce across PE count.  Most
high-PE count jobs (those with at least 3 PEs in each direction) previously were
correct, but single PE jobs or jobs with a prime number or 2 times a prime
number of PEs were never correct for these cases.  Specifically, this commit
removes the some_need_no_IO_for_data element from the ocean_OBC_type, and it
always calls update_OBC_segment_data() during initialization when any OBCs are
being used.  It also replaces three calls to allocate_rotated_array() with calls
to the new internal subroutine allocate_rotated_seg_data() to avoid a bug where
the size of rotated data segments is right but the index range is wrong.  This
commit will change low-PE count answers for some cases with certain types of
open boundary conditions.
  Added the new function rotate_OBC_segment_direction that returns an integer
indicating the direction of an OBC segment on rotated grid, given its direction
on the current grid and the number of turns by which that grid has been
rotated.  A negative number of turns undoes the rotation.  For now this new
routine is used in rotate_OBC_segment_config, but additional uses will follow
with subsequent commits.  All answers are bitwise identical, but there is a new
publicly visible function.
  Added the new publicly visible subroutines write_OBC_info() and
chksum_OBC_segments() for writing verbose information for debugging open
boundary conditions.  Write_OBC_info() writes extensive information about the
settings and the contents of arrays in the ocean_OBC_type, modifying the output
so that it should be largely invariant to rotation.  Chksum_OBC_segments() does
a checksum on all allocated elements of the open boundary condition segments and
optionally writes out a number of layers of segment data, with the order of
output chosen to appear as it would if the data were for an eastern open
boundary.  Because write_OBC_info() is rather verbose, it should probably only
be used for intensive debugging, while the nk argument to chksum_OBC_segments()
helps to manage its verbosity and makes it a candidate for inclusion in more
wide-spread debugging.  This commit also adds scalar_pair arguments to 5
uvchksum calls for radiation and oblique open boundary conditions and tracer
reservoirs.  Only diagnostic debugging code is added here, and all solutions are
bitwise identical, but there are two new publicly visible interfaces for writing
information for debugging open boundary conditions.
  Added the new runtime parameter DEBUG_OBCS that triggers verbose diagnostic
output about the contents of the open boundary condition type via calls to the
new routines write_OBC_info() and chksum_OBC_segments(), along with the new
runtime parameter NK_OBC_DEBUG to regulate the volume of diagnostic output.  It
also renames the previous runtime parameter DEBUG_OBC to OBC_DEBUGGING_TESTS to
trigger the code that resets the normal velocities at OBC segments to zero and
the thicknesses and tangential velocities behind OBC segments to silly values
for testing, with DEBUG_OBC retained for now as the old name.  This latter
change follows the pattern elsewhere in the code in which other DEBUG parameters
do not actually change any values.  This commit also eliminates a fatal error
message if open boundary conditions are applied with rotations of 180 or 270
degrees.  All solutions in cases that worked previously are bitwise identical,
but there are new runtime parameters.
This simplifies and expands the timing/testing of reproducing sums
that were in .../unit_drivers and would not work without some
setup of the environment (input files). The refactored drivers
are stand alone and (mostly) fit within the pattern of the other
unit tests and timing tests.

- Moves config_src/drivers/unit_drivers/MOM_sum_driver.F90 to
  config_src/drivers/timing/time_reproducing_sum.F90
- Cleaned up time_reproducing_sum to not use unnecessary
  infrastructure (like grids, parameter files, etc.)
- Added config_src/drivers/unit_tests/test_reproducing_sum.F90
  to perform property tests removed from timing test.
- Added trivial unit test that reproducing_sum() gives correct
  analytic sum for array of precsribed values
- Added new tests that reproducing_sum() results are invariant
  to order of values in arrays
- Made MOM_define_layout() public
- Initialized MOM_dom%nonblocking_updates and MOM_dom%thin_halo_updates
  when optional arguments are missing (bug fix)
- Hacked .testing/tools/disp_timing.py to not fail on the FMS
  tail sheet which inevitably is generated when using FMS :(
  Collected the conversion factors for the 4 salt tendency diagnostics
diabatic_salt_tendency, diabatic_salt_tendency_2d, boundary_forcing_salt_tendency
and boundary_forcing_heat_tendency_2d out of the calculation of the diagnostics
before they are posted and into the conversion argument in their
register_diag_field calls.  This change simplifies the code where the
diagnostics are calculated, but although all expressions are mathematically
equivalent, there is a change in the order of arithmetic with which these 4
diagnostics are calculated, so answers will change at roundoff, although this is
unlikely to be detected if the diagnostics are being written as 32 bit floats.
All model solutions are bitwise identical.
    Fixes inconsistency between layers and interfaces in MOM_internal_tides
    Clip the diffusivity in diagnostics used for the buoyancy fluxes
These changes enhance the existing vertical diffusivity used in EPBL with machine learned diffusivity that is constrained on a second moment closure. Using symbolic regression and least-squares fitting, a shape function ( g(\sigma) ) responds to changes in the surface forcing (Latitude, wind stress, surface buoyancy flux, boundary layer depth). g(\sigma) goes from zero to 1 and its skewness changes as per surface forcing conditions.

There are two options for the turbulent velocity scale: v0 and v0h.

The velocity scale, v0, is an approximation that depends on three inputs: Coriolis parameter f, ustar u_*, and surface buoyancy flux BFlux. It agrees with diagnosed velocity scale from GLS scheme given by v0 = < { \kappa } max / h >, where h is boundary layer depth and angled brackets denote averaged data for a given forcing: B, u_*, f. 
When v0 is multiplied with g(\sigma) and multiplied by the energetics based boundary layer depth h, i.e \nu = . g(\sigma) \cdot v_0 \cdot h, we get a diffusivity which is constrained on a second moment closure.

The second velocity scale, v0h, depends on  ustar u_*, surface buoyancy flux BFlux, and boundary layer depth. It agrees with diagnosed velocity scale from GLS scheme given by v0^h =  { \kappa } max / h , where h is the boundary layer depth. v0^h agrees with convective velocity scaling under pure convection ( v0^h ~ (Bh)^(1/3) ).

When v0 or v0^h is multiplied with g(\sigma) and multiplied by the energetics based boundary layer depth h, i.e \nu = . g(\sigma) \cdot v_0 \cdot h, we get a diffusivity which is constrained on a second moment closure.

The Machine learned (ML) diffusivity is activated by using the logical flags:
1. EPBL_EQD_DIFFUSIVITY_SHAPE
2. EPBL_EQD_DIFFUSIVITY_VELOCITY
3. EPBL_EQD_DIFFUSIVITY_VELOCITY_H

Logical flag 1 activates the new equation for shape function.
Logical flag 2 activates velocity scale v0
Logical flag 3 activates velocity scale v0^h

To use ML diffusivity, Logical flag 1 has to be set to ‘True’ and either logical flag 2 or 3 should be set to ‘True’. 2 and 3 both cannot be true or false.
The subroutines replace the default EPBL mixing coefficients of Reichl and Hallberg (2018).
The publications for OSBL machine learned diffusivity are: Sane et al. 2023 ( https://doi.org/10.1029/2023MS003890 ) and Sanę et al. 2025.

The original version of this commit was modified by adding do_not_log arguments to some of the newly added get_param calls.
  Revised the MOM_open_boundary module to better handle grid rotation for an
arbitrary number of turns.  Rotate_OBC_config now properly handles the setting and
in some cases swapping of global information between the rotated and unrotated
ocean_OBC_types, while rotate_OBC_segment_config and rotate_OBC_segment_data were
modified to do the same for OBC_segment_type settings and data fields.  These
changes also include corrections to the code that rotates the input buffer and
corrects the sign of vector components in  update_OBC_segment_data when there is
grid rotation.  It also adds a missing scalar_pair argument to the restart
registration call for the normal radiation arrays.

  Several comments were also corrected or expanded for greater clarity, and some
instances of non-standard use of white space were corrected.

  All answers in non-rotated cases are bitwise identical, and some rotated cases
that previously gave segmentation faults are new running, although rotation does
not yet work properly for all cases with open boundary conditions.
  Added the new runtime parameter OBC_HOR_INDEXING_BUG that can be set to true to
retain previous answers or false to correct 5 horizontal indexing bugs that
prevent solutions with open boundary conditions from satisfying rotational
symmetry.  Four of these bugs are related to the use of the brushcutter mode to
set various terms in the open boundary conditions, while the fifth corrects the
total ocean depth used to  adjust the thicknessses used to remap tangential flow
fields that are discretized at vorticity points.  By default, all answers are
bitwise identical, but there is a new runtime parameter in some MOM_parameter_doc
files.
The jki loops in vertvisc() have been reordered to kji.  The solver
increases the number of concurrent tridiagonal solvers from Ni to Ni*Nj.

Two other changes contributed to performance

* Moving diagnostics (e.g. ADp%du_dt_str) outside of loops
* Conditional computing of Ray() when visc%Ray_[uv] is set

Not all optimizations of this sort were applied, and should be reviwed
in relevant experiments.

This showed a modest performance improvement on CPUs.  Three instances
are shown below.

* mom_vert_friction_mp_vertvisc_:   0.583s,   0.629s (-7.3%)
* mom_vert_friction_mp_vertvisc_:   0.576s,   0.634s (-9.2%)
* mom_vert_friction_mp_vertvisc_:   0.583s,   0.636s (-8.3%)

This patch uses nested do loops since we have not yet adoped do
concurrent loop constructs.  But a future do concurrent form shows even
greater speedup, e.g.

* mom_vert_friction_mp_vertvisc_:   0.258s,   0.539s (-52.2%)

The work in this PR will prepare this module for porting to GPUs.

Co-authored-by: Edward Yang <edward_yang_125@hotmail.com>
As with vertvisc(), this patch rewrites the vertvisc_remnant()
tridiagonal solvers to run in kji order, with even greater benefits to
runtime.  Three instances are shown below.  Speedup is about 1.3-1.4x.

* mom_vert_friction_mp_vertvisc_remnant_:   0.939s,   1.241s (-24.3%)
* mom_vert_friction_mp_vertvisc_remnant_:   0.935s,   1.265s (-26.0%)
* mom_vert_friction_mp_vertvisc_remnant_:   0.910s,   1.258s (-27.7%)

As before, only the diagnoal array (b1) was promoted to 3d.

As with vertvisc() this change is expected to be highly favorable to
GPU performance.
- Minor re-factor of numerical_testing_type_unit_tests()
- Added a doxygen blurb for the module including usage for
  the helped class
- Updated test_numerical_testing_type with rename of unit_test
  function
  This commit makes a set of changes to refactor code related to open boundary
conditions to correct issues with restarts.

  Renamed open_boundary_init to open_boundary_halo_update to reflect what it
actually does and eliminated the 4 unused arguments to this routine.

  Moved halo updates for temperature and salinity and the call to
setup_OBC_tracer_reservoirs out of fill_temp_salt_segments and made the tv
argument to fill_temp_salt_segments intent in.

  Moved the call to setup_OBC_tracer_reservoirs and the debugging calls for OBCs
into the same code block as the rest of the OBC code in MOM_initialize_state.

  Moved the call to open_boundary_halo_update after MOM_state_initialization for
both rotated and unrotated branches.

  Moved the calls to fill_temp_salt_segments and setup_OBC_tracer_reservoirs and
the debugging calls for OBCs into the same code block as the rest of the OBC
code in MOM_initialize_state.

  All answers are bitwise identical, but there are changes to public interfaces
and what does and does not occur in several routines.
Copy link
Collaborator

@dougiesquire dougiesquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marshallward I think the links in your first post are not correct. They link to the issue/PR in this repo with the same number as the PR to NOAA-GFDL/MOM6.

Anyway, this PR looks good and does not change answers in ACCESS-OM3 configs.

@marshallward
Copy link
Collaborator Author

Thanks @dougiesquire the links have been corrected.

@Hallberg-NOAA Hallberg-NOAA mentioned this pull request Jul 28, 2025
@Hallberg-NOAA
Copy link
Collaborator

It should be noted that the reason why we are getting regression-testing failures is because we added a model time into the diagnostic checksums. In the future, this will allow us to move around the sequence of calls posting diagnostics without the automated testing thinking that there have been answer changes, but for now the change in the diagnostic checksum formatting is (erroneously) showing up as regression changes.

@mathomp4
Copy link
Collaborator

As far as my testing for GMAO, this seems to be zero-diff for us at very-low-res (the resolution I can run). @sinakhani will be doing a more thorough test at higher res but I'll approve for now.

DEALLOC_(CS%Idx2dyCu) ; DEALLOC_(CS%Idx2dyCv)
DEALLOC_(CS%Idxdy2u) ; DEALLOC_(CS%Idxdy2v)
DEALLOC_(CS%Ah_bg_xx) ; DEALLOC_(CS%Ah_bg_xy)
if (CS%bound_Ah) then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file (and elsewhere), some but not all instances of ALLOCABLE_, ALLOC_, and DEALLOC_ have been replaced with their intrinsic versions: allocatable, allocated, and deallocate.

Could you clarify the criteria used to determine which instances were transformed? I’m asking because we’re encountering conflicts with some of these changes, and I’d like to apply the same selection criteria consistently to transform ALLOC_ etc. to allocatable etc. when resolving them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static allocation macros should be used for any array that is persistent for the entire run, although one can probably find exceptions, partly due to contributors who are unclear about the rules (including myself).

In practice, I don't think anyone is running with static memory anymore. It is mainly used as a verification test, and has found the occasional bug.

I found an example of arrays in stochastic_init that were changed from ALLOC_() to allocate(), possibly because they are conditionally used. Maybe that is the reason?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At one point, perhaps 15 years ago, the predecessor to MOM6 was about 20% more efficient when compiled and run in static memory mode, but this makes the model much less flexible for changing layouts or masking out all-land processors. More recently, we have been finding that static memory is not dramatically more efficient than dynamic memory, but it is still useful to run test cases in both static and dynamic memory mode because differences between them can be used to indicate (and track down) certain types of memory use problems. We have debated eliminating the static memory macros altogether, but on balance we still think that they do provide additional useful self-consistency checks.

The issue with the use of the static memory macros is that any arrays declared with them will always be assigned memory in static memory mode, even if they are never used. This increases the memory footprint of the MOM6, and it can reduce the size of static configurations that can be used on a single processor for rotational consistency testing, for example. We think that the preferred practice should be to restrict the use of static memory only to arrays that are almost always used in most MOM6 configurations, but avoid using static memory for arrays that are only used with a limited set of options. The changes in memory allocation with this PR follows this preferred practice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks!

@awallcraft
Copy link
Collaborator

awallcraft commented Jul 31, 2025

NOAA-GFDL#949 (Fix frazil halo update bug when unallocated) is a 1-line bugfix that should be added here before merging.

If FRAZIL=False tv%frazil is not allocated but its halo might still
be updated in post_diabatic_halo_updates, which will fail.

If FRAZIL=True, no answers are changed.

If FRAZIL=False, no answers from before this bug was introduced are changed.
Copy link
Collaborator

@abozec abozec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the bug fix from Alan, I now have my simulation with sponge and OBC working. COAPS approves the pull request.

@marshallward
Copy link
Collaborator Author

NOAA-GFDL#949 (Fix frazil halo update bug when unallocated) is a 1-line bugfix that should be added here before merging.

@awallcraft We moved your PR directly into this candidate PR branch. This will feed back into dev/gfdl after this is accepted and merged.

@marshallward
Copy link
Collaborator Author

Thanks all, this has now been approved.

@jiandewang
Copy link
Collaborator

since Alan's bug fixing is merged here, let me re-run UFS, shall be able to finish in half day

@jiandewang
Copy link
Collaborator

no issue found in my fresh run (with Alan's commit being added)

@marshallward
Copy link
Collaborator Author

Thanks @jiandewang. I will merge this now.

(Some lingering CI tests will request a manual override)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.