Skip to content

Conversation

theresa-cordero
Copy link

@theresa-cordero theresa-cordero commented Feb 27, 2025

Modularize btstep

(edited to correct the names of the new routines)

This PR moves some parts of btstep() into subroutines:

  • btstep_ubt_from_layer
  • btstep_find_Cor
  • truncate_velocities
  • btloop_eta_predictor
  • store_u_for_OBCs and store_v_for_OBCs
  • btloop_update_v
  • btloop_update_u
  • btstep_layer_accel

This eliminates the duplication of the u and v update code in the bt loop, but preserves the BT_USE_OLD_CORIOLIS_BRACKET_BUG in the update when v is done second.

The creation of these subroutines should not change answers. I have tested it in the Baltic OM4 0.25 case with BT_USE_OLD_CORIOLIS_BRACKET_BUG both True and False.

Each routine has many arguments and I have tried to correctly label them intent(in) or intent(inout) as appropriate. Some variables did not have clear comments describing them in this module (ex. use_BT_cont or jsvf), so I have not put doxygen comments describing them in this commit. These can be added when I address the review.

The short description of each routine is:

btstep_ubt_from_layer: Calculate the zonal and meridional velocity from the 3-D velocity.
btstep_find_Cor: Find the Coriolis force terms _zon and _mer.
truncate_velocities: Do a CFL-based truncation of any excessively large batotropic velocities.
btloop_eta_predictor: A routine to set eta_pred and the running time integral of uhbt and vhbt.
store_[uv]_for_OBCs: Store the existing velocities and transports for use with open boundary conditions.
btloop_update_v: Update meridional velocity.
btloop_update_u: Update zonal velocity.
btstep_layer_accel: Calculate the zonal and meridional acceleration of each layer due to the barotropic calculation.

@Hallberg-NOAA
Copy link
Member

Thank you for this contribution, @theresa-cordero. There are a number of issues here with the openMP directives and some doxygen comments that need to be addressed, and some of the new routines strike me as not being quite the right granularity. I have put together a branch of MOM6 at https://github.com/Hallberg-NOAA/MOM6/tree/modularize_btstep_rebase in which I am attempting to work through these issues, and which might be the basis for some modifications to this pull request.

@Hallberg-NOAA Hallberg-NOAA added the refactor Code cleanup with no changes in functionality or results label Mar 7, 2025
@theresa-cordero
Copy link
Author

theresa-cordero commented Mar 10, 2025

Thanks Bob for the feedback and the changes on you branch. I have decided to merge in your branch and made one more additional change by renaming btloop_setup_eta to btloop_eta_predictor.

Theresa Morrison and others added 6 commits March 10, 2025 13:36
The short description of each routine is:
btstep_ubt_from_layer: Calculate the zonal and meridional velocity from the 3-D velocity.
btstep_find_Cor: Find the Coriolis force terms _zon and _mer.
btloop_setup: Set ubt, vbt, eta, find face areas, and set valid array size at the beginning of each barotropic loop.
btloop_setup_eta: A routine to set eta_pred and the running time integral of uhbt and vhbt.
btloop_setup_OBCs: Setup old or previous ubt, vbt, vbt, and vhbt for different OBC options before the next step in the btstep loop.
btloop_update_v:  Update meridional velocity.
btloop_update_u: Update zonal velocity.
btstep_layer_accel:  Calculate the zonal and meridional acceleration of each layer due to the barotropic calculation.
  This commit builds upon the recent refactoring of MOM_barotropic.F90 by
correcting the openMP directives, dealing with use_old_coriolis_bracket_bug via
an optional argument to to btloop_update_v, simplifying what is done in
btstep_ubt_from_layer, restoring parentheses to btstep_layer_accel that are
needed for rotational symmetry with FMAs enabled and adding doxygen comments
describing several arguments.  All answers are bitwise identical.
  Replaced btloop_setup with truncate_velocities and moved the halo updates
that had been in btloop_setup out of this routine, making it more functionally
targeted.  Also replaced the index range arguments to btloop_update_u and
btloop_update_v to simply pass the ranges that are used rather than having to
reconstruct them inside these routines.

  Doxygen comments were added to describe several arguments to the newly added
routines, and several comments were also added describing what the various calls
within btstep do.  All answers are bitwise identical.
  Replace btloop_setup_OBCs with store_u_for_OBCS and store_v_for_OBCs for more
functional targeting and simpler routines.  Also because ubt_old and ubt_prev
were copies of the same thing, ubt_old is no longer used in btstep while
ubt_prev is set over a slightly larger loop range in store_u_for_OBCs, and
similarly for vbt_old and vbt_prev.  All answers are bitwise identical.
  Added doxygen comments to describe all individual arguments, rather than
documenting them in groups, which should correct the previous problems with
failing testing.  Also avoided using a separate variables wt_accel_n and
wt_accel2_n in btstep.  All answers are bitwise identical.
Change name of new routine btloop_setup_eta to btloop_eta_predictor.
@herrwang0
Copy link

This is more of a question than a comment.

I really like the idea of using btloop_update_[uv] to remove duplicate code. The rest of the new subroutines seem to be used only once. I agree grouping them together does make subroutine btstep more readable. But I wonder if we need to consider the overhead of calling these single-used subroutines (and the argument lists are quite chunky too)? Or the benefit of a clearer structure outweighs the overhead?

Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for these changes refactoring the barotropic solver, which should make the it easier to understand.

In addition to closely inspecting these changes, to verify the correctness of this refactoring, I have re-implemented all of these changes myself (e.g., copying the previous working parts of the code into the new routines) and semi-independently come up with the same results as Theresa did. I am therefore confident that all of these changes are correct.

@yichengt900
Copy link

I can confirm that this refactor has passed all existing regression tests for the CEFI regional configurations.

@marshallward
Copy link
Member

marshallward commented Mar 10, 2025

But I wonder if we need to consider the overhead of calling these single-used subroutines (and the argument lists are quite chunky too)? Or the benefit of a clearer structure outweighs the overhead?

I tested this PR in double_gyre with ifort and -O3 optimization, and only btstep_layer_accel appears to be inlined. The other functions are being called normally and the stack pointer code is rather large, but it doesn't necessarily mean a lot of overhead.

The double_gyre output is a little too simple to trust, but it does not appear to be much overhead; the function is maybe 1-2% extra overhead than leaving the loops in-place.

@Hallberg-NOAA
Copy link
Member

Hallberg-NOAA commented Mar 10, 2025

In this case, I do not think that the added subroutines are likely to be a significant added computational burden. These routines are within the same file, so they should be good candidates for inlining. In addition, this PR is intended to be a preliminary step towards more extensive modifications that may add new capabilities (e.g., fusing the sea-ice with the ocean barotropic solver) or running on new platforms (e.g,. GPUs).

(I would trust Marshall's comment more than mine regarding the inlining; I was writing and hit send before seeing that Marshall had also responded.)

@herrwang0
Copy link

@marshallward @Hallberg-NOAA Thanks for the testing and explaining the context. This makes sense!

@Hallberg-NOAA
Copy link
Member

This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/26703.

@Hallberg-NOAA Hallberg-NOAA merged commit 0180d52 into NOAA-GFDL:dev/gfdl Mar 10, 2025
10 checks passed
@marshallward
Copy link
Member

marshallward commented Mar 12, 2025

I should have thought to provide this report:

   -> (1898,12) MOM_BAROTROPIC::BTLOOP_UPDATE_V (isz = 1595) (sz = 1690)         
      [[ Inlining would exceed -inline-max-total-size value (3099>2000) <3>]]    
   -> (1908,12) MOM_BAROTROPIC::BTLOOP_UPDATE_U (isz = 1437) (sz = 1530)         
      [[ Inlining would exceed -inline-max-total-size value (2941>2000) <3>]]                                                                                                          
   -> (1919,12) MOM_BAROTROPIC::BTLOOP_UPDATE_U (isz = 1437) (sz = 1530)         
      [[ Inlining would exceed -inline-max-total-size value (2941>2000) <3>]]    
   -> (1928,12) MOM_BAROTROPIC::BTLOOP_UPDATE_V (isz = 1595) (sz = 1690)         
      [[ Inlining would exceed -inline-max-total-size value (3099>2000) <3>]]

Compiler won't inline into a function larger than 2000 lines(? or something line-like...).

In other words, there is no problem with the new functions, but btstep is still a very large function!

This 2k line limit is tunable, but we should probably aim to refactor towards a number far away from this limit at some point in the future.

Hallberg-NOAA added a commit that referenced this pull request Apr 7, 2025
  Restore the parentheses to the expressions for PFv in btloop_update_v so that
the model will once again respect rotational symmetry when fused-multiply-adds
are enabled.  These parentheses were in the corresponding expressions until PR
#845 to dev/gfdl when they were inadvertently omitted.   This commit changes
answers (and restores rotational symmetry) when fused-multiply-adds are enabled.
Hallberg-NOAA added a commit that referenced this pull request Apr 19, 2025
  Restore the parentheses to the expressions for Cor_v in btloop_update_v so
that the model will once again respect rotational symmetry when
fused-multiply-adds are enabled.  These parentheses were in the corresponding
expressions until PR #845 to dev/gfdl when they were inadvertently omitted.
This commit changes answers (and restores rotational symmetry) when
fused-multiply-adds are enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Code cleanup with no changes in functionality or results
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants