Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions cime_config/tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -705,6 +705,7 @@
"PEM_Ln90.ne30pg2_ne30pg2.F2010-SCREAMv1.scream-spa_remap--scream-output-preset-4",
"ERS_Ln90.ne30pg2_ne30pg2.F2010-SCREAMv1.scream-small_kernels--scream-output-preset-5",
"ERP_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.scream-bfbhash--scream-output-preset-6",
"ERS_Ld1.ne30pg2_ne30pg2.F2010-SCREAMv1.scream-L128--scream-output-preset-4"
)
},

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -138,8 +138,9 @@ class SHOCMacrophysics : public scream::AtmosphereProcess

// Dry static energy
shoc_s(i,k) = PF::calculate_dse(T_mid(i,k),z_mid(i,k),phis(i));

if (k+1 == nlev_packs) zi_grid(i,nlevi_v)[nlevi_p] = 0;
});
zi_grid(i,nlevi_v)[nlevi_p] = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placing a team_barrier before this line would also work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm puzzled. Why was this line buggy? All threads would execute that line. Albeit being unnecessary (so I'm ok with the change), it would not be incorrect, since they all set the same value, and there was a barrier right after.

Copy link
Member Author

@ambrad ambrad Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. If the previous kernel were over nlevi rather than nlev, I could make an argument about memory consistency. But since it's over nlev, I can think of any explanation. Maybe this fix is actually working around a compiler-side issue.

The evidence for this specific line being the issue is that the _sfc quantities a few lines below this code block were all wrong. It's possible there's some other bug I'm not seeing that this fix handles. If so, that bug must affect these _sfc quantities.

I'll weaken my suggestion that other C++ devs study this PR since it may just be working around a compiler-side problem.

team.team_barrier();

const auto zt_grid_s = ekat::subview(zt_grid, i);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -274,7 +274,8 @@ class AtmosphereProcess : public ekat::enable_shared_from_this<AtmosphereProcess

// For internal diagnostics and debugging.
void print_global_state_hash(const std::string& label, const bool in = true,
const bool out = true, const bool internal = true) const;
const bool out = true, const bool internal = true,
const Real* mem = nullptr, const int nmem = 0) const;
// For BFB tracking in production simulations.
void print_fast_global_state_hash(const std::string& label) const;

Expand Down
21 changes: 17 additions & 4 deletions components/eamxx/src/share/atm_process/atmosphere_process_hash.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -113,19 +113,32 @@ void hash (const std::list<FieldGroup>& fgs, HashType& accum) {

} // namespace anon

// (mem, nmem) describe an arbitrary device array. If non-0, the array will be
// hashed and reported as a fourth line.
void AtmosphereProcess
::print_global_state_hash (const std::string& label, const bool in, const bool out,
const bool internal) const {
static constexpr int nslot = 3;
const bool internal, const Real* mem, const int nmem) const {
static constexpr int nslot = 4;
HashType laccum[nslot] = {0};
hash(m_fields_in, laccum[0]);
hash(m_groups_in, laccum[0]);
hash(m_fields_out, laccum[1]);
hash(m_groups_out, laccum[1]);
hash(m_internal_fields, laccum[2]);
const bool hash_array = mem != nullptr;
if (hash_array) {
HashType accum = 0;
Kokkos::parallel_reduce(
Kokkos::RangePolicy<ExeSpace>(0, nmem),
KOKKOS_LAMBDA(const int i, HashType& accum) { bfbhash::hash(mem[i], accum); },
bfbhash::HashReducer<>(accum));
Kokkos::fence();
laccum[3] = accum;
}
HashType gaccum[nslot];
bfbhash::all_reduce_HashType(m_comm.mpi_comm(), laccum, gaccum, nslot);
const bool show[] = {in, out, internal};
const int nr = hash_array ? nslot : nslot-1;
bfbhash::all_reduce_HashType(m_comm.mpi_comm(), laccum, gaccum, nr);
const bool show[] = {in, out, internal, hash_array};
if (m_comm.am_i_root())
for (int i = 0; i < nslot; ++i)
if (show[i])
Expand Down