-
Notifications
You must be signed in to change notification settings - Fork 435
EAMxx: assign columns to bins during intialization of zonal average #7481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
cjvogl
wants to merge
13
commits into
master
Choose a base branch
from
improvement-eamxx-zonal_average_diagnostic
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+136
−75
Open
Changes from 12 commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
58a293b
refactored zonal average diagnostic to assign columns to bins during …
cjvogl 7d60a28
removed unused include in zonal average diagnostic
cjvogl 9f91e95
adjusted zonal avg to account for column at northern pole (lat = 90)
cjvogl f7d09be
leverage bin-to-col map to scale zonal area in zonal average
cjvogl c62e730
updated zonal avg test to include columns at the poles
cjvogl 9588697
updated user docs for zonal average
cjvogl 6752c1c
updated zonal_avg test to support multiple MPI ranks
cjvogl 2bd930d
addressed CUDA error (errant use of KOKKOS_LAMBDA) and warning (impli…
cjvogl 604a1ba
addressing capture issues with CUDA and zonal average diag
cjvogl 4574fa7
fixed functor for determining max # of columns per zonal avg bin
cjvogl e3cb3e1
corrected some rebasing issue in zonal average
cjvogl d5bfa35
missing include after rebase
cjvogl 73dc4dc
corrected accidental use of std::max on device in zonal average diagn…
cjvogl File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this loop is what is causing the CUDA errors (I'll confirm this once I have access to an appropriate machine).
The loop is a plain old for loop because the process of building lists of column indices is inherently serial (i.e., a parallel for loop would run into many race conditions). Looking through the kokkos documentation, I believe this could be made into a
parallel_for
with the use of eitherView
with anAtomic
memory traitAny thoughts on one versus the other is greatly appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably accomplish this through a
parallel_reduce()
, but I would need to take a closer look at the structure to confirm. I'll take a look at this Monday, and review the PR as a whole.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is done during initialization, I am not too concerned about this. That said, you could split the work into 3 kernels:
Between 2 and 3 you can also check that the max of bin_sizes is indeed less then max_ncols_per_bin. Your code may be attempting to access the view out of bounds if the max was not large enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not really make the code faster, but it removes the need for a max_ncols_per_bin, as you could create the diagnostic after step 2, once you know all the bins sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was more-or-less the issue: I was using
std::max
on device, which was resulting in the max being zero. This should be corrected now in 73dc4dcThis makes sense to me. The only reason I opted to keep the original approach is that I don't have experience using an array as an accumulator.