-
Notifications
You must be signed in to change notification settings - Fork 231
Disable async MR priming by default #2051
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Does the benchmark in this PR seem useful? The results I get locally are:
In summary: I think priming costs a lot at MR construction time and I don't think it has any significant performance impact on subsequent allocation times. I am not sure if this benchmark is really a fair measurement or not. I am going to run PDS-H workflows to get more data. |
Here are the PDS-H benchmarks with this PR. cc: @GregoryKimball Hardware: V100 32GB, CUDA 12.9, driver 575. Profiling command:
Note: Overall PerformanceResults for
branch-25.12 JSON results
Results for
async-priming JSON results
I ran both branches 3 times and got results within 1-2 seconds of the above on both branches. Any changes in workflow performance are well within the run-to-run noise. Investigating profilesHere is the ![]() First-touch accessOn
![]() On the
![]() I would say that the cost of first-touch access shouldn't be a problem, because we are skipping the more expensive priming call in the memory resource constructor, which doesn't appear to show any benefits. Overall we are doing less work with priming disabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I have run the PDS-H benchmarks, I am less sure if adding this benchmark is really useful. Reviewers, please weigh in on whether you'd like to see this added or if it provides no value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved trivial CMake changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am neutral on the benchmark, but +1 on the priming change.
/merge |
Here is a visualization for the data @bdice posted above. It looks like this PDS-H run used query setting |
Summary: This PR skips the default "priming" step of the RMM async memory resource. This reduces initialization costs and has benefits for multi-process applications. xref: - rapidsai/rmm#2060 - rapidsai/rmm#1931 - rapidsai/rmm#2051 Pull Request resolved: #14997 Reviewed By: Yuhta Differential Revision: D83668107 Pulled By: kgpai fbshipit-source-id: 41b6bd5807f60b0e1c76a1c91dd26f7e5451255a
Description
Closes #1931
Checklist