Adds bandwidth.yml playbook for NVIDIA nvbandwidth #834
@@ -16,3 +16,6 @@ cuda_samples_programs:
  - bandwidthTest
# cuda_devices: # discovered from deviceQuery run
cuda_persistenced_state: started
# variables for nvbandwidth (for bandwidth.yml tasks run in cudatests.yml)
cuda_bandwidth_path: "/var/lib/{{ ansible_user }}/cuda_bandwidth"
sjpb marked this conversation as resolved.
cuda_bandwidth_release_url: "https://github.com/NVIDIA/nvbandwidth/archive/refs/tags/v0.8.tar.gz"

Review comment: Can we add a separate

@@ -0,0 +1,56 @@
---
- name: Ensure cuda_bandwidth_path exists

Review comment: Suggested change

  ansible.builtin.file:
    state: directory
    path: "{{ cuda_bandwidth_path }}"
    owner: "{{ ansible_user }}"
    group: "{{ ansible_user }}"
    mode: "0755"

- name: Download CUDA bandwidth test release
  ansible.builtin.unarchive:
    remote_src: true
    src: "{{ cuda_bandwidth_release_url }}"
    dest: "{{ cuda_bandwidth_path }}"
    owner: "{{ ansible_user }}"
    group: "{{ ansible_user }}"
    creates: "{{ cuda_bandwidth_path }}/nvbandwidth-0.8"

- name: Creates CUDA bandwidth test build directory

Review comment: Can you make the `name:` consistent with the `name:` on the first task please?

  ansible.builtin.file:
    state: directory
    path: "{{ cuda_bandwidth_path }}/nvbandwidth-0.8/build"
    mode: "0755"

- name: Ensure cudatests directory exists
  ansible.builtin.file:
    path: "{{ appliances_environment_root }}/cudatests"
    state: directory
    mode: '0755'
  delegate_to: localhost

- name: Build CUDA bandwidth test
  ansible.builtin.shell:
    cmd: |
      source /cvmfs/software.eessi.io/versions/2023.06/init/bash &&
      module load Boost/1.82.0-GCC-12.3.0 &&
      . /etc/profile.d/sh.local && cmake .. &&
      make -j {{ ansible_processor_vcpus }}
    chdir: "{{ cuda_bandwidth_path }}/nvbandwidth-0.8/build"
    creates: "{{ cuda_bandwidth_path }}/nvbandwidth-0.8/build/nvbandwidth"

- name: Run CUDA bandwidth test

Review comment: I think this needs

  ansible.builtin.shell: |

Review comment: So:
Is it not sufficient to just activate EESSI again? And maybe load some EESSI modules?

    export LD_LIBRARY_PATH=/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/12.3.0/lib64:\
    /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software/Boost/1.82.0-GCC-12.3.0/lib
    ./nvbandwidth
  args:
    chdir: "{{ cuda_bandwidth_path }}/nvbandwidth-0.8/build/"
  register: cuda_bandwidth_output

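One possible shape for the suggestion to activate EESSI again rather than hard-coding `LD_LIBRARY_PATH`, offered as an untested sketch. It assumes the EESSI init script and the `Boost/1.82.0-GCC-12.3.0` module used in the build task also put the GCCcore and Boost runtime libraries on the library path. The `executable: /bin/bash` and `changed_when: false` lines are additions for illustration and are not part of this PR.

```yaml
# Hypothetical alternative to the "Run CUDA bandwidth test" task; not part of the PR.
- name: Run CUDA bandwidth test
  ansible.builtin.shell:
    cmd: |
      # Re-activate EESSI and load the same Boost module as the build task,
      # assuming this provides the runtime libraries nvbandwidth links against.
      source /cvmfs/software.eessi.io/versions/2023.06/init/bash &&
      module load Boost/1.82.0-GCC-12.3.0 &&
      ./nvbandwidth
    chdir: "{{ cuda_bandwidth_path }}/nvbandwidth-0.8/build/"
    executable: /bin/bash  # `source` is a bash builtin, so do not rely on /bin/sh
  register: cuda_bandwidth_output
  changed_when: false  # the test only reads hardware state
```

This avoids baking the `zen4` microarchitecture path into the task, since EESSI selects the software subdirectory for the host when it is initialised.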
- name: Save CUDA bandwidth output to bandwidth_results.txt

Review comment: Is there no useful summary we can do here?

  ansible.builtin.copy:

Review comment: Given this is fetching a file, why does this not use

    content: "{{ cuda_bandwidth_output.stdout }}"
    dest: "{{ appliances_environment_root }}/cudatests/bandwidth_results.txt"

Review comment: When

    mode: '0644'
  delegate_to: localhost

Review comment: Given we don't even run deviceQuery, I think we should just remove this task entirely TBH. But leave the role pending thinking more!
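
On the earlier question of a useful summary: a minimal sketch, assuming nvbandwidth prints one line beginning with `SUM` per test (this output format is an assumption and should be checked against a real run). The task name is hypothetical; `cuda_bandwidth_output` is the variable registered by the run task.

```yaml
# Hypothetical summary task; not part of the PR.
- name: Summarise CUDA bandwidth results
  ansible.builtin.debug:
    # Keep only the per-test summary lines from the registered output.
    msg: "{{ cuda_bandwidth_output.stdout_lines | select('match', '^SUM ') | list }}"
```

A summary like this could be shown in the play output whether or not the full results are also written to a file.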