
Conversation

@rfhaque rfhaque commented Jun 16, 2025

Description

This PR updates the implementation of the gromacs experiment in benchpark.

  1. The supported version of gromacs has been updated to 2025.2 (the latest available in the upstreamed spack packages).
  2. Support for rocm has been added (a build-flag sketch follows this list).
    • The preferred way to run gromacs on a rocm backend is to build it with AdaptiveCpp/hipSYCL (-DGMX_GPU=SYCL -DGMX_SYCL=ACPP). However, this only works with the amdclang compilers provided with a rocm installation. The hipsycl version being used has been updated to 24.10.0 (the latest available in the upstreamed spack packages). With this, gromacs now builds with rocm 6.2+ on tuolumne. The rocm build works both with and without GPU-aware MPI (benchpark provides a way to enable/disable GPU-aware MPI).
    • Running gromacs on 2 nodes (8 ranks/8 gpus) of tuolumne hangs at the first step of execution, which runs gmx_mpi grompp. Plainly running gmx_mpi grompp ... in a batch job script with 2 nodes hangs; if the command is prefixed with the job scheduler, e.g. flux run -n1 -N1 gmx_mpi grompp ..., the code runs correctly. Ramble currently doesn't allow running different steps within the same experiment with different node/core counts. This has been addressed by adding a variable called single_rank_mpi_command to gromacs and overriding it in benchpark's allocation modifier.
    • We can also build with rocm 6+ on tuolumne without hipsycl, using the plain HIP backend introduced in gromacs 2025 (-DGMX_GPU=HIP). However, at runtime we get a "Chosen GPU implementation does not support constraints" error.
  3. Building on lassen (POWER9 + NVIDIA V100) requires gcc 11 and cuda 12.2.2 (the recommended gromacs configuration for POWER machines); the default clang compiler and cuda 11 do not work on lassen.
  4. The code also builds and runs on matrix (x86 + NVIDIA H100) using cuda 12 and gcc 12; however, the cmake 3.28+ required by gromacs is not available on matrix. Building cmake in spack throws an error that is currently circumvented by manually adding additional config details to the generated ramble.yaml.
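
For reference, a minimal sketch (assumed structure, not the exact benchpark package.py) of how the two ROCm build paths mentioned in item 2 could be selected in a Spack package:

# Hedged sketch: selecting the ROCm build flavours described above in a Spack
# package's cmake_args(); the variant names (+sycl, +rocm) follow the snippets
# later in this PR, and the exact benchpark package.py may differ.
def cmake_args(self):
    args = []
    if self.spec.satisfies("+sycl"):
        # preferred path: AdaptiveCpp/hipSYCL backend
        args += ["-DGMX_GPU=SYCL", "-DGMX_SYCL=ACPP"]
    elif self.spec.satisfies("+rocm"):
        # plain HIP backend introduced in GROMACS 2025
        args.append("-DGMX_GPU=HIP")
    return args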

The gromacs package.py in the benchpark repo is almost identical to the upstreamed version in spack, with only the rocm changes added. The hipsycl package.py is identical to the one in the spack develop branch; once benchpark is updated to the latest spack, this package.py can be removed.
The old application.py in the benchpark repo has been deleted; we now carry the same application.py as the version upstreamed in ramble, with changes added to support the single_rank_mpi_command variable.
All new compiler definitions required by gromacs on the target machines have been added to their respective system configs.

Adding/modifying a system (docs: Adding a System)

  • Add/modify systems/system_name/system.py file

Adding/modifying a benchmark (docs: Adding a Benchmark)

  • If package.py upstreamed to Spack is insufficient, add/modify repo/benchmark_name/package.py plus: create, self-assign, and link here a follow up issue with a link to the PR in the Spack repo.
  • Tags in Ramble's application.py or in repo/benchmark_name/application.py will appear in the docs catalogue
  • Add/modify an experiments/benchmark_name/experiment.py to define single-node and multi-node experiments

@github-actions bot added the experiment, system, and application labels Jun 16, 2025
codecov-commenter commented Jun 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 40.74%. Comparing base (ed3ae54) to head (d4c6f8d).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #835   +/-   ##
========================================
  Coverage    40.74%   40.74%           
========================================
  Files           15       15           
  Lines         1107     1107           
  Branches       152      152           
========================================
  Hits           451      451           
  Misses         620      620           
  Partials        36       36           
Flag Coverage Δ
unittests 40.74% <ø> (ø)

Flags with carried forward coverage won't be shown.


@rfhaque rfhaque removed the request for review from scheibelp June 16, 2025 06:22
@pszi1ard

Thanks for the update!

> However, this only works for amdclang compilers provided with a rocm installation where llvm version < 18. On tuolumne (x86+AMD MI300), all available rocm 6+ versions are llvm 18.

We don't see issues with newer ROCm on our machines. What issues are you running into with llvm > 18? Is it a spack restriction or some other issue?

> Also running gromacs on 2 nodes (8 ranks/8 gpus) of tuolumne hangs, running on a single node (4 ranks/4 gpus) works. The rocm build works with and without GPU-aware mpi.

Does it hang both with GPU-aware MPI and without?

@pearce8 pearce8 mentioned this pull request Jun 16, 2025
@rfhaque commented Jun 16, 2025

> Thanks for the update!

> > However, this only works for amdclang compilers provided with a rocm installation where llvm version < 18. On tuolumne (x86+AMD MI300), all available rocm 6+ versions are llvm 18.

> We don't see issues with newer ROCm on our machines. What issues are you running into with llvm > 18? Is it a spack restriction or some other issue?

@pszi1ard gromacs now works on tuolumne with rocm-6.2.4. The hipsycl package.py also had to be updated to the latest upstreamed spack version.

> > Also running gromacs on 2 nodes (8 ranks/8 gpus) of tuolumne hangs, running on a single node (4 ranks/4 gpus) works. The rocm build works with and without GPU-aware mpi.

> Does it hang both with GPU-aware MPI and without?

It hangs in both cases. I'm trying to figure out if this might have something to do with the tuolumne scheduler rather than gromacs itself.

@pszi1ard

> > Thanks for the update!

> > > However, this only works for amdclang compilers provided with a rocm installation where llvm version < 18. On tuolumne (x86+AMD MI300), all available rocm 6+ versions are llvm 18.

> > We don't see issues with newer ROCm on our machines. What issues are you running into with llvm > 18? Is it a spack restriction or some other issue?

> @pszi1ard gromacs now works on tuolumne with rocm-6.2.4. The hipsycl package.py also had to be updated to the latest upstreamed spack version.

OK, good to know.

> > > Also running gromacs on 2 nodes (8 ranks/8 gpus) of tuolumne hangs, running on a single node (4 ranks/4 gpus) works. The rocm build works with and without GPU-aware mpi.

> > Does it hang both with GPU-aware MPI and without?

> It hangs in both cases. I'm trying to figure out if this might have something to do with the tuolumne scheduler rather than gromacs itself.

OK. Let me know if you need feedback on that. Just to confirm, do you see the same issue on your MI250x machine?

@pszi1ard

> We can also build with rocm 6+ on tuolumne without hipsycl, using the plain HIP backend introduced in gromacs 2025 (-DGMX_GPU=HIP). However, at runtime we get a "Chosen GPU implementation does not support constraints" error.

Note that this is incomplete and only supports basic offload of one force computation; there is no support for GPU-resident mode or direct GPU communication.

@rfhaque commented Jun 16, 2025

> > > Thanks for the update!

> > > > However, this only works for amdclang compilers provided with a rocm installation where llvm version < 18. On tuolumne (x86+AMD MI300), all available rocm 6+ versions are llvm 18.

> > > We don't see issues with newer ROCm on our machines. What issues are you running into with llvm > 18? Is it a spack restriction or some other issue?

> > @pszi1ard gromacs now works on tuolumne with rocm-6.2.4. The hipsycl package.py also had to be updated to the latest upstreamed spack version.

> OK, good to know.

> > > > Also running gromacs on 2 nodes (8 ranks/8 gpus) of tuolumne hangs, running on a single node (4 ranks/4 gpus) works. The rocm build works with and without GPU-aware mpi.

> > > Does it hang both with GPU-aware MPI and without?

> > It hangs in both cases. I'm trying to figure out if this might have something to do with the tuolumne scheduler rather than gromacs itself.

> OK. Let me know if you need feedback on that. Just to confirm, do you see the same issue on your MI250x machine?

@pszi1ard The problem seems to be with the first step in the experiment, which executes gmx_mpi grompp. Plainly running gmx_mpi grompp ... in a batch job script with 2 nodes hangs. If the command is prefixed with the scheduler, e.g. flux run -n1 -N1 gmx_mpi grompp ..., the code runs correctly. Is there a way to tell gmx_mpi to only run on a single node if no scheduler is specified? @pearce8 Currently I'm not sure if ramble allows running different steps within the same experiment with different node/core counts, but that is what is needed in this case. gmx_mpi is the MPI version of gromacs, but some steps like grompp need to be run on a single node.

@github-actions bot added the feature label Jun 17, 2025
sbatch_directives = list(f"#SBATCH {x}" for x in (srun_opts + sbatch_opts))

v.mpi_command = f"srun {' '.join(srun_opts)}"
v.single_rank_mpi_command = "srun -n 1 -N 1"
@rfhaque Jun 17, 2025

@scheibelp I've added a single_rank_mpi_command variable that is used for MPI executables meant to be run on a single node.

Comment on lines +192 to +197
workload_variable(
    "single_rank_mpi_command",
    default="",
    description="Job scheduler command for running on a single mpi rank",
    workload_group="all_workloads",
)
@rfhaque Jun 17, 2025

@scheibelp Correspondingly, gromacs defines a single_rank_mpi_command variable that is set to an empty string by default. The allocation modifier above overrides this variable with the appropriate string. This setup ensures that the application.py works outside of benchpark and the allocation modifier.
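
To illustrate the resulting behaviour (the values below are assumed examples, not output from ramble): with the default empty string the grompp step reduces to the bare binary, while benchpark's allocation modifier injects the scheduler prefix.

# Illustration only: how the "{single_rank_mpi_command} {gmx} grompp" template
# expands; the values are assumed examples, not generated by ramble.
template = "{single_rank_mpi_command} {gmx} grompp"
print(template.format(single_rank_mpi_command="", gmx="gmx_mpi").strip())
# -> gmx_mpi grompp                  (outside benchpark: default is "")
print(template.format(single_rank_mpi_command="srun -n 1 -N 1", gmx="gmx_mpi"))
# -> srun -n 1 -N 1 gmx_mpi grompp   (with benchpark's allocation modifier)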

Comment on lines +211 to +216
workload_variable(
    "grompp",
    default="{single_rank_mpi_command} {gmx} grompp",
    description="How to run grompp",
    workload_group="all_workloads",
)
@rfhaque Jun 17, 2025

@scheibelp The single_rank_mpi_command variable is prefixed to the gromacs command

depends_on("rocfft")
with when("+sycl"):
depends_on("hipsycl")
depends_on(f"hipsycl@24.10.0:", when=f"@2025:")
Collaborator

this should be 23.10 or newer; we don't require 24.10.

Collaborator Author

23.10 requires llvm < 18 and that does not work with the rocm 6+ compilers on tuolumne.

Collaborator

This sounds like a machine-specific limitation.

We should avoid hardcoding requirements that conflict with the official docs and that limit the usable version combinations. The official requirements are:

  • GROMACS 2023 / 2024: >= 0.9.4 (23.10 recommended)
  • GROMACS 2025: >= 23.10

We could simplify a bit and require 23.10 for both 2024 and 2025, noting that this is not fully in line with the official docs.

I've used 23.10 with llvm >= 18.
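
A minimal sketch of how the relaxed constraints could look in the package, with version bounds taken from the official requirements quoted above (the when= specs are assumptions, not the exact benchpark package.py):

# Hedged sketch: hipsycl constraints following the official support matrix
# quoted above; the exact spec syntax in the benchpark package.py may differ.
with when("+sycl"):
    depends_on("hipsycl@0.9.4:", when="@2023:2024")
    depends_on("hipsycl@23.10:", when="@2025:")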

Collaborator Author

I've been consistently getting that error on tuolumne, but let me try that again.

Collaborator

@rfhaque related to hipSYCL / ACPP: there is an ACPP feature we collaborated on with their team called "instant submission". This provides a major perf boost and is available from ACPP >= 23.10; it is enabled by default from GROMACS 2025, but for earlier versions it needs to be enabled manually, see https://manual.gromacs.org/current/release-notes/2025/major/performance.html#instant-submission-mode-enabled-by-default-when-building-with-adaptivecpp

Collaborator

Correction: I have reproduced the issues with ACPP 23.10 and the llvm bundled with ROCm 6.2 and later. I am not sure if this is an issue with the ROCm llvm, which is not a clean upstream version and sometimes claims a version that conflicts with the features it supports. Have you been able to try ACPP with upstream llvm >= 18?

We will need to update the support matrix of ACPP vs ROCm versions; I have some WIP tests and I'll provide more details when I have them.

Comment on lines 536 to 539
if "force" in self.spec.variants["gpu-aware-mpi"].value:
env.set("GMX_FORCE_GPU_AWARE_MPI", "1")
elif "off" in self.spec.variants["gpu-aware-mpi"].value:
env.set("GMX_DISABLE_DIRECT_GPU_COMM", "1")
Collaborator

I know the gpu-aware-mpi variant was added earlier, but I wonder why this is needed as a package variant? It is a purely runtime option; the environment variables are only used to guide runtime choices, i.e.

  • GMX_DISABLE_DIRECT_GPU_COMM to disable direct GPU communication (which, when built with MPI, uses GPU-aware MPI)
  • GMX_FORCE_GPU_AWARE_MPI to tell mdrun that even if build-time GPU-aware capability was not possible to detect, it should assume that it works (e.g. needed with Cray MPICH)

Collaborator Author

@pszi1ard This benchpark variant (on, off, and force are the allowed values) sets the runtime variables GMX_DISABLE_DIRECT_GPU_COMM and GMX_FORCE_GPU_AWARE_MPI correctly in spack. gromacs cannot detect GPU-aware MPI on tuolumne, so I have to use the gpu-aware-mpi=force variant to enable GMX_FORCE_GPU_AWARE_MPI

Collaborator

Why does spack need the environment variables set? These are only needed at runtime, not compile-time.

Two comments:

  • the GROMACS nomenclature is "Direct GPU communication" rather than "GPU-aware MPI" as it affects not only MPI-based parallelization. Could we add a synonymous direct-gpu-comm variant?
  • before GROMACS v2025, even when support is detected, direct GPU comm is not the default, so GMX_ENABLE_DIRECT_GPU_COMM is needed to enable it; starting with v2025 the default is changed and GMX_DISABLE_DIRECT_GPU_COMM is used to manually disable it. I suggest adding corresponding logic.

Collaborator Author

> Why does spack need the environment variables set? These are only needed at runtime, not compile-time.

Spack is used to set the variables at runtime. Specifically, we set these in the spack function setup_run_environment, which is invoked by spack at runtime to set the environment variables.

> Two comments:

> * the GROMACS nomenclature is "Direct GPU communication" rather than "GPU-aware MPI" as it affects not only MPI-based parallelization. Could we add a synonymous `direct-gpu-comm` variant?

I'll change the variant name.

> * before GROMACS v2025, even when support is detected, direct GPU comm is not the default, so `GMX_ENABLE_DIRECT_GPU_COMM` is needed to enable it; starting with v2025 the default is changed and `GMX_DISABLE_DIRECT_GPU_COMM` is used to manually disable it. I suggest adding corresponding logic.

I had the pre-2025 logic (default disable) earlier, but changed it to conform to the new logic (default enable). Do we want to support both?
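
If we do support both, a minimal sketch of the version-aware logic in setup_run_environment could look like the following (variant values follow the snippet above; this is an illustration under those assumptions, not the final implementation):

# Hedged sketch: version-aware runtime env vars for direct GPU communication.
# Assumes the gpu-aware-mpi (or a future direct-gpu-comm) variant with values
# on/off/force as shown earlier in this PR; the final package.py may differ.
def setup_run_environment(self, env):
    value = self.spec.variants["gpu-aware-mpi"].value
    if "force" in value:
        # assume GPU-aware MPI works even if mdrun cannot detect it (e.g. Cray MPICH)
        env.set("GMX_FORCE_GPU_AWARE_MPI", "1")
    if self.spec.satisfies("@2025:"):
        # 2025+: direct GPU communication is on by default; only disable on request
        if "off" in value:
            env.set("GMX_DISABLE_DIRECT_GPU_COMM", "1")
    else:
        # pre-2025: direct GPU communication must be enabled explicitly
        if "off" not in value:
            env.set("GMX_ENABLE_DIRECT_GPU_COMM", "1")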

Collaborator

> Spack is used to set the variables at runtime. Specifically, we set these in the spack function setup_run_environment, which is invoked by spack at runtime to set the environment variables.

Thanks, this still sounds confusing; I guess I need to read up on how this works. I thought spack's role was build/deploy, but this sounds like runtime concerns are leaking in, which seems counter-intuitive.

> I had the pre-2025 logic (default disable) earlier, but changed it to conform to the new logic (default enable). Do we want to support both?

I think it is useful mainly for "reference" runs, e.g. when GPU-aware MPI is suspected to be causing issues.

@pszi1ard

> Is there a way to tell gmx_mpi to only run on a single node if no scheduler is specified? @pearce8 Currently I'm not sure if ramble allows running different steps within the same experiment with different node/core counts, but that is what is needed in this case. gmx_mpi is the MPI version of gromacs, but some steps like grompp need to be run on a single node.

@rfhaque I've no idea how to solve that; if you invoke a non-MPI program (e.g. gmx grompp) with an MPI wrapper, you will just run N instances of the same program. So this is a general concern, and it sounds like what @scheibelp (thanks!) is proposing is aimed at tackling that.

@rfhaque rfhaque requested a review from scheibelp June 17, 2025 08:07
@pearce8 pearce8 marked this pull request as ready for review June 17, 2025 11:14
@pszi1ard commented Jun 18, 2025

I ran into some build-time issues because the hipsycl/package.py requires llvm but llvm-amdgpu seems to not satisfy this. This leads to further issues when trying to use rocmcc as a compiler.

Looking into this brings up multiple ROCm/rocmcc/llvm-related concerns:

  • Ideally our default GROMACS recommendation is to use a single ROCm and its bundled llvm for both ACPP/hipsycl and GROMACS -- how can we make depends_on(llvm...) be satisfied by llvm-amdgpu? (One possible direction is sketched below.)
  • While upstream llvm does work as the ACPP compiler library dependency, when we use rocmcc for GROMACS we'd need to ensure we are not mixing llvm versions, i.e. if rocmcc@6.3 is used, llvm@18.x should be allowed as the ACPP dependency. Can that be implemented?
  • the current hipsycl/package.py expresses some llvm version dependencies -- these are pessimistic and represent only the limited subset that's been tested by upstream (e.g. v24.02 is known to work with llvm 18, at least with the one bundled with ROCm <= 6.3); I verified this with the author, so some work (mainly verification) will be needed to relax the restrictions.
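
One possible direction for the first point, as a heavily hedged sketch: Spack defines a libllvm virtual that both the upstream llvm package and llvm-amdgpu (the ROCm-bundled llvm) provide, so hipsycl could depend on the virtual instead of llvm directly. Whether hipsycl actually builds against llvm-amdgpu this way is an assumption that still needs verification.

# Hedged sketch only -- not verified in this PR: depend on Spack's libllvm
# virtual so that either the upstream llvm package or llvm-amdgpu (the llvm
# bundled with ROCm) can satisfy hipsycl's compiler-library dependency.
depends_on("libllvm")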
