From 3f80fee5ba5ddaf5b57d8c310ba1c590f1db3678 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Wed, 17 Sep 2025 13:31:13 -0700 Subject: [PATCH 01/23] Update system docs for spack1.0 --- docs/add-a-system-config-old.rst | 446 +++++++++++++++++++++++++++++++ docs/add-a-system-config.rst | 418 ++++------------------------- 2 files changed, 504 insertions(+), 360 deletions(-) create mode 100644 docs/add-a-system-config-old.rst diff --git a/docs/add-a-system-config-old.rst b/docs/add-a-system-config-old.rst new file mode 100644 index 000000000..2a4f3a800 --- /dev/null +++ b/docs/add-a-system-config-old.rst @@ -0,0 +1,446 @@ +.. + Copyright 2023 Lawrence Livermore National Security, LLC and other + Benchpark Project Developers. See the top-level COPYRIGHT file for details. + + SPDX-License-Identifier: Apache-2.0 + +Adding a System +=============== + +This guide is intended for those wanting to run a benchmark on a new system, such as +vendors, system administrators, or application developers. It assumes a system +specification does not already exist. + +System specifications include two types of information: + +1. Hardware specs in `hardware_description.yaml` (e.g., how many CPU cores the node has) +2. Software stack specs in `system.py` (e.g., installed compilers and libraries, along + with their locations and versions) + +.. + note: + Please replace the steps below with a flow diagram. + +To specify a new system: + +1. Identify a system in Benchpark with the same hardware. +2. If a system with the same hardware does not exist, add a new hardware description, as + described in Adding System Hardware Specs section. +3. Identify the same software stack description. Typically if the same hardware is + already used by Benchpark, the same software stack may already be specified if the + same vendor software stack is used on this hardware - or, if a software stack of your + datacenter is already specified. +4. If the same software stack description does not exist, determine if there is one that + can be parameterized to match yours. +5. If can't parameterize existing software description, add a new one. + +1. Adding System Hardware Specs +------------------------------- + +We list hardware descriptions of Systems specified in Benchpark in the System Catalogue +in :doc:`system-list`. + +If you are running on a system with an accelerator, find an existing system with the +same accelerator vendor, and then secondarily, if you can, match the actual accelerator. + +1. accelerator.vendor +2. accelerator.name + +Once you have found an existing system with a similar accelerator or if you do not have +an accelerator, match the following processor specs as closely as you can. + +1. processor.name +2. processor.ISA +3. processor.uArch + +For example, if your system has an NVIDIA A100 GPU and an Intel x86 Icelake CPUs, a +similar config would share the A100 GPU, and CPU architecture may or may not match. Or, +if I do not have GPUs and instead have SapphireRapids CPUs, the closest match would be +another system with x86_64, Xeon Platinum, SapphireRapids. + +If there is not an exact match, you may add a new directory in the +`systems/all_hardware_descriptions/system_name` where `system_name` follows the naming +convention: + +:: + + [INTEGRATOR]-MICROARCHITECTURE[-GPU][-NETWORK] + +where: + +:: + + INTEGRATOR = COMPANY[_PRODUCTNAME][...] 
+ + MICROARCHITECTURE = CPU Microarchitecture + + GPU = GPU Product Name + + NETWORK = Network Product Name + +In the `systems/all_hardware_descriptions/system_name` directory, add a +`hardware_description.yaml` which follows the yaml format of existing +`hardware_description.yaml` files. + +2. Adding or Parameterizing System Software Stack +------------------------------------------------- + +``system.py`` in Benchpark provides an API to represent a system software stack as a +command line parameterizable object. If none of the available software stack +specifications match your system, you may add a `new-system` directory in the `systems` +directory where the `new-system` directory name follows the naming convention: + +:: + + SITE-SYSTEMNAME + +where: + +:: + + SITE = nosite | abbreviated datacenter name + + SYSTEMNAME = the name of the specific system + +.. + note: + make all these x86 example. Automate the directory structure? + +Next, copy the system.py from the system with the most similar software stack into +`new-system` directory, and update it to match your system. For example, the generic-x86 +system software stack is defined in: + +:: + + $benchpark + ├── systems + ├── generic-x86 + ├── system.py + +The System base class is defined in ``/lib/benchpark/system.py``, some or all of the +functions can be overridden to define custom system behavior. Your +``systems/{SYSTEM}/system.py`` should inherit from the System base class. + +The generic-x86 system subclass should run on most x86_64 systems, but we mostly provide +it as a starting point for modifying or testing. Potential common changes might be to +edit the scheduler or number of cores per node, adding a GPU configuration, or adding +other external compilers or packages. + +To make these changes, we provided an example below, where we start with the generic-x86 +system.py, and make a system called Modifiedx86. + +1. First, make a copy of the system.py file in generic_x86 folder and move it into a new + folder, e.g., ``systems/modified_x86/system.py``. Then, update the class name to + ``Modifiedx86``.: + + :: + + class Modifiedx86(System): + +2. Next, to match our new system, we change the scheduler to slurm and the number of + cores per node to 48, and number of GPUs per node to 2.: + + :: + + # this sets basic attributes of our system + def __init__(self, spec): + super().__init__(spec) + self.scheduler = "slurm" + self.sys_cores_per_node = "48" + self.sys_gpus_per_node = "2" + +3. Let's say the new system's GPUs are NVIDIA, we can add a variant that allows us to + specify the version of CUDA we want to use, and the location of those CUDA + installations on our system. 
We then add the spack package configuration for our CUDA + installations into the `compute_packages_section`.: + + :: + + # import the variant feature at the top of your system.py + from benchpark.directives import variant + + # this allows us to specify which cuda version we want as a command line parameter + variant( + "cuda", + default="11-8-0", + values=("11-8-0", "10-1-243"), + description="CUDA version", + ) + + # set this to pass to spack + def system_specific_variables(self): + return {"cuda_arch": "70"} + + # define the external package locations + def compute_packages_section(self): + selections = { + "packages": { + "elfutils": { + "externals": [{"spec": "elfutils@0.176", "prefix": "/usr"}], + "buildable": False, + }, + "papi": { + "buildable": False, + "externals": [{"spec": "papi@5.2.0.0", "prefix": "/usr"}], + }, + } + } + if self.spec.satisfies("cuda=10-1-243"): + selections["packages"] |= { + "cusparse": { + "externals": [ + { + "spec": "cusparse@10.1.243", + "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", + } + ], + "buildable": False, + }, + "cuda": { + "externals": [ + { + "spec": "cuda@10.1.243+allow-unsupported-compilers", + "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", + } + ], + "buildable": False, + }, + } + elif self.spec.satisfies("cuda=11-8-0"): + selections["packages"] |= { + "cusparse": { + "externals": [ + { + "spec": "cusparse@11.8.0", + "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", + } + ], + "buildable": False, + }, + "cuda": { + "externals": [ + { + "spec": "cuda@11.8.0+allow-unsupported-compilers", + "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", + } + ], + "buildable": False, + }, + } + + return selections + +External packages can be found via `benchpark system external ---new-system +{mysite}-{mysystem}`. Note, if your externals are *not* installed via Spack, read `Spack +documentation on modules +`_. + +4. Next, add any of the packages that can be managed by spack, such as blas/cublas +pointing to the correct version, this will generate the software configurations for +spack (``software.yaml``). The actual version will be rendered by Ramble when it is +built. + +:: + + def compute_software_section(self): + return { + "software": { + "packages": { + "default-compiler": {"pkg_spec": "gcc"}, + "compiler-gcc": {"pkg_spec": "gcc"}, + "default-mpi": {"pkg_spec": "openmpi"}, + "blas": {"pkg_spec": "openblas"}, + "lapack": {"pkg_spec": "openblas"}, + } + } + } + +5. 
The full system.py class for the modified_x86 system should now look like: + +:: + + import pathlib + + from benchpark.directives import variant + from benchpark.system import System + + class Modifiedx86(System): + + variant( + "cuda", + default="11-8-0", + values=("11-8-0", "10-1-243"), + description="CUDA version", + ) + + def __init__(self): + super().__init__() + + self.scheduler = "slurm" + setattr(self, "sys_cores_per_node", 48) + self.sys_gpus_per_node = "2" + + def system_specific_variables(self): + return {"cuda_arch": "70"} + + def compute_packages_section(self): + selections = { + "packages": { + "elfutils": { + "externals": [{"spec": "elfutils@0.176", "prefix": "/usr"}], + "buildable": False, + }, + "papi": { + "buildable": False, + "externals": [{"spec": "papi@5.2.0.0", "prefix": "/usr"}], + }, + } + } + if self.spec.satisfies("cuda=10-1-243"): + selections["packages"] |= { + "cusparse": { + "externals": [ + { + "spec": "cusparse@10.1.243", + "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", + } + ], + "buildable": False, + }, + "cuda": { + "externals": [ + { + "spec": "cuda@10.1.243+allow-unsupported-compilers", + "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", + } + ], + "buildable": False, + }, + } + elif self.spec.satisfies("cuda=11-8-0"): + selections["packages"] |= { + "cusparse": { + "externals": [ + { + "spec": "cusparse@11.8.0", + "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", + } + ], + "buildable": False, + }, + "cuda": { + "externals": [ + { + "spec": "cuda@11.8.0+allow-unsupported-compilers", + "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", + } + ], + "buildable": False, + }, + } + + return selections + + def compute_software_section(self): + return { + "software": { + "packages": { + "default-compiler": {"pkg_spec": "gcc"}, + "compiler-gcc": {"pkg_spec": "gcc"}, + "default-mpi": {"pkg_spec": "openmpi"}, + "blas": {"pkg_spec": "openblas"}, + "lapack": {"pkg_spec": "openblas"}, + } + } + } + """ + +Once the modified system subclass is written, run: ``benchpark system init +--dest=modifiedx86-system modifiedx86`` + +This will generate the required yaml configurations for your system and you can validate +it works with a static experiment test. + +.. note:: + + Use the ``benchpark info system {system_name}`` to find additional variants that are + available to all systems. This includes settings such as: the job timeout, + submitting to a different partition/queue, and setting the account/bank. + +3. Validating the System +------------------------ + +To manually validate your new system, you should initialize it and run an existing +experiment such as saxpy. For example: + +:: + + benchpark system init --dest=modifiedx86-system modifiedx86 + benchpark experiment init --dest=saxpy --system=modifiedx86-system saxpy +openmp + benchpark setup ./saxpy workspace/ + +Then you can run the commands provided by the output, the experiments should be built +and run successfully without any errors. + +The following yaml files are examples of what is generated for the modified_x86 system +from the example after it is initialized: + +.. note:: + + The following files are generated by benchpark (in the system destination folder) + and do not have to be manually created. + +1. ``system_id.yaml`` describes the system hardware, including the integrator (and the + name of the product node or cluster type), the processor, (optionally) the + accelerator, and the network; the information included here is what you will + typically see recorded about the system on Top500.org. 
We intend to make the system + definitions in Benchpark searchable, and will add a schema to enforce consistency; + until then, please copy the file and fill out all of the fields without changing the + keys. Also listed is the specific system the config was developed and tested on, as + well as the known systems with the same hardware so that the users of those systems + can find this system specification. + +.. code-block:: yaml + + system: + name: Modifiedx86 + spec: sysbuiltin.modifiedx86 cuda=11-8-0 + config-hash: 5310ebe8b2c841108e5da854c75dab931f5397a7fb41726902bb8a51ffb84a36 + +2. ``software.yaml`` defines default compiler and package names your package manager +(Spack) should use to build the benchmarks on this system. ``software.yaml`` becomes the +spack section in the `Ramble configuration file +`_. + +.. code-block:: yaml + + software: + packages: + default-compiler: + pkg_spec: 'gcc' + compiler-gcc: + pkg_spec: 'gcc' + default-mpi: + pkg_spec: 'openmpi' + blas: + pkg_spec: cublas@{default_cuda_version} + cublas-cuda: + pkg_spec: cublas@{default_cuda_version} + +3. ``variables.yaml`` defines system-specific launcher and job scheduler. + +.. code-block:: yaml + + variables: + timeout: "120" + scheduler: "slurm" + sys_cores_per_node: "48" + sys_gpus_per_node: 2 + cuda_arch: 70 + n_ranks: 18446744073709551615 # placeholder value + n_nodes: 18446744073709551615 # placeholder value + batch_submit: "placeholder" + mpi_command: "placeholder" + +Once you can run an experiment successfully, and the yaml looks correct, the new system +has been validated and you can continue your :doc:`benchpark-workflow`. diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index 2a4f3a800..5a6311478 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -7,32 +7,33 @@ Adding a System =============== -This guide is intended for those wanting to run a benchmark on a new system, such as -vendors, system administrators, or application developers. It assumes a system -specification does not already exist. - -System specifications include two types of information: - -1. Hardware specs in `hardware_description.yaml` (e.g., how many CPU cores the node has) -2. Software stack specs in `system.py` (e.g., installed compilers and libraries, along - with their locations and versions) +This guide is intended for those who would like to add a new system to benchpark, such +as vendors, system administrators, or application developers. Benchpark provides an API +for representing system specifications as objects and options to customize the +specification on the command line. System specifications are defined in ``system.py`` +files located in the systems directory: ``benchpark/systems//``. .. note: Please replace the steps below with a flow diagram. -To specify a new system: +To determine if you need to create a new system: -1. Identify a system in Benchpark with the same hardware. +1. Identify a system in Benchpark with the same hardware. See :doc:`system-list` to see + hardware descriptions for all available benchpark systems. 2. If a system with the same hardware does not exist, add a new hardware description, as - described in Adding System Hardware Specs section. + described in :ref:`adding-system-hardware-specs`. 3. Identify the same software stack description. 
Typically if the same hardware is already used by Benchpark, the same software stack may already be specified if the same vendor software stack is used on this hardware - or, if a software stack of your - datacenter is already specified. + datacenter is already specified. If a system exists with the same software stack, add + your system to that ``system.py`` as a value under the ``cluster`` variant, and + specify your systems specific resource configuration under the ``id_to_resources`` + dictionary. 4. If the same software stack description does not exist, determine if there is one that - can be parameterized to match yours. -5. If can't parameterize existing software description, add a new one. + can be parameterized to match yours, otherwise proceed with adding a new system. + +.. _adding-system-hardware-specs: 1. Adding System Hardware Specs ------------------------------- @@ -45,6 +46,8 @@ same accelerator vendor, and then secondarily, if you can, match the actual acce 1. accelerator.vendor 2. accelerator.name +3. accelerator.ISA +4. accelerator.uArch Once you have found an existing system with a similar accelerator or if you do not have an accelerator, match the following processor specs as closely as you can. @@ -52,6 +55,12 @@ an accelerator, match the following processor specs as closely as you can. 1. processor.name 2. processor.ISA 3. processor.uArch +4. processor.vendor + +And add the interconnect vendor and product name. + +1. interconnect.vendor +2. interconnect.name For example, if your system has an NVIDIA A100 GPU and an Intel x86 Icelake CPUs, a similar config would share the A100 GPU, and CPU architecture may or may not match. Or, @@ -82,365 +91,54 @@ In the `systems/all_hardware_descriptions/system_name` directory, add a `hardware_description.yaml` which follows the yaml format of existing `hardware_description.yaml` files. -2. Adding or Parameterizing System Software Stack +1. Creating the System class ------------------------------------------------- -``system.py`` in Benchpark provides an API to represent a system software stack as a -command line parameterizable object. If none of the available software stack -specifications match your system, you may add a `new-system` directory in the `systems` -directory where the `new-system` directory name follows the naming convention: - -:: +In this example, we will recreate the AWS ``system.py`` that we use for benchpark tutorials. At minimum, we import the base benchpark ``System`` class, which our ``AwsTutorial`` system will inherit from. We also import the maintainer and variant directives, which provide the utilities to track a maintainer by their GitHub username and variants to specify configurable properties of our system. We use ``instance_type`` instead of ``cluster`` (you will see ``cluster`` in other systems), because ``instance_type`` is more fitting in the context of AWS. - SITE-SYSTEMNAME - -where: +We configure the :: - SITE = nosite | abbreviated datacenter name - - SYSTEMNAME = the name of the specific system - -.. - note: - make all these x86 example. Automate the directory structure? - -Next, copy the system.py from the system with the most similar software stack into -`new-system` directory, and update it to match your system. 
For example, the generic-x86 -system software stack is defined in: - -:: + from benchpark.directives import maintainers, variant + from benchpark.paths import hardware_descriptions + from benchpark.system import System - $benchpark - ├── systems - ├── generic-x86 - ├── system.py - -The System base class is defined in ``/lib/benchpark/system.py``, some or all of the -functions can be overridden to define custom system behavior. Your -``systems/{SYSTEM}/system.py`` should inherit from the System base class. - -The generic-x86 system subclass should run on most x86_64 systems, but we mostly provide -it as a starting point for modifying or testing. Potential common changes might be to -edit the scheduler or number of cores per node, adding a GPU configuration, or adding -other external compilers or packages. - -To make these changes, we provided an example below, where we start with the generic-x86 -system.py, and make a system called Modifiedx86. - -1. First, make a copy of the system.py file in generic_x86 folder and move it into a new - folder, e.g., ``systems/modified_x86/system.py``. Then, update the class name to - ``Modifiedx86``.: - - :: - - class Modifiedx86(System): - -2. Next, to match our new system, we change the scheduler to slurm and the number of - cores per node to 48, and number of GPUs per node to 2.: - - :: - - # this sets basic attributes of our system - def __init__(self, spec): - super().__init__(spec) - self.scheduler = "slurm" - self.sys_cores_per_node = "48" - self.sys_gpus_per_node = "2" - -3. Let's say the new system's GPUs are NVIDIA, we can add a variant that allows us to - specify the version of CUDA we want to use, and the location of those CUDA - installations on our system. We then add the spack package configuration for our CUDA - installations into the `compute_packages_section`.: - - :: - - # import the variant feature at the top of your system.py - from benchpark.directives import variant - - # this allows us to specify which cuda version we want as a command line parameter - variant( - "cuda", - default="11-8-0", - values=("11-8-0", "10-1-243"), - description="CUDA version", - ) - - # set this to pass to spack - def system_specific_variables(self): - return {"cuda_arch": "70"} - - # define the external package locations - def compute_packages_section(self): - selections = { - "packages": { - "elfutils": { - "externals": [{"spec": "elfutils@0.176", "prefix": "/usr"}], - "buildable": False, - }, - "papi": { - "buildable": False, - "externals": [{"spec": "papi@5.2.0.0", "prefix": "/usr"}], - }, - } - } - if self.spec.satisfies("cuda=10-1-243"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@10.1.243", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@10.1.243+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - } - elif self.spec.satisfies("cuda=11-8-0"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@11.8.0", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@11.8.0+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - } - - return selections - -External packages can be found via `benchpark system external ---new-system -{mysite}-{mysystem}`. 
Note, if your externals are *not* installed via Spack, read `Spack -documentation on modules -`_. - -4. Next, add any of the packages that can be managed by spack, such as blas/cublas -pointing to the correct version, this will generate the software configurations for -spack (``software.yaml``). The actual version will be rendered by Ramble when it is -built. -:: + class AwsTutorial(System): + maintainers("michaelmckinsey1") - def compute_software_section(self): - return { - "software": { - "packages": { - "default-compiler": {"pkg_spec": "gcc"}, - "compiler-gcc": {"pkg_spec": "gcc"}, - "default-mpi": {"pkg_spec": "openmpi"}, - "blas": {"pkg_spec": "openblas"}, - "lapack": {"pkg_spec": "openblas"}, - } - } - } - -5. The full system.py class for the modified_x86 system should now look like: + id_to_resources = { + "c7i.12xlarge": { + "system_site": "aws", + "sys_cores_per_node": 48, + "sys_mem_per_node_GB": 96, + "hardware_key": str(hardware_descriptions) + + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", + }, + } -:: + variant( + "instance_type", + values="c7i.12xlarge", + default="c7i.12xlarge", + description="AWS instance type", + ) - import pathlib +2. Specify the class initializer +----------------------------- +__init__ - from benchpark.directives import variant - from benchpark.system import System +3. Add a packages section +---------------------- - class Modifiedx86(System): - variant( - "cuda", - default="11-8-0", - values=("11-8-0", "10-1-243"), - description="CUDA version", - ) +4. Add a compilers section +----------------------- - def __init__(self): - super().__init__() - - self.scheduler = "slurm" - setattr(self, "sys_cores_per_node", 48) - self.sys_gpus_per_node = "2" - - def system_specific_variables(self): - return {"cuda_arch": "70"} - - def compute_packages_section(self): - selections = { - "packages": { - "elfutils": { - "externals": [{"spec": "elfutils@0.176", "prefix": "/usr"}], - "buildable": False, - }, - "papi": { - "buildable": False, - "externals": [{"spec": "papi@5.2.0.0", "prefix": "/usr"}], - }, - } - } - if self.spec.satisfies("cuda=10-1-243"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@10.1.243", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@10.1.243+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - } - elif self.spec.satisfies("cuda=11-8-0"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@11.8.0", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@11.8.0+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - } - - return selections - - def compute_software_section(self): - return { - "software": { - "packages": { - "default-compiler": {"pkg_spec": "gcc"}, - "compiler-gcc": {"pkg_spec": "gcc"}, - "default-mpi": {"pkg_spec": "openmpi"}, - "blas": {"pkg_spec": "openblas"}, - "lapack": {"pkg_spec": "openblas"}, - } - } - } - """ - -Once the modified system subclass is written, run: ``benchpark system init ---dest=modifiedx86-system modifiedx86`` - -This will generate the required yaml configurations for your system and you can validate -it works with a static experiment test. - -.. 
note:: - - Use the ``benchpark info system {system_name}`` to find additional variants that are - available to all systems. This includes settings such as: the job timeout, - submitting to a different partition/queue, and setting the account/bank. - -3. Validating the System ------------------------- - -To manually validate your new system, you should initialize it and run an existing -experiment such as saxpy. For example: +5. Add a software section +----------------------- -:: - benchpark system init --dest=modifiedx86-system modifiedx86 - benchpark experiment init --dest=saxpy --system=modifiedx86-system saxpy +openmp - benchpark setup ./saxpy workspace/ - -Then you can run the commands provided by the output, the experiments should be built -and run successfully without any errors. - -The following yaml files are examples of what is generated for the modified_x86 system -from the example after it is initialized: - -.. note:: - - The following files are generated by benchpark (in the system destination folder) - and do not have to be manually created. - -1. ``system_id.yaml`` describes the system hardware, including the integrator (and the - name of the product node or cluster type), the processor, (optionally) the - accelerator, and the network; the information included here is what you will - typically see recorded about the system on Top500.org. We intend to make the system - definitions in Benchpark searchable, and will add a schema to enforce consistency; - until then, please copy the file and fill out all of the fields without changing the - keys. Also listed is the specific system the config was developed and tested on, as - well as the known systems with the same hardware so that the users of those systems - can find this system specification. - -.. code-block:: yaml - - system: - name: Modifiedx86 - spec: sysbuiltin.modifiedx86 cuda=11-8-0 - config-hash: 5310ebe8b2c841108e5da854c75dab931f5397a7fb41726902bb8a51ffb84a36 - -2. ``software.yaml`` defines default compiler and package names your package manager -(Spack) should use to build the benchmarks on this system. ``software.yaml`` becomes the -spack section in the `Ramble configuration file -`_. - -.. code-block:: yaml - - software: - packages: - default-compiler: - pkg_spec: 'gcc' - compiler-gcc: - pkg_spec: 'gcc' - default-mpi: - pkg_spec: 'openmpi' - blas: - pkg_spec: cublas@{default_cuda_version} - cublas-cuda: - pkg_spec: cublas@{default_cuda_version} - -3. ``variables.yaml`` defines system-specific launcher and job scheduler. - -.. code-block:: yaml - - variables: - timeout: "120" - scheduler: "slurm" - sys_cores_per_node: "48" - sys_gpus_per_node: 2 - cuda_arch: 70 - n_ranks: 18446744073709551615 # placeholder value - n_nodes: 18446744073709551615 # placeholder value - batch_submit: "placeholder" - mpi_command: "placeholder" - -Once you can run an experiment successfully, and the yaml looks correct, the new system -has been validated and you can continue your :doc:`benchpark-workflow`. +6. 
Validating the System +------------------------ \ No newline at end of file From 9ca5ed97f44986bbf1e2f3d0ac289e90daf8d0ab Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 10:47:39 -0700 Subject: [PATCH 02/23] refactor resources dict --- systems/aws-tutorial/system.py | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/systems/aws-tutorial/system.py b/systems/aws-tutorial/system.py index 814344e8b..88df1c061 100644 --- a/systems/aws-tutorial/system.py +++ b/systems/aws-tutorial/system.py @@ -15,41 +15,39 @@ class AwsTutorial(System): maintainers("stephanielam3211") + common = { + "system_site": "aws", + "programming_models": [OpenMPCPUOnlySystem()], + "scheduler": "flux", + "hardware_key": str(hardware_descriptions) + + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", + } + id_to_resources = { "c7i.48xlarge": { - "system_site": "aws", + **common, "sys_cores_per_node": 192, "sys_mem_per_node_GB": 384, - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", }, "c7i.metal-48xl": { - "system_site": "aws", + **common, "sys_cores_per_node": 192, "sys_mem_per_node_GB": 384, - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", }, "c7i.24xlarge": { - "system_site": "aws", + **common, "sys_cores_per_node": 96, "sys_mem_per_node_GB": 192, - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", }, "c7i.metal-24xl": { - "system_site": "aws", + **common, "sys_cores_per_node": 96, "sys_mem_per_node_GB": 192, - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", }, "c7i.12xlarge": { - "system_site": "aws", + **common, "sys_cores_per_node": 48, "sys_mem_per_node_GB": 96, - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", }, } @@ -68,9 +66,7 @@ class AwsTutorial(System): def __init__(self, spec): super().__init__(spec) - self.programming_models = [OpenMPCPUOnlySystem()] - self.scheduler = "flux" attrs = self.id_to_resources.get(self.spec.variants["instance_type"][0]) for k, v in attrs.items(): setattr(self, k, v) From 26e46d31bfc302f007ec9a73266d5edc73b66fd5 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 14:31:09 -0700 Subject: [PATCH 03/23] Fix expected path for spack packages for benchpark system external --- lib/benchpark/cmd/system.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/benchpark/cmd/system.py b/lib/benchpark/cmd/system.py index 0868bff42..1b2aa1b21 100644 --- a/lib/benchpark/cmd/system.py +++ b/lib/benchpark/cmd/system.py @@ -71,7 +71,7 @@ def system_external(args): ) with open( - benchpark.paths.benchpark_home / "../.spack/packages.yaml", "r" + benchpark.paths.benchpark_home / "spack/etc/spack/packages.yaml", "r" ) as file: new_packages = yaml.safe_load(file)["packages"] @@ -94,7 +94,7 @@ def system_external(args): + [pkg for pkg in pkg_list] ) - with open(benchpark.paths.benchpark_home / "../.spack/packages.yaml", "r") as file: + with open(benchpark.paths.benchpark_home / "spack/etc/spack/packages.yaml", "r") as file: new_packages = yaml.safe_load(file)["packages"] # Use DeepDiff to find differences From fb845d1e55b61b75f4b943b8a59cbd3b1917ea44 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 14:31:36 -0700 Subject: [PATCH 04/23] Remove vestigal defintions 
--- systems/aws-tutorial/system.py | 4 ---- 1 file changed, 4 deletions(-) diff --git a/systems/aws-tutorial/system.py b/systems/aws-tutorial/system.py index 88df1c061..4f4164521 100644 --- a/systems/aws-tutorial/system.py +++ b/systems/aws-tutorial/system.py @@ -218,10 +218,6 @@ def compute_software_section(self): "software": { "packages": { "default-compiler": {"pkg_spec": "gcc@11.4.0"}, - "default-mpi": {"pkg_spec": "openmpi@4.0%gcc@11.4.0"}, - "compiler-gcc": {"pkg_spec": "gcc@11.4.0"}, - "lapack": {"pkg_spec": "lapack@0.29.2"}, - "mpi-gcc": {"pkg_spec": "openmpi@4.0%gcc@11.4.0"}, } } } From 2dc8ac7ecc7e993605d705bc332888c1a8f9c394 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 14:32:45 -0700 Subject: [PATCH 05/23] Update docs --- docs/add-a-system-config.rst | 446 ++++++++++++++++++++++++++++++-- docs/update-a-system-config.rst | 70 ----- 2 files changed, 421 insertions(+), 95 deletions(-) delete mode 100644 docs/update-a-system-config.rst diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index 5a6311478..ca26dc7f9 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -35,7 +35,7 @@ To determine if you need to create a new system: .. _adding-system-hardware-specs: -1. Adding System Hardware Specs +A. Adding System Hardware Specs ------------------------------- We list hardware descriptions of Systems specified in Benchpark in the System Catalogue @@ -44,23 +44,23 @@ in :doc:`system-list`. If you are running on a system with an accelerator, find an existing system with the same accelerator vendor, and then secondarily, if you can, match the actual accelerator. -1. accelerator.vendor -2. accelerator.name -3. accelerator.ISA -4. accelerator.uArch +1. ``accelerator.vendor`` - TODO +2. ``accelerator.name`` - +3. ``accelerator.ISA`` - +4. ``accelerator.uArch`` - Once you have found an existing system with a similar accelerator or if you do not have an accelerator, match the following processor specs as closely as you can. -1. processor.name -2. processor.ISA -3. processor.uArch -4. processor.vendor +1. ``processor.name`` - +2. ``processor.ISA`` - +3. ``processor.uArch`` - +4. ``processor.vendor`` - And add the interconnect vendor and product name. -1. interconnect.vendor -2. interconnect.name +1. ``interconnect.vendor`` - +2. ``interconnect.name`` - For example, if your system has an NVIDIA A100 GPU and an Intel x86 Icelake CPUs, a similar config would share the A100 GPU, and CPU architecture may or may not match. Or, @@ -91,16 +91,75 @@ In the `systems/all_hardware_descriptions/system_name` directory, add a `hardware_description.yaml` which follows the yaml format of existing `hardware_description.yaml` files. -1. Creating the System class +B. Creating the System Definition (``system.py``) ------------------------------------------------- -In this example, we will recreate the AWS ``system.py`` that we use for benchpark tutorials. At minimum, we import the base benchpark ``System`` class, which our ``AwsTutorial`` system will inherit from. We also import the maintainer and variant directives, which provide the utilities to track a maintainer by their GitHub username and variants to specify configurable properties of our system. We use ``instance_type`` instead of ``cluster`` (you will see ``cluster`` in other systems), because ``instance_type`` is more fitting in the context of AWS. 
+Now that you have defined the hardware description for your system, you can now create +the ``system.py``, which involves defining the software on your system. This includes +defining compilers and pre-installed packages, which your package manager can use +instead of attempting to build the package from scratch. If using Spack, defining as +many external packages as possible here will ensure a much faster build process, and +using system-installed packages will likely always be more performant than building them +from scratch. + +1. Creating the System class +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In this example, we will recreate a fully-functional simplified example of the AWS +``system.py`` that we use for benchpark tutorials (see `aws-tutorial/system.py +`_). To +start, we import the base benchpark ``System`` class, which our ``AwsTutorial`` system +will inherit from. We also import the maintainer and variant directives, which provide +the utilities to track a maintainer by their GitHub username and variants to specify +configurable properties of our system. We can specify the different AWS instances that +share this same hardware and software specificaiton using the ``instance_type`` variant. +We use ``instance_type`` here instead of ``cluster`` (you will see ``cluster`` in other +systems), because ``instance_type`` is more fitting in this context. + +:: + + from benchpark.directives import maintainers, variant + from benchpark.system import System + + + class AwsTutorial(System): + maintainers("michaelmckinsey1") + + variant( + "instance_type", + values=("c7i.12xlarge", c7i.24xlarge), + default="c7i.12xlarge", + description="AWS instance type", + ) -We configure the +2. Specify the class initializer and resources +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When defining ``__init__()`` for our system, we invoke the parent +``System::__init__()``, and set important system attributes using the +``id_to_resources`` dictionary, which contains information for each ``cluster`` or +``instance_type``. We can optionally refactor common attributes for all +``instance_type``'s into a separate dictionary, for readability: + +1. ``system_site`` - The name of the site where the ``cluster``/``instance_type`` is + located. +2. ``programming_models`` - List of applicable programming models. ``MPI`` is assumed + for every system in benchpark, so you do not need to add it here. For this system, we + add ``OpenMPCPUOnlySystem`` (different from GPU openmp). If we had Nvidia + accelerators, we would add ``CudaSystem`` to this list, and ``ROCmSystem`` for AMD. +3. ``scheduler`` - The job scheduler. +4. ``hardware_key``, which defines a path to the yaml description you just created in + the previous step. +5. ``sys_cores_per_node`` - The amount of hardware cores per node. +6. ``sys_mem_per_node_GB`` - The amount of node memory, in gigabytes. + +This information is used to determine the necessary resource allocation request for any +experiment initialized with your chosen. 
:: from benchpark.directives import maintainers, variant + from benchpark.openmpsystem import OpenMPCPUOnlySystem from benchpark.paths import hardware_descriptions from benchpark.system import System @@ -108,37 +167,374 @@ We configure the class AwsTutorial(System): maintainers("michaelmckinsey1") + common = { + "system_site": "aws", + "programming_models": [OpenMPCPUOnlySystem()], + "scheduler": "flux", + "hardware_key": str(hardware_descriptions) + + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", + } + id_to_resources = { + "c7i.24xlarge": { + **common, + "sys_cores_per_node": 96, + "sys_mem_per_node_GB": 192, + }, "c7i.12xlarge": { - "system_site": "aws", + **common, "sys_cores_per_node": 48, "sys_mem_per_node_GB": 96, - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", }, } variant( "instance_type", - values="c7i.12xlarge", + values=("c7i.12xlarge", "c7i.24xlarge"), default="c7i.12xlarge", description="AWS instance type", ) -2. Specify the class initializer ------------------------------ -__init__ + def __init__(self, spec): + super().__init__(spec) + + attrs = self.id_to_resources.get(self.spec.variants["instance_type"][0]) + for k, v in attrs.items(): + setattr(self, k, v) 3. Add a packages section ----------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Here, we define the ``compute_packages_section()`` function, where you can include any +package that you would like a package manager, such as spack, to find as an "external", +meaning it will not build that package from source and use your system package instead. +For each package that you include, you need to define its spec ``name@version`` and the +system path ``prefix`` to the package. Additionally for spack, you need to set +``buildable: False`` to use the package as an external. + +At minimum, we recommend you define externals for ``cmake``, ``mpi``, ``blas``, and +``lapack``. See :ref:`adding-sys-packages`, for help on how to search for the packages +installed on your system. + +.. note:: + For ``mpi``, you need to define ``"mpi": {"buildable": False},`` as a virtual + package, and then define your MPI package as we have for ``openmpi``. + +:: + + from benchpark.directives import maintainers, variant + from benchpark.openmpsystem import OpenMPCPUOnlySystem + from benchpark.paths import hardware_descriptions + from benchpark.system import System + + + class AwsTutorial(System): + + ... 
+ + + def compute_packages_section(self): + return { + "packages": { + "tar": { + "externals": [{"spec": "tar@1.34", "prefix": "/usr"}], + "buildable": False, + }, + "gmake": {"externals": [{"spec": "gmake@4.3", "prefix": "/usr"}]}, + "lapack": { + "externals": [{"spec": "lapack@0.29.2", "prefix": "/usr"}], + "buildable": False, + }, + "mpi": {"buildable": False}, + "openmpi": { + "externals": [ + { + "spec": "openmpi@4.0%gcc@11.4.0", + "prefix": "/usr", + } + ] + }, + "cmake": { + "externals": [{"spec": "cmake@4.0.2", "prefix": "/usr"}], + "buildable": False, + }, + "git": { + "externals": [{"spec": "git@2.34.1~tcltk", "prefix": "/usr"}], + "buildable": False, + }, + "openssl": { + "externals": [{"spec": "openssl@3.0.2", "prefix": "/usr"}], + "buildable": False, + }, + "automake": { + "externals": [{"spec": "automake@1.16.5", "prefix": "/usr"}], + "buildable": False, + }, + "openssh": { + "externals": [{"spec": "openssh@8.9p1", "prefix": "/usr"}], + "buildable": False, + }, + "m4": { + "externals": [{"spec": "m4@1.4.18", "prefix": "/usr"}], + "buildable": False, + }, + "sed": { + "externals": [{"spec": "sed@4.8", "prefix": "/usr"}], + "buildable": False, + }, + "autoconf": { + "externals": [{"spec": "autoconf@2.71", "prefix": "/usr"}], + "buildable": False, + }, + "diffutils": { + "externals": [{"spec": "diffutils@3.8", "prefix": "/usr"}], + "buildable": False, + }, + "coreutils": { + "externals": [{"spec": "coreutils@8.32", "prefix": "/usr"}], + "buildable": False, + }, + "findutils": { + "externals": [{"spec": "findutils@4.8.0", "prefix": "/usr"}], + "buildable": False, + }, + "binutils": { + "externals": [ + {"spec": "binutils@2.38+gold~headers", "prefix": "/usr"} + ], + "buildable": False, + }, + "perl": { + "externals": [ + { + "spec": "perl@5.34.0~cpanm+opcode+open+shared+threads", + "prefix": "/usr", + } + ], + "buildable": False, + }, + "caliper": { + "externals": [ + { + "spec": "caliper@master+adiak+mpi%gcc@11.4.0", + "prefix": "/usr", + } + ], + "buildable": False, + }, + "adiak": { + "externals": [{"spec": "adiak@0.4.1", "prefix": "/usr"}], + "buildable": False, + }, + "groff": { + "externals": [{"spec": "groff@1.22.4", "prefix": "/usr"}], + "buildable": False, + }, + "curl": { + "externals": [ + {"spec": "curl@7.81.0+gssapi+ldap+nghttp2", "prefix": "/usr"} + ], + "buildable": False, + }, + "ccache": { + "externals": [{"spec": "ccache@4.5.1", "prefix": "/usr"}], + "buildable": False, + }, + "flex": { + "externals": [{"spec": "flex@2.6.4+lex", "prefix": "/usr"}], + "buildable": False, + }, + "pkg-config": { + "externals": [{"spec": "pkg-config@0.29.2", "prefix": "/usr"}], + "buildable": False, + }, + "zlib": { + "externals": [{"spec": "zlib@1.2.11", "prefix": "/usr"}], + "buildable": False, + }, + "ninja": { + "externals": [{"spec": "ninja@1.10.1", "prefix": "/usr"}], + "buildable": False, + }, + "libtool": { + "externals": [{"spec": "libtool@2.4.6", "prefix": "/usr"}], + "buildable": False, + }, + } + } 4. Add a compilers section ------------------------ +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In the ``compute_compilers_section``, we define the compilers available on the system. +For our AWS system, this is ``gcc@11.4.0``. We return a dictionary, with the helper +``compiler_section_for()`` function, that formulates the compiler ``name`` and +``entries`` for Spack, where the ``entries`` are a list of ``compiler_def()``. For the +``compiler_def()``, we must at minimum specify the ``spec``, ``prefix``, and ``exes``: + +1. ``spec`` - Similar to package specs, ``name@version``. 
GCC in particular also needs + the ``languages=c,c++,fortran`` variant. +2. ``prefix`` - Prefix to the compiler binary directory, e.g. ``/usr/`` for + ``/usr/bin/gcc`` +3. ``exes`` - Dictionary to map ``c``, ``cxx``, ``fortran`` to the appropriate file + found in the prefix. + +:: + + from benchpark.directives import maintainers, variant + from benchpark.openmpsystem import OpenMPCPUOnlySystem + from benchpark.paths import hardware_descriptions + from benchpark.system import System, compiler_def, compiler_section_for + + + class AwsTutorial(System): + + ... + + def compute_compilers_section(self): + return compiler_section_for( + "gcc", + [ + compiler_def( + "gcc@11.4.0 languages=c,c++,fortran", + "/usr/", + {"c": "gcc", "cxx": "g++", "fortran": "gfortran-11"}, + ) + ], + ) 5. Add a software section ------------------------ +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Finally, we define the ``compute_software_section()``, where at minimum we must define +the ``default-compiler`` for Ramble. This is trivial for the single compiler that we +have ``gcc@11.4.0``. + +:: + + from benchpark.directives import maintainers, variant + from benchpark.openmpsystem import OpenMPCPUOnlySystem + from benchpark.paths import hardware_descriptions + from benchpark.system import System, compiler_def, compiler_section_for + + class AwsTutorial(System): + + ... + + def compute_software_section(self): + return { + "software": { + "packages": { + "default-compiler": {"pkg_spec": "gcc@11.4.0"}, + } + } + } 6. Validating the System ------------------------- \ No newline at end of file +~~~~~~~~~~~~~~~~~~~~~~~~ + +To manually validate that your new system works, you should start by initializing your +system: + +:: + + benchpark system init --dest=aws-tutorial aws-tutorial + +If this completes without errors, you can continue by creating a benchmark +:doc:`add-a-benchmark`. + +System Appendix +--------------- + +.. _adding-sys-packages: + +1. Adding/Updating System Packages +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +External package definitions can be added/updated from the output of ``benchpark system +external``. If you don't have any packages yet, define ``compute_packages_section`` as +an empty dictionary: + +:: + + def compute_packages_section(self): + return { + "packages": {} + } + +And then whether or not you have packages, run ``benchpark system external +cluster=``: + +:: + + [ruby]$ benchpark system external llnl-cluster cluster=ruby + + $ benchpark system external llnl-cluster + ==> The following specs have been detected on this system and added to /g/g20/mckinsey/.benchmark/spack/etc/spack/packages.yaml + cmake@3.23.1 cmake@3.26.5 gmake@4.2.1 hwloc@2.11.2 python@2.7.18 python@2.7.18 python@3.6.8 python@3.9.12 python@3.10.8 python@3.12.8 tar@1.30 + The Packages are different. 
Here are the differences: + {'dictionary_item_added': ["root['gmake']['buildable']"], + 'dictionary_item_removed': ["root['elfutils']", "root['papi']", "root['unwind']", "root['blas']", "root['lapack']", "root['fftw']", "root['mpi']"], + 'iterable_item_added': {"root['cmake']['externals'][1]": {'prefix': '/usr/tce', + 'spec': 'cmake@3.23.1'}, + "root['python']['externals'][1]": {'prefix': '/usr', + 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl~tkinter+uuid+zlib'}, + "root['python']['externals'][2]": {'prefix': '/usr', + 'spec': 'python@3.6.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, + "root['python']['externals'][3]": {'prefix': '/usr/tce', + 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, + "root['python']['externals'][4]": {'prefix': '/usr/tce', + 'spec': 'python@3.9.12+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, + "root['python']['externals'][5]": {'prefix': '/usr/workspace/wsa/mckinsey/venv/benchpark-3.12.8', + 'spec': 'python@3.12.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}}, + 'values_changed': {"root['cmake']['externals'][0]['prefix']": {'new_value': '/usr', + 'old_value': '/usr/tce/packages/cmake/cmake-3.26.3'}, + "root['cmake']['externals'][0]['spec']": {'new_value': 'cmake@3.26.5', + 'old_value': 'cmake@3.26.3'}, + "root['hwloc']['externals'][0]['spec']": {'new_value': 'hwloc@2.11.2', + 'old_value': 'hwloc@2.9.1'}, + "root['python']['externals'][0]['prefix']": {'new_value': '/usr/WS1/mckinsey/venv/python-3.10.8', + 'old_value': '/usr/tce/packages/python/python-3.9.12/'}, + "root['python']['externals'][0]['spec']": {'new_value': 'python@3.10.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib', + 'old_value': 'python@3.9.12'}}} + Here are all of the new packages: + {'cmake': {'buildable': False, + 'externals': [{'prefix': '/usr', 'spec': 'cmake@3.26.5'}, + {'prefix': '/usr/tce', 'spec': 'cmake@3.23.1'}]}, + 'gmake': {'buildable': False, + 'externals': [{'prefix': '/usr', 'spec': 'gmake@4.2.1'}]}, + 'hwloc': {'buildable': False, + 'externals': [{'prefix': '/usr', 'spec': 'hwloc@2.11.2'}]}, + 'python': {'buildable': False, + 'externals': [{'prefix': '/usr/WS1/mckinsey/venv/python-3.10.8', + 'spec': 'python@3.10.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, + {'prefix': '/usr', + 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl~tkinter+uuid+zlib'}, + {'prefix': '/usr', + 'spec': 'python@3.6.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, + {'prefix': '/usr/tce', + 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, + {'prefix': '/usr/tce', + 'spec': 'python@3.9.12+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, + {'prefix': '/usr/workspace/wsa/mckinsey/venv/benchpark-3.12.8', + 'spec': 'python@3.12.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}]}, + 'tar': {'buildable': False, + 'externals': [{'prefix': '/usr', 'spec': 'tar@1.30'}]}} + +where the command should be ran on a cluster that is defined for the given system, e.g. +ruby for llnl-cluster. 
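+
+For example, the ``cmake`` externals reported above could be carried over into
+``compute_packages_section()`` roughly as follows (a sketch; use the exact specs and
+prefixes that the command reports on your own cluster):
+
+::
+
+    def compute_packages_section(self):
+        return {
+            "packages": {
+                # detected system cmake installs; buildable: False tells Spack to
+                # use these externals instead of building cmake from source
+                "cmake": {
+                    "externals": [
+                        {"spec": "cmake@3.26.5", "prefix": "/usr"},
+                        {"spec": "cmake@3.23.1", "prefix": "/usr/tce"},
+                    ],
+                    "buildable": False,
+                },
+            }
+        }
+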
Use this output to update your package definitions in your +``system.py``'s ``compute_package_section()``. + +For packages that are not found by ``benchpark system external``, you can manually find +them using a command like the ``module`` command: + +:: + + [dane6:~]$ module display gcc/12.1.1 + ... + prepend_path("PATH","/usr/tce/packages/gcc/gcc-12.1.1/bin") + +Therefore, I can add my ``prefix`` as ``/usr/tce/packages/gcc/gcc-12.1.1/`` and my spec +as ``gcc@12.1.1``. diff --git a/docs/update-a-system-config.rst b/docs/update-a-system-config.rst deleted file mode 100644 index 6acc0845a..000000000 --- a/docs/update-a-system-config.rst +++ /dev/null @@ -1,70 +0,0 @@ -.. - Copyright 2023 Lawrence Livermore National Security, LLC and other - Benchpark Project Developers. See the top-level COPYRIGHT file for details. - - SPDX-License-Identifier: Apache-2.0 - -Updating a System -================= - -If a system already exists, its external package definitions can be updated from the -output of `benchpark system external`: - -:: - - [ruby]$ benchpark system external llnl-cluster cluster=ruby - - $ benchpark system external llnl-cluster - ==> The following specs have been detected on this system and added to /g/g20/mckinsey/.spack/packages.yaml - cmake@3.23.1 cmake@3.26.5 gmake@4.2.1 hwloc@2.11.2 python@2.7.18 python@2.7.18 python@3.6.8 python@3.9.12 python@3.10.8 python@3.12.8 tar@1.30 - The Packages are different. Here are the differences: - {'dictionary_item_added': ["root['gmake']['buildable']"], - 'dictionary_item_removed': ["root['elfutils']", "root['papi']", "root['unwind']", "root['blas']", "root['lapack']", "root['fftw']", "root['mpi']"], - 'iterable_item_added': {"root['cmake']['externals'][1]": {'prefix': '/usr/tce', - 'spec': 'cmake@3.23.1'}, - "root['python']['externals'][1]": {'prefix': '/usr', - 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl~tkinter+uuid+zlib'}, - "root['python']['externals'][2]": {'prefix': '/usr', - 'spec': 'python@3.6.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, - "root['python']['externals'][3]": {'prefix': '/usr/tce', - 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, - "root['python']['externals'][4]": {'prefix': '/usr/tce', - 'spec': 'python@3.9.12+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, - "root['python']['externals'][5]": {'prefix': '/usr/workspace/wsa/mckinsey/venv/benchpark-3.12.8', - 'spec': 'python@3.12.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}}, - 'values_changed': {"root['cmake']['externals'][0]['prefix']": {'new_value': '/usr', - 'old_value': '/usr/tce/packages/cmake/cmake-3.26.3'}, - "root['cmake']['externals'][0]['spec']": {'new_value': 'cmake@3.26.5', - 'old_value': 'cmake@3.26.3'}, - "root['hwloc']['externals'][0]['spec']": {'new_value': 'hwloc@2.11.2', - 'old_value': 'hwloc@2.9.1'}, - "root['python']['externals'][0]['prefix']": {'new_value': '/usr/WS1/mckinsey/venv/python-3.10.8', - 'old_value': '/usr/tce/packages/python/python-3.9.12/'}, - "root['python']['externals'][0]['spec']": {'new_value': 'python@3.10.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib', - 'old_value': 'python@3.9.12'}}} - Here are all of the new packages: - {'cmake': {'buildable': False, - 'externals': [{'prefix': '/usr', 'spec': 'cmake@3.26.5'}, - {'prefix': 
'/usr/tce', 'spec': 'cmake@3.23.1'}]}, - 'gmake': {'buildable': False, - 'externals': [{'prefix': '/usr', 'spec': 'gmake@4.2.1'}]}, - 'hwloc': {'buildable': False, - 'externals': [{'prefix': '/usr', 'spec': 'hwloc@2.11.2'}]}, - 'python': {'buildable': False, - 'externals': [{'prefix': '/usr/WS1/mckinsey/venv/python-3.10.8', - 'spec': 'python@3.10.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, - {'prefix': '/usr', - 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl~tkinter+uuid+zlib'}, - {'prefix': '/usr', - 'spec': 'python@3.6.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, - {'prefix': '/usr/tce', - 'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, - {'prefix': '/usr/tce', - 'spec': 'python@3.9.12+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}, - {'prefix': '/usr/workspace/wsa/mckinsey/venv/benchpark-3.12.8', - 'spec': 'python@3.12.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}]}, - 'tar': {'buildable': False, - 'externals': [{'prefix': '/usr', 'spec': 'tar@1.30'}]}} - -where the command should be ran on a cluster that is defined for the given system, e.g. -ruby for llnl-cluster. From 2c33af8c0bc5922907ad81df18c71e23a00d08d7 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 15:06:47 -0700 Subject: [PATCH 06/23] Remove old version --- docs/add-a-system-config-old.rst | 446 ------------------------------- 1 file changed, 446 deletions(-) delete mode 100644 docs/add-a-system-config-old.rst diff --git a/docs/add-a-system-config-old.rst b/docs/add-a-system-config-old.rst deleted file mode 100644 index 2a4f3a800..000000000 --- a/docs/add-a-system-config-old.rst +++ /dev/null @@ -1,446 +0,0 @@ -.. - Copyright 2023 Lawrence Livermore National Security, LLC and other - Benchpark Project Developers. See the top-level COPYRIGHT file for details. - - SPDX-License-Identifier: Apache-2.0 - -Adding a System -=============== - -This guide is intended for those wanting to run a benchmark on a new system, such as -vendors, system administrators, or application developers. It assumes a system -specification does not already exist. - -System specifications include two types of information: - -1. Hardware specs in `hardware_description.yaml` (e.g., how many CPU cores the node has) -2. Software stack specs in `system.py` (e.g., installed compilers and libraries, along - with their locations and versions) - -.. - note: - Please replace the steps below with a flow diagram. - -To specify a new system: - -1. Identify a system in Benchpark with the same hardware. -2. If a system with the same hardware does not exist, add a new hardware description, as - described in Adding System Hardware Specs section. -3. Identify the same software stack description. Typically if the same hardware is - already used by Benchpark, the same software stack may already be specified if the - same vendor software stack is used on this hardware - or, if a software stack of your - datacenter is already specified. -4. If the same software stack description does not exist, determine if there is one that - can be parameterized to match yours. -5. If can't parameterize existing software description, add a new one. - -1. 
Adding System Hardware Specs -------------------------------- - -We list hardware descriptions of Systems specified in Benchpark in the System Catalogue -in :doc:`system-list`. - -If you are running on a system with an accelerator, find an existing system with the -same accelerator vendor, and then secondarily, if you can, match the actual accelerator. - -1. accelerator.vendor -2. accelerator.name - -Once you have found an existing system with a similar accelerator or if you do not have -an accelerator, match the following processor specs as closely as you can. - -1. processor.name -2. processor.ISA -3. processor.uArch - -For example, if your system has an NVIDIA A100 GPU and an Intel x86 Icelake CPUs, a -similar config would share the A100 GPU, and CPU architecture may or may not match. Or, -if I do not have GPUs and instead have SapphireRapids CPUs, the closest match would be -another system with x86_64, Xeon Platinum, SapphireRapids. - -If there is not an exact match, you may add a new directory in the -`systems/all_hardware_descriptions/system_name` where `system_name` follows the naming -convention: - -:: - - [INTEGRATOR]-MICROARCHITECTURE[-GPU][-NETWORK] - -where: - -:: - - INTEGRATOR = COMPANY[_PRODUCTNAME][...] - - MICROARCHITECTURE = CPU Microarchitecture - - GPU = GPU Product Name - - NETWORK = Network Product Name - -In the `systems/all_hardware_descriptions/system_name` directory, add a -`hardware_description.yaml` which follows the yaml format of existing -`hardware_description.yaml` files. - -2. Adding or Parameterizing System Software Stack -------------------------------------------------- - -``system.py`` in Benchpark provides an API to represent a system software stack as a -command line parameterizable object. If none of the available software stack -specifications match your system, you may add a `new-system` directory in the `systems` -directory where the `new-system` directory name follows the naming convention: - -:: - - SITE-SYSTEMNAME - -where: - -:: - - SITE = nosite | abbreviated datacenter name - - SYSTEMNAME = the name of the specific system - -.. - note: - make all these x86 example. Automate the directory structure? - -Next, copy the system.py from the system with the most similar software stack into -`new-system` directory, and update it to match your system. For example, the generic-x86 -system software stack is defined in: - -:: - - $benchpark - ├── systems - ├── generic-x86 - ├── system.py - -The System base class is defined in ``/lib/benchpark/system.py``, some or all of the -functions can be overridden to define custom system behavior. Your -``systems/{SYSTEM}/system.py`` should inherit from the System base class. - -The generic-x86 system subclass should run on most x86_64 systems, but we mostly provide -it as a starting point for modifying or testing. Potential common changes might be to -edit the scheduler or number of cores per node, adding a GPU configuration, or adding -other external compilers or packages. - -To make these changes, we provided an example below, where we start with the generic-x86 -system.py, and make a system called Modifiedx86. - -1. First, make a copy of the system.py file in generic_x86 folder and move it into a new - folder, e.g., ``systems/modified_x86/system.py``. Then, update the class name to - ``Modifiedx86``.: - - :: - - class Modifiedx86(System): - -2. 
Next, to match our new system, we change the scheduler to slurm and the number of - cores per node to 48, and number of GPUs per node to 2.: - - :: - - # this sets basic attributes of our system - def __init__(self, spec): - super().__init__(spec) - self.scheduler = "slurm" - self.sys_cores_per_node = "48" - self.sys_gpus_per_node = "2" - -3. Let's say the new system's GPUs are NVIDIA, we can add a variant that allows us to - specify the version of CUDA we want to use, and the location of those CUDA - installations on our system. We then add the spack package configuration for our CUDA - installations into the `compute_packages_section`.: - - :: - - # import the variant feature at the top of your system.py - from benchpark.directives import variant - - # this allows us to specify which cuda version we want as a command line parameter - variant( - "cuda", - default="11-8-0", - values=("11-8-0", "10-1-243"), - description="CUDA version", - ) - - # set this to pass to spack - def system_specific_variables(self): - return {"cuda_arch": "70"} - - # define the external package locations - def compute_packages_section(self): - selections = { - "packages": { - "elfutils": { - "externals": [{"spec": "elfutils@0.176", "prefix": "/usr"}], - "buildable": False, - }, - "papi": { - "buildable": False, - "externals": [{"spec": "papi@5.2.0.0", "prefix": "/usr"}], - }, - } - } - if self.spec.satisfies("cuda=10-1-243"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@10.1.243", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@10.1.243+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - } - elif self.spec.satisfies("cuda=11-8-0"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@11.8.0", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@11.8.0+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - } - - return selections - -External packages can be found via `benchpark system external ---new-system -{mysite}-{mysystem}`. Note, if your externals are *not* installed via Spack, read `Spack -documentation on modules -`_. - -4. Next, add any of the packages that can be managed by spack, such as blas/cublas -pointing to the correct version, this will generate the software configurations for -spack (``software.yaml``). The actual version will be rendered by Ramble when it is -built. - -:: - - def compute_software_section(self): - return { - "software": { - "packages": { - "default-compiler": {"pkg_spec": "gcc"}, - "compiler-gcc": {"pkg_spec": "gcc"}, - "default-mpi": {"pkg_spec": "openmpi"}, - "blas": {"pkg_spec": "openblas"}, - "lapack": {"pkg_spec": "openblas"}, - } - } - } - -5. 
The full system.py class for the modified_x86 system should now look like: - -:: - - import pathlib - - from benchpark.directives import variant - from benchpark.system import System - - class Modifiedx86(System): - - variant( - "cuda", - default="11-8-0", - values=("11-8-0", "10-1-243"), - description="CUDA version", - ) - - def __init__(self): - super().__init__() - - self.scheduler = "slurm" - setattr(self, "sys_cores_per_node", 48) - self.sys_gpus_per_node = "2" - - def system_specific_variables(self): - return {"cuda_arch": "70"} - - def compute_packages_section(self): - selections = { - "packages": { - "elfutils": { - "externals": [{"spec": "elfutils@0.176", "prefix": "/usr"}], - "buildable": False, - }, - "papi": { - "buildable": False, - "externals": [{"spec": "papi@5.2.0.0", "prefix": "/usr"}], - }, - } - } - if self.spec.satisfies("cuda=10-1-243"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@10.1.243", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@10.1.243+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-10.1.243", - } - ], - "buildable": False, - }, - } - elif self.spec.satisfies("cuda=11-8-0"): - selections["packages"] |= { - "cusparse": { - "externals": [ - { - "spec": "cusparse@11.8.0", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - "cuda": { - "externals": [ - { - "spec": "cuda@11.8.0+allow-unsupported-compilers", - "prefix": "/usr/tce/packages/cuda/cuda-11.8.0", - } - ], - "buildable": False, - }, - } - - return selections - - def compute_software_section(self): - return { - "software": { - "packages": { - "default-compiler": {"pkg_spec": "gcc"}, - "compiler-gcc": {"pkg_spec": "gcc"}, - "default-mpi": {"pkg_spec": "openmpi"}, - "blas": {"pkg_spec": "openblas"}, - "lapack": {"pkg_spec": "openblas"}, - } - } - } - """ - -Once the modified system subclass is written, run: ``benchpark system init ---dest=modifiedx86-system modifiedx86`` - -This will generate the required yaml configurations for your system and you can validate -it works with a static experiment test. - -.. note:: - - Use the ``benchpark info system {system_name}`` to find additional variants that are - available to all systems. This includes settings such as: the job timeout, - submitting to a different partition/queue, and setting the account/bank. - -3. Validating the System ------------------------- - -To manually validate your new system, you should initialize it and run an existing -experiment such as saxpy. For example: - -:: - - benchpark system init --dest=modifiedx86-system modifiedx86 - benchpark experiment init --dest=saxpy --system=modifiedx86-system saxpy +openmp - benchpark setup ./saxpy workspace/ - -Then you can run the commands provided by the output, the experiments should be built -and run successfully without any errors. - -The following yaml files are examples of what is generated for the modified_x86 system -from the example after it is initialized: - -.. note:: - - The following files are generated by benchpark (in the system destination folder) - and do not have to be manually created. - -1. ``system_id.yaml`` describes the system hardware, including the integrator (and the - name of the product node or cluster type), the processor, (optionally) the - accelerator, and the network; the information included here is what you will - typically see recorded about the system on Top500.org. 
We intend to make the system - definitions in Benchpark searchable, and will add a schema to enforce consistency; - until then, please copy the file and fill out all of the fields without changing the - keys. Also listed is the specific system the config was developed and tested on, as - well as the known systems with the same hardware so that the users of those systems - can find this system specification. - -.. code-block:: yaml - - system: - name: Modifiedx86 - spec: sysbuiltin.modifiedx86 cuda=11-8-0 - config-hash: 5310ebe8b2c841108e5da854c75dab931f5397a7fb41726902bb8a51ffb84a36 - -2. ``software.yaml`` defines default compiler and package names your package manager -(Spack) should use to build the benchmarks on this system. ``software.yaml`` becomes the -spack section in the `Ramble configuration file -`_. - -.. code-block:: yaml - - software: - packages: - default-compiler: - pkg_spec: 'gcc' - compiler-gcc: - pkg_spec: 'gcc' - default-mpi: - pkg_spec: 'openmpi' - blas: - pkg_spec: cublas@{default_cuda_version} - cublas-cuda: - pkg_spec: cublas@{default_cuda_version} - -3. ``variables.yaml`` defines system-specific launcher and job scheduler. - -.. code-block:: yaml - - variables: - timeout: "120" - scheduler: "slurm" - sys_cores_per_node: "48" - sys_gpus_per_node: 2 - cuda_arch: 70 - n_ranks: 18446744073709551615 # placeholder value - n_nodes: 18446744073709551615 # placeholder value - batch_submit: "placeholder" - mpi_command: "placeholder" - -Once you can run an experiment successfully, and the yaml looks correct, the new system -has been validated and you can continue your :doc:`benchpark-workflow`. From 1e2c0f7c4087e927ac936e202def4635da0fcfc9 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 15:10:37 -0700 Subject: [PATCH 07/23] Add defs --- docs/add-a-system-config.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index ca26dc7f9..5772e45a9 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -44,23 +44,23 @@ in :doc:`system-list`. If you are running on a system with an accelerator, find an existing system with the same accelerator vendor, and then secondarily, if you can, match the actual accelerator. -1. ``accelerator.vendor`` - TODO -2. ``accelerator.name`` - -3. ``accelerator.ISA`` - -4. ``accelerator.uArch`` - +1. ``accelerator.vendor`` - Company name +2. ``accelerator.name`` - Product name +3. ``accelerator.ISA`` - Instruction set architecture +4. ``accelerator.uArch`` - Microarchitecture Once you have found an existing system with a similar accelerator or if you do not have an accelerator, match the following processor specs as closely as you can. -1. ``processor.name`` - -2. ``processor.ISA`` - -3. ``processor.uArch`` - -4. ``processor.vendor`` - +1. ``processor.vendor`` - Company name +2. ``processor.name`` - Product name +3. ``processor.ISA`` - Instruction set architecture +4. ``processor.uArch`` - Microarchitecture And add the interconnect vendor and product name. -1. ``interconnect.vendor`` - -2. ``interconnect.name`` - +1. ``interconnect.vendor`` - Company name +2. ``interconnect.name`` - Product name For example, if your system has an NVIDIA A100 GPU and an Intel x86 Icelake CPUs, a similar config would share the A100 GPU, and CPU architecture may or may not match. 
Or, From a4fa71282200f3a35dee449a205351dfd54be811 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 15:19:59 -0700 Subject: [PATCH 08/23] Fix --- docs/add-a-system-config.rst | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index 5772e45a9..f94bb1867 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -31,7 +31,8 @@ To determine if you need to create a new system: specify your systems specific resource configuration under the ``id_to_resources`` dictionary. 4. If the same software stack description does not exist, determine if there is one that - can be parameterized to match yours, otherwise proceed with adding a new system. + can be parameterized to match yours, otherwise proceed with adding a new system in + :ref:`system-specification`. .. _adding-system-hardware-specs: @@ -68,12 +69,12 @@ if I do not have GPUs and instead have SapphireRapids CPUs, the closest match wo another system with x86_64, Xeon Platinum, SapphireRapids. If there is not an exact match, you may add a new directory in the -`systems/all_hardware_descriptions/system_name` where `system_name` follows the naming -convention: +``systems/all_hardware_descriptions/system_name`` where ``system_name`` follows the +naming convention: :: - [INTEGRATOR]-MICROARCHITECTURE[-GPU][-NETWORK] + [INTEGRATOR]-MICROARCHITECTURE[-ACCELERATOR][-NETWORK] where: @@ -83,13 +84,15 @@ where: MICROARCHITECTURE = CPU Microarchitecture - GPU = GPU Product Name + ACCELERATOR = ACCELERATOR Product Name NETWORK = Network Product Name -In the `systems/all_hardware_descriptions/system_name` directory, add a -`hardware_description.yaml` which follows the yaml format of existing -`hardware_description.yaml` files. +In the ``systems/all_hardware_descriptions/system_name`` directory, add a +``hardware_description.yaml`` which follows the yaml format of existing +``hardware_description.yaml`` files. + +.. _system-specification: B. Creating the System Definition (``system.py``) ------------------------------------------------- @@ -105,8 +108,8 @@ from scratch. 1. Creating the System class ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In this example, we will recreate a fully-functional simplified example of the AWS -``system.py`` that we use for benchpark tutorials (see `aws-tutorial/system.py +In this example, we will recreate a fully-functional example of the AWS ``system.py`` +that we use for benchpark tutorials (see `aws-tutorial/system.py `_). To start, we import the base benchpark ``System`` class, which our ``AwsTutorial`` system will inherit from. We also import the maintainer and variant directives, which provide @@ -114,7 +117,7 @@ the utilities to track a maintainer by their GitHub username and variants to spe configurable properties of our system. We can specify the different AWS instances that share this same hardware and software specificaiton using the ``instance_type`` variant. We use ``instance_type`` here instead of ``cluster`` (you will see ``cluster`` in other -systems), because ``instance_type`` is more fitting in this context. +systems), to differentiate a cloud ``instance`` from an HPC ``cluster``. :: @@ -132,7 +135,7 @@ systems), because ``instance_type`` is more fitting in this context. description="AWS instance type", ) -2. Specify the class initializer and resources +1. 
Specify the class initializer and resources ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When defining ``__init__()`` for our system, we invoke the parent From 5240c6e727fc4f93ee95cacabb7a1c1bbae5ae30 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Mon, 22 Sep 2025 15:39:18 -0700 Subject: [PATCH 09/23] fixes --- docs/add-a-system-config.rst | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index f94bb1867..ee6fb12f7 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -102,7 +102,7 @@ the ``system.py``, which involves defining the software on your system. This inc defining compilers and pre-installed packages, which your package manager can use instead of attempting to build the package from scratch. If using Spack, defining as many external packages as possible here will ensure a much faster build process, and -using system-installed packages will likely always be more performant than building them +using system-installed packages may be significantly more performant than building them from scratch. 1. Creating the System class @@ -130,12 +130,12 @@ systems), to differentiate a cloud ``instance`` from an HPC ``cluster``. variant( "instance_type", - values=("c7i.12xlarge", c7i.24xlarge), + values=("c7i.12xlarge", "c7i.24xlarge"), default="c7i.12xlarge", description="AWS instance type", ) -1. Specify the class initializer and resources +2. Specify the class initializer and resources ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When defining ``__init__()`` for our system, we invoke the parent @@ -148,16 +148,16 @@ When defining ``__init__()`` for our system, we invoke the parent located. 2. ``programming_models`` - List of applicable programming models. ``MPI`` is assumed for every system in benchpark, so you do not need to add it here. For this system, we - add ``OpenMPCPUOnlySystem`` (different from GPU openmp). If we had Nvidia + add ``OpenMPCPUOnlySystem`` (different from GPU openmp). If we had NVIDIA accelerators, we would add ``CudaSystem`` to this list, and ``ROCmSystem`` for AMD. 3. ``scheduler`` - The job scheduler. -4. ``hardware_key``, which defines a path to the yaml description you just created in +4. ``hardware_key`` - which defines a path to the yaml description you just created in the previous step. 5. ``sys_cores_per_node`` - The amount of hardware cores per node. -6. ``sys_mem_per_node_GB`` - The amount of node memory, in gigabytes. +6. ``sys_mem_per_node_GB`` - The amount of node memory (in gigabytes). This information is used to determine the necessary resource allocation request for any -experiment initialized with your chosen. +experiment initialized with your chosen instance. :: @@ -209,7 +209,7 @@ experiment initialized with your chosen. ~~~~~~~~~~~~~~~~~~~~~~~~~ Here, we define the ``compute_packages_section()`` function, where you can include any -package that you would like a package manager, such as spack, to find as an "external", +package that you would like the package manager, such as spack, to find on the system, meaning it will not build that package from source and use your system package instead. For each package that you include, you need to define its spec ``name@version`` and the system path ``prefix`` to the package. Additionally for spack, you need to set @@ -370,7 +370,7 @@ installed on your system. 4. 
Add a compilers section ~~~~~~~~~~~~~~~~~~~~~~~~~~ -In the ``compute_compilers_section``, we define the compilers available on the system. +In the ``compute_compilers_section()``, we define the compilers available on the system. For our AWS system, this is ``gcc@11.4.0``. We return a dictionary, with the helper ``compiler_section_for()`` function, that formulates the compiler ``name`` and ``entries`` for Spack, where the ``entries`` are a list of ``compiler_def()``. For the @@ -380,7 +380,7 @@ For our AWS system, this is ``gcc@11.4.0``. We return a dictionary, with the hel the ``languages=c,c++,fortran`` variant. 2. ``prefix`` - Prefix to the compiler binary directory, e.g. ``/usr/`` for ``/usr/bin/gcc`` -3. ``exes`` - Dictionary to map ``c``, ``cxx``, ``fortran`` to the appropriate file +3. ``exes`` - Dictionary to map ``c``, ``cxx``, and ``fortran`` to the appropriate file found in the prefix. :: @@ -412,7 +412,7 @@ For our AWS system, this is ``gcc@11.4.0``. We return a dictionary, with the hel Finally, we define the ``compute_software_section()``, where at minimum we must define the ``default-compiler`` for Ramble. This is trivial for the single compiler that we -have ``gcc@11.4.0``. +have, ``gcc@11.4.0``. :: @@ -531,7 +531,8 @@ ruby for llnl-cluster. Use this output to update your package definitions in you ``system.py``'s ``compute_package_section()``. For packages that are not found by ``benchpark system external``, you can manually find -them using a command like the ``module`` command: +them using a command like the ``module`` command, if your system has environment +modules: :: From 4009630a699fb7d1c71a557f303c9f65fcf21f1c Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 09:28:09 -0700 Subject: [PATCH 10/23] Lint --- lib/benchpark/cmd/system.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/benchpark/cmd/system.py b/lib/benchpark/cmd/system.py index 1b2aa1b21..f338f6c96 100644 --- a/lib/benchpark/cmd/system.py +++ b/lib/benchpark/cmd/system.py @@ -94,7 +94,9 @@ def system_external(args): + [pkg for pkg in pkg_list] ) - with open(benchpark.paths.benchpark_home / "spack/etc/spack/packages.yaml", "r") as file: + with open( + benchpark.paths.benchpark_home / "spack/etc/spack/packages.yaml", "r" + ) as file: new_packages = yaml.safe_load(file)["packages"] # Use DeepDiff to find differences From 6c3c044909f70adf54b0c61e636218daf4e5d57c Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 09:47:05 -0700 Subject: [PATCH 11/23] lint --- docs/add-a-system-config.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index ee6fb12f7..f6bc8b568 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -115,7 +115,7 @@ start, we import the base benchpark ``System`` class, which our ``AwsTutorial`` will inherit from. We also import the maintainer and variant directives, which provide the utilities to track a maintainer by their GitHub username and variants to specify configurable properties of our system. We can specify the different AWS instances that -share this same hardware and software specificaiton using the ``instance_type`` variant. +share this same hardware and software specification using the ``instance_type`` variant. We use ``instance_type`` here instead of ``cluster`` (you will see ``cluster`` in other systems), to differentiate a cloud ``instance`` from an HPC ``cluster``. 
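To make the variant-based reuse described above concrete, here is a minimal sketch of how
a machine that shares this software stack could be registered without writing a new
``system.py``: extend the ``instance_type`` (or ``cluster``) values and add a matching
entry to ``id_to_resources``. The ``c7i.metal-24xl`` name and its core/memory counts are
taken from the full ``aws-tutorial/system.py``; treat them as illustrative rather than a
prescribed change.

::

    variant(
        "instance_type",
        values=("c7i.12xlarge", "c7i.24xlarge", "c7i.metal-24xl"),
        default="c7i.12xlarge",
        description="AWS instance type",
    )

    id_to_resources = {
        "c7i.12xlarge": {"sys_cores_per_node": 48, "sys_mem_per_node_GB": 96},
        "c7i.24xlarge": {"sys_cores_per_node": 96, "sys_mem_per_node_GB": 192},
        # new entry: resource counts must match the added instance's hardware
        "c7i.metal-24xl": {"sys_cores_per_node": 96, "sys_mem_per_node_GB": 192},
    }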
From feecc7c50134069dd740ec670ff18aaf1dba4043 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 10:04:36 -0700 Subject: [PATCH 12/23] Update text for cluster/instance --- docs/add-a-system-config.rst | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index f6bc8b568..aec5d34de 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -114,10 +114,19 @@ that we use for benchpark tutorials (see `aws-tutorial/system.py start, we import the base benchpark ``System`` class, which our ``AwsTutorial`` system will inherit from. We also import the maintainer and variant directives, which provide the utilities to track a maintainer by their GitHub username and variants to specify -configurable properties of our system. We can specify the different AWS instances that -share this same hardware and software specification using the ``instance_type`` variant. -We use ``instance_type`` here instead of ``cluster`` (you will see ``cluster`` in other -systems), to differentiate a cloud ``instance`` from an HPC ``cluster``. +configurable properties of our system. There are many similar types of AWS nodes that +differ only in terms of the number of processors and/or memory (but otherwise have the +same system packages available). This can be encoded with a variant in a benchpark +system - the user can indicate what type of instance they are creating, and the system +description will reflect the instance type chosen. We can specify the different AWS +instances that share this same hardware and software specification using the +``instance_type`` variant. + +.. note:: + + Most system classes in Benchpark have a similar concept, but often they refer to + physical (named) clusters with very-similar configs, and so they typically use the + term "cluster" rather than "instance_type". :: From 419f39b3156753591255552ca322c90d65eefe61 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 11:10:28 -0700 Subject: [PATCH 13/23] cuda and rocm --- docs/add-a-system-config.rst | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index aec5d34de..a6b671d2d 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -225,8 +225,13 @@ system path ``prefix`` to the package. Additionally for spack, you need to set ``buildable: False`` to use the package as an external. At minimum, we recommend you define externals for ``cmake``, ``mpi``, ``blas``, and -``lapack``. See :ref:`adding-sys-packages`, for help on how to search for the packages -installed on your system. +``lapack``. Additionally, for systems with accelerators, define externals for CUDA and +ROCm runtime libraries (see an example for a `CUDA system +`_, +and a `ROCm system +`_). +Also, see :ref:`adding-sys-packages`, for help on how to search for the packages +available on your system. .. note:: From f5f02a814b7b446ed855ff4f1c913be8986fdc2d Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 11:51:11 -0700 Subject: [PATCH 14/23] Improve compilers text --- docs/add-a-system-config.rst | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index a6b671d2d..5cd8b9a0b 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -384,14 +384,25 @@ available on your system. 4. 
Add a compilers section ~~~~~~~~~~~~~~~~~~~~~~~~~~ -In the ``compute_compilers_section()``, we define the compilers available on the system. -For our AWS system, this is ``gcc@11.4.0``. We return a dictionary, with the helper -``compiler_section_for()`` function, that formulates the compiler ``name`` and -``entries`` for Spack, where the ``entries`` are a list of ``compiler_def()``. For the +We define compilers that are available on our system by implementing +``compute_compilers_section()``: + +1. For each compiler, create the necessary config with ``compiler_def()``. +2. For each type of compiler (gcc, intel, etc.), combine them with + ``compiler_section_for()``. +3. Merge the compiler definitions with merge_dicts (this part is unnecessary if you have + only one type of compiler). +4. Generally you will want to compose a minimal list of compilers: e.g. if you want to + compile your benchmark with the oneAPI compiler, and have multiple versions to choose + from, you would add a variant to the system, and the config would expose only one of + them. + +For our AWS system, the compiler we define is ``gcc@11.4.0``. For the ``compiler_def()``, we must at minimum specify the ``spec``, ``prefix``, and ``exes``: 1. ``spec`` - Similar to package specs, ``name@version``. GCC in particular also needs - the ``languages=c,c++,fortran`` variant. + the ``languages`` variant, where the list of languages depends on the available + ``exes`` (e.g. do not include "fortran" if ``gfortran`` is not available). 2. ``prefix`` - Prefix to the compiler binary directory, e.g. ``/usr/`` for ``/usr/bin/gcc`` 3. ``exes`` - Dictionary to map ``c``, ``cxx``, and ``fortran`` to the appropriate file From dee4ab0f239eddf0ff241fbd40beb5a356fcd265 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 12:00:42 -0700 Subject: [PATCH 15/23] Prevent issues with multiple system defs in same process --- docs/add-a-system-config.rst | 2 +- systems/aws-tutorial/system.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index 5cd8b9a0b..b3a10d931 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -181,7 +181,6 @@ experiment initialized with your chosen instance. common = { "system_site": "aws", - "programming_models": [OpenMPCPUOnlySystem()], "scheduler": "flux", "hardware_key": str(hardware_descriptions) + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", @@ -209,6 +208,7 @@ experiment initialized with your chosen instance. 
def __init__(self, spec): super().__init__(spec) + self.programming_models = [OpenMPCPUOnlySystem()] attrs = self.id_to_resources.get(self.spec.variants["instance_type"][0]) for k, v in attrs.items(): diff --git a/systems/aws-tutorial/system.py b/systems/aws-tutorial/system.py index 4f4164521..f6aa95b4d 100644 --- a/systems/aws-tutorial/system.py +++ b/systems/aws-tutorial/system.py @@ -17,7 +17,6 @@ class AwsTutorial(System): common = { "system_site": "aws", - "programming_models": [OpenMPCPUOnlySystem()], "scheduler": "flux", "hardware_key": str(hardware_descriptions) + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", @@ -66,6 +65,7 @@ class AwsTutorial(System): def __init__(self, spec): super().__init__(spec) + self.programming_models = [OpenMPCPUOnlySystem()] attrs = self.id_to_resources.get(self.spec.variants["instance_type"][0]) for k, v in attrs.items(): From 09bdf4a9ac6e326b0b5347f9e2dd2ef310db3484 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 12:13:18 -0700 Subject: [PATCH 16/23] Update section titles --- docs/add-a-system-config.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index b3a10d931..df4bd87c1 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -144,7 +144,7 @@ instances that share this same hardware and software specification using the description="AWS instance type", ) -2. Specify the class initializer and resources +2. Specify the Class Initializer and Resources ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When defining ``__init__()`` for our system, we invoke the parent @@ -214,7 +214,7 @@ experiment initialized with your chosen instance. for k, v in attrs.items(): setattr(self, k, v) -3. Add a packages section +3. Add Software Definitions ~~~~~~~~~~~~~~~~~~~~~~~~~ Here, we define the ``compute_packages_section()`` function, where you can include any @@ -381,7 +381,7 @@ available on your system. } } -4. Add a compilers section +4. Add Compiler Definitions ~~~~~~~~~~~~~~~~~~~~~~~~~~ We define compilers that are available on our system by implementing @@ -432,7 +432,7 @@ For our AWS system, the compiler we define is ``gcc@11.4.0``. For the ], ) -5. Add a software section +5. Add a Software Section ~~~~~~~~~~~~~~~~~~~~~~~~~ Finally, we define the ``compute_software_section()``, where at minimum we must define From 5b6e7b08709cc9b0cb66a35e73b6139c9530f748 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 13:18:37 -0700 Subject: [PATCH 17/23] lint --- docs/add-a-system-config.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index df4bd87c1..a4e8f3fa8 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -215,7 +215,7 @@ experiment initialized with your chosen instance. setattr(self, k, v) 3. Add Software Definitions -~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~ Here, we define the ``compute_packages_section()`` function, where you can include any package that you would like the package manager, such as spack, to find on the system, @@ -382,7 +382,7 @@ available on your system. } 4. 
Add Compiler Definitions -~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~ We define compilers that are available on our system by implementing ``compute_compilers_section()``: From c0e3a8f2be1ee75fdb347a03ca1affa50b67ea2e Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 14:59:10 -0700 Subject: [PATCH 18/23] Change text language for blas and lapack --- docs/add-a-system-config.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index a4e8f3fa8..c9727a245 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -224,9 +224,10 @@ For each package that you include, you need to define its spec ``name@version`` system path ``prefix`` to the package. Additionally for spack, you need to set ``buildable: False`` to use the package as an external. -At minimum, we recommend you define externals for ``cmake``, ``mpi``, ``blas``, and -``lapack``. Additionally, for systems with accelerators, define externals for CUDA and -ROCm runtime libraries (see an example for a `CUDA system +At minimum, we recommend you define externals for ``cmake`` and ``mpi`` (users also +typically define externals for math libraries like ``blas`` and ``lapack``). +Additionally, for systems with accelerators, define externals for CUDA and ROCm runtime +libraries (see an example for a `CUDA system `_, and a `ROCm system `_). From de1013c7ca41d46f1a89555d8a91cc895f60b646 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 15:38:08 -0700 Subject: [PATCH 19/23] Define less packages --- docs/add-a-system-config.rst | 112 +---------------------------------- 1 file changed, 1 insertion(+), 111 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index c9727a245..0ab8878b2 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -255,15 +255,6 @@ available on your system. def compute_packages_section(self): return { "packages": { - "tar": { - "externals": [{"spec": "tar@1.34", "prefix": "/usr"}], - "buildable": False, - }, - "gmake": {"externals": [{"spec": "gmake@4.3", "prefix": "/usr"}]}, - "lapack": { - "externals": [{"spec": "lapack@0.29.2", "prefix": "/usr"}], - "buildable": False, - }, "mpi": {"buildable": False}, "openmpi": { "externals": [ @@ -277,108 +268,7 @@ available on your system. 
"externals": [{"spec": "cmake@4.0.2", "prefix": "/usr"}], "buildable": False, }, - "git": { - "externals": [{"spec": "git@2.34.1~tcltk", "prefix": "/usr"}], - "buildable": False, - }, - "openssl": { - "externals": [{"spec": "openssl@3.0.2", "prefix": "/usr"}], - "buildable": False, - }, - "automake": { - "externals": [{"spec": "automake@1.16.5", "prefix": "/usr"}], - "buildable": False, - }, - "openssh": { - "externals": [{"spec": "openssh@8.9p1", "prefix": "/usr"}], - "buildable": False, - }, - "m4": { - "externals": [{"spec": "m4@1.4.18", "prefix": "/usr"}], - "buildable": False, - }, - "sed": { - "externals": [{"spec": "sed@4.8", "prefix": "/usr"}], - "buildable": False, - }, - "autoconf": { - "externals": [{"spec": "autoconf@2.71", "prefix": "/usr"}], - "buildable": False, - }, - "diffutils": { - "externals": [{"spec": "diffutils@3.8", "prefix": "/usr"}], - "buildable": False, - }, - "coreutils": { - "externals": [{"spec": "coreutils@8.32", "prefix": "/usr"}], - "buildable": False, - }, - "findutils": { - "externals": [{"spec": "findutils@4.8.0", "prefix": "/usr"}], - "buildable": False, - }, - "binutils": { - "externals": [ - {"spec": "binutils@2.38+gold~headers", "prefix": "/usr"} - ], - "buildable": False, - }, - "perl": { - "externals": [ - { - "spec": "perl@5.34.0~cpanm+opcode+open+shared+threads", - "prefix": "/usr", - } - ], - "buildable": False, - }, - "caliper": { - "externals": [ - { - "spec": "caliper@master+adiak+mpi%gcc@11.4.0", - "prefix": "/usr", - } - ], - "buildable": False, - }, - "adiak": { - "externals": [{"spec": "adiak@0.4.1", "prefix": "/usr"}], - "buildable": False, - }, - "groff": { - "externals": [{"spec": "groff@1.22.4", "prefix": "/usr"}], - "buildable": False, - }, - "curl": { - "externals": [ - {"spec": "curl@7.81.0+gssapi+ldap+nghttp2", "prefix": "/usr"} - ], - "buildable": False, - }, - "ccache": { - "externals": [{"spec": "ccache@4.5.1", "prefix": "/usr"}], - "buildable": False, - }, - "flex": { - "externals": [{"spec": "flex@2.6.4+lex", "prefix": "/usr"}], - "buildable": False, - }, - "pkg-config": { - "externals": [{"spec": "pkg-config@0.29.2", "prefix": "/usr"}], - "buildable": False, - }, - "zlib": { - "externals": [{"spec": "zlib@1.2.11", "prefix": "/usr"}], - "buildable": False, - }, - "ninja": { - "externals": [{"spec": "ninja@1.10.1", "prefix": "/usr"}], - "buildable": False, - }, - "libtool": { - "externals": [{"spec": "libtool@2.4.6", "prefix": "/usr"}], - "buildable": False, - }, + ... } } From edebb93e58657d2bfa3ffa3fb91994e107181a2a Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 15:42:10 -0700 Subject: [PATCH 20/23] Refactor common attributes and reorganize sections --- docs/add-a-system-config.rst | 141 ++++++++++++++++----------------- systems/aws-tutorial/system.py | 20 ++--- 2 files changed, 78 insertions(+), 83 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index 0ab8878b2..ef2c1d01c 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -150,8 +150,8 @@ instances that share this same hardware and software specification using the When defining ``__init__()`` for our system, we invoke the parent ``System::__init__()``, and set important system attributes using the ``id_to_resources`` dictionary, which contains information for each ``cluster`` or -``instance_type``. We can optionally refactor common attributes for all -``instance_type``'s into a separate dictionary, for readability: +``instance_type``. 
We define common attributes for all ``instance_type``'s in the +``__init__()`` function: 1. ``system_site`` - The name of the site where the ``cluster``/``instance_type`` is located. @@ -179,21 +179,12 @@ experiment initialized with your chosen instance. class AwsTutorial(System): maintainers("michaelmckinsey1") - common = { - "system_site": "aws", - "scheduler": "flux", - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", - } - id_to_resources = { "c7i.24xlarge": { - **common, "sys_cores_per_node": 96, "sys_mem_per_node_GB": 192, }, "c7i.12xlarge": { - **common, "sys_cores_per_node": 48, "sys_mem_per_node_GB": 96, }, @@ -208,71 +199,21 @@ experiment initialized with your chosen instance. def __init__(self, spec): super().__init__(spec) + + # Common attributes across instances self.programming_models = [OpenMPCPUOnlySystem()] + self.system_site = "aws" + self.scheduler = "flux" + self.hardware_key = ( + str(hardware_descriptions) + + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml" + ) attrs = self.id_to_resources.get(self.spec.variants["instance_type"][0]) for k, v in attrs.items(): setattr(self, k, v) -3. Add Software Definitions -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Here, we define the ``compute_packages_section()`` function, where you can include any -package that you would like the package manager, such as spack, to find on the system, -meaning it will not build that package from source and use your system package instead. -For each package that you include, you need to define its spec ``name@version`` and the -system path ``prefix`` to the package. Additionally for spack, you need to set -``buildable: False`` to use the package as an external. - -At minimum, we recommend you define externals for ``cmake`` and ``mpi`` (users also -typically define externals for math libraries like ``blas`` and ``lapack``). -Additionally, for systems with accelerators, define externals for CUDA and ROCm runtime -libraries (see an example for a `CUDA system -`_, -and a `ROCm system -`_). -Also, see :ref:`adding-sys-packages`, for help on how to search for the packages -available on your system. - -.. note:: - - For ``mpi``, you need to define ``"mpi": {"buildable": False},`` as a virtual - package, and then define your MPI package as we have for ``openmpi``. - -:: - - from benchpark.directives import maintainers, variant - from benchpark.openmpsystem import OpenMPCPUOnlySystem - from benchpark.paths import hardware_descriptions - from benchpark.system import System - - - class AwsTutorial(System): - - ... - - - def compute_packages_section(self): - return { - "packages": { - "mpi": {"buildable": False}, - "openmpi": { - "externals": [ - { - "spec": "openmpi@4.0%gcc@11.4.0", - "prefix": "/usr", - } - ] - }, - "cmake": { - "externals": [{"spec": "cmake@4.0.2", "prefix": "/usr"}], - "buildable": False, - }, - ... - } - } - -4. Add Compiler Definitions +3. Add Compiler Definitions ~~~~~~~~~~~~~~~~~~~~~~~~~~~ We define compilers that are available on our system by implementing @@ -323,7 +264,7 @@ For our AWS system, the compiler we define is ``gcc@11.4.0``. For the ], ) -5. Add a Software Section +4. Add a Software Section ~~~~~~~~~~~~~~~~~~~~~~~~~ Finally, we define the ``compute_software_section()``, where at minimum we must define @@ -351,6 +292,64 @@ have, ``gcc@11.4.0``. } } +5. 
Add Software Definitions +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Here, we define the ``compute_packages_section()`` function, where you can include any +package that you would like the package manager, such as spack, to find on the system, +meaning it will not build that package from source and use your system package instead. +For each package that you include, you need to define its spec ``name@version`` and the +system path ``prefix`` to the package. Additionally for spack, you need to set +``buildable: False`` to use the package as an external. + +At minimum, we recommend you define externals for ``cmake`` and ``mpi`` (users also +typically define externals for math libraries like ``blas`` and ``lapack``). +Additionally, for systems with accelerators, define externals for CUDA and ROCm runtime +libraries (see an example for a `CUDA system +`_, +and a `ROCm system +`_). +Also, see :ref:`adding-sys-packages`, for help on how to search for the packages +available on your system. + +.. note:: + + For ``mpi``, you need to define ``"mpi": {"buildable": False},`` as a virtual + package, and then define your MPI package as we have for ``openmpi``. + +:: + + from benchpark.directives import maintainers, variant + from benchpark.openmpsystem import OpenMPCPUOnlySystem + from benchpark.paths import hardware_descriptions + from benchpark.system import System + + + class AwsTutorial(System): + + ... + + + def compute_packages_section(self): + return { + "packages": { + "mpi": {"buildable": False}, + "openmpi": { + "externals": [ + { + "spec": "openmpi@4.0%gcc@11.4.0", + "prefix": "/usr", + } + ] + }, + "cmake": { + "externals": [{"spec": "cmake@4.0.2", "prefix": "/usr"}], + "buildable": False, + }, + ... + } + } + 6. Validating the System ~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/systems/aws-tutorial/system.py b/systems/aws-tutorial/system.py index f6aa95b4d..29b883065 100644 --- a/systems/aws-tutorial/system.py +++ b/systems/aws-tutorial/system.py @@ -15,36 +15,24 @@ class AwsTutorial(System): maintainers("stephanielam3211") - common = { - "system_site": "aws", - "scheduler": "flux", - "hardware_key": str(hardware_descriptions) - + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml", - } - id_to_resources = { "c7i.48xlarge": { - **common, "sys_cores_per_node": 192, "sys_mem_per_node_GB": 384, }, "c7i.metal-48xl": { - **common, "sys_cores_per_node": 192, "sys_mem_per_node_GB": 384, }, "c7i.24xlarge": { - **common, "sys_cores_per_node": 96, "sys_mem_per_node_GB": 192, }, "c7i.metal-24xl": { - **common, "sys_cores_per_node": 96, "sys_mem_per_node_GB": 192, }, "c7i.12xlarge": { - **common, "sys_cores_per_node": 48, "sys_mem_per_node_GB": 96, }, @@ -65,7 +53,15 @@ class AwsTutorial(System): def __init__(self, spec): super().__init__(spec) + + # Common attributes across instances self.programming_models = [OpenMPCPUOnlySystem()] + self.system_site = "aws" + self.scheduler = "flux" + self.hardware_key = ( + str(hardware_descriptions) + + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml" + ) attrs = self.id_to_resources.get(self.spec.variants["instance_type"][0]) for k, v in attrs.items(): From 61be5aafd22d1058c140695c5baa8a20dce4b82f Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 15:51:00 -0700 Subject: [PATCH 21/23] Discuss mandatory vs optional steps --- docs/add-a-system-config.rst | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index ef2c1d01c..c448ba5d1 
100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -100,12 +100,16 @@ B. Creating the System Definition (``system.py``) Now that you have defined the hardware description for your system, you can now create the ``system.py``, which involves defining the software on your system. This includes defining compilers and pre-installed packages, which your package manager can use -instead of attempting to build the package from scratch. If using Spack, defining as -many external packages as possible here will ensure a much faster build process, and -using system-installed packages may be significantly more performant than building them -from scratch. +instead of attempting to build the package from scratch. The mandatory steps are: -1. Creating the System class +- :ref:`creating-sys-class` +- :ref:`class-init-and-resources` - At least one cluster must be defined. +- :ref:`compiler-def` - At least one compiler must be defined. +- :ref:`software-section` + +.. _creating-sys-class: + +1. Creating the System Class ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In this example, we will recreate a fully-functional example of the AWS ``system.py`` @@ -144,6 +148,8 @@ instances that share this same hardware and software specification using the description="AWS instance type", ) +.. _class-init-and-resources: + 2. Specify the Class Initializer and Resources ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -213,6 +219,8 @@ experiment initialized with your chosen instance. for k, v in attrs.items(): setattr(self, k, v) +.. _compiler-def: + 3. Add Compiler Definitions ~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -264,6 +272,8 @@ For our AWS system, the compiler we define is ``gcc@11.4.0``. For the ], ) +.. _software-section: + 4. Add a Software Section ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -292,6 +302,8 @@ have, ``gcc@11.4.0``. } } +.. _software-definitions: + 5. Add Software Definitions ~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -303,9 +315,11 @@ system path ``prefix`` to the package. Additionally for spack, you need to set ``buildable: False`` to use the package as an external. At minimum, we recommend you define externals for ``cmake`` and ``mpi`` (users also -typically define externals for math libraries like ``blas`` and ``lapack``). -Additionally, for systems with accelerators, define externals for CUDA and ROCm runtime -libraries (see an example for a `CUDA system +typically define externals for math libraries like ``blas`` and ``lapack``). This is +because certain packages (e.g. ``cmake``) can take a long time to build, and packages +such as ``mpi``, ``blas``, and ``lapack`` can influence runtime performance +significantly. Additionally, for systems with accelerators, define externals for CUDA +and ROCm runtime libraries (see an example for a `CUDA system `_, and a `ROCm system `_). @@ -322,7 +336,7 @@ available on your system. 
from benchpark.directives import maintainers, variant from benchpark.openmpsystem import OpenMPCPUOnlySystem from benchpark.paths import hardware_descriptions - from benchpark.system import System + from benchpark.system import System, compiler_def, compiler_section_for class AwsTutorial(System): From bc91c3d3737672f483baa239fcd82bdc031e3ea2 Mon Sep 17 00:00:00 2001 From: Michael McKinsey Date: Tue, 23 Sep 2025 16:24:39 -0700 Subject: [PATCH 22/23] Full pass --- docs/add-a-system-config.rst | 81 ++++++++++++++++++++---------------- 1 file changed, 44 insertions(+), 37 deletions(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index c448ba5d1..0a1726659 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -27,9 +27,9 @@ To determine if you need to create a new system: already used by Benchpark, the same software stack may already be specified if the same vendor software stack is used on this hardware - or, if a software stack of your datacenter is already specified. If a system exists with the same software stack, add - your system to that ``system.py`` as a value under the ``cluster`` variant, and - specify your systems specific resource configuration under the ``id_to_resources`` - dictionary. + your system to that ``system.py`` as a value under the ``cluster`` variant (may be + under ``instance_type``), and specify your systems specific resource configuration + under the ``id_to_resources`` dictionary. 4. If the same software stack description does not exist, determine if there is one that can be parameterized to match yours, otherwise proceed with adding a new system in :ref:`system-specification`. @@ -40,10 +40,9 @@ A. Adding System Hardware Specs ------------------------------- We list hardware descriptions of Systems specified in Benchpark in the System Catalogue -in :doc:`system-list`. - -If you are running on a system with an accelerator, find an existing system with the -same accelerator vendor, and then secondarily, if you can, match the actual accelerator. +in :doc:`system-list`. If you are running on a system with an accelerator, find an +existing system with the same accelerator vendor, and then secondarily, if you can, +match the actual accelerator. 1. ``accelerator.vendor`` - Company name 2. ``accelerator.name`` - Product name @@ -63,6 +62,11 @@ And add the interconnect vendor and product name. 1. ``interconnect.vendor`` - Company name 2. ``interconnect.name`` - Product name +Finally, match the integrator vendor and name. + +1. ``integrator.vendor`` - Company name +2. ``integrator.name`` - Product name + For example, if your system has an NVIDIA A100 GPU and an Intel x86 Icelake CPUs, a similar config would share the A100 GPU, and CPU architecture may or may not match. Or, if I do not have GPUs and instead have SapphireRapids CPUs, the closest match would be @@ -74,17 +78,17 @@ naming convention: :: - [INTEGRATOR]-MICROARCHITECTURE[-ACCELERATOR][-NETWORK] + [INTEGRATOR][-MICROARCHITECTURE][-ACCELERATOR][-NETWORK] where: :: - INTEGRATOR = COMPANY[_PRODUCTNAME][...] + INTEGRATOR = Integrator Company name MICROARCHITECTURE = CPU Microarchitecture - ACCELERATOR = ACCELERATOR Product Name + ACCELERATOR = Accelerator Product Name NETWORK = Network Product Name @@ -99,8 +103,8 @@ B. Creating the System Definition (``system.py``) Now that you have defined the hardware description for your system, you can now create the ``system.py``, which involves defining the software on your system. 
This includes -defining compilers and pre-installed packages, which your package manager can use -instead of attempting to build the package from scratch. The mandatory steps are: +defining system resources, compilers, and pre-installed packages. The mandatory steps to +create a ``system.py`` are: - :ref:`creating-sys-class` - :ref:`class-init-and-resources` - At least one cluster must be defined. @@ -153,11 +157,11 @@ instances that share this same hardware and software specification using the 2. Specify the Class Initializer and Resources ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When defining ``__init__()`` for our system, we invoke the parent +When defining ``__init__()`` for our system, we invoke the parent class ``System::__init__()``, and set important system attributes using the ``id_to_resources`` dictionary, which contains information for each ``cluster`` or -``instance_type``. We define common attributes for all ``instance_type``'s in the -``__init__()`` function: +``instance_type``. We define common attributes a single time for all ``instance_type``'s +inside the ``__init__()`` function: 1. ``system_site`` - The name of the site where the ``cluster``/``instance_type`` is located. @@ -167,7 +171,7 @@ When defining ``__init__()`` for our system, we invoke the parent accelerators, we would add ``CudaSystem`` to this list, and ``ROCmSystem`` for AMD. 3. ``scheduler`` - The job scheduler. 4. ``hardware_key`` - which defines a path to the yaml description you just created in - the previous step. + the previous step :ref:`adding-system-hardware-specs`. 5. ``sys_cores_per_node`` - The amount of hardware cores per node. 6. ``sys_mem_per_node_GB`` - The amount of node memory (in gigabytes). @@ -225,14 +229,15 @@ experiment initialized with your chosen instance. ~~~~~~~~~~~~~~~~~~~~~~~~~~~ We define compilers that are available on our system by implementing -``compute_compilers_section()``: +``compute_compilers_section()`` function. Here are the general steps for how to write +this function, followed by our AWS example: 1. For each compiler, create the necessary config with ``compiler_def()``. 2. For each type of compiler (gcc, intel, etc.), combine them with ``compiler_section_for()``. 3. Merge the compiler definitions with merge_dicts (this part is unnecessary if you have only one type of compiler). -4. Generally you will want to compose a minimal list of compilers: e.g. if you want to +4. Generally, you will want to compose a minimal list of compilers: e.g. if you want to compile your benchmark with the oneAPI compiler, and have multiple versions to choose from, you would add a variant to the system, and the config would expose only one of them. @@ -242,7 +247,8 @@ For our AWS system, the compiler we define is ``gcc@11.4.0``. For the 1. ``spec`` - Similar to package specs, ``name@version``. GCC in particular also needs the ``languages`` variant, where the list of languages depends on the available - ``exes`` (e.g. do not include "fortran" if ``gfortran`` is not available). + ``exes`` (e.g. do not include "fortran" if ``gfortran`` is not available). If you are + **not** using GCC or Spack as your package manager, ``languages`` is unecessary. 2. ``prefix`` - Prefix to the compiler binary directory, e.g. ``/usr/`` for ``/usr/bin/gcc`` 3. ``exes`` - Dictionary to map ``c``, ``cxx``, and ``fortran`` to the appropriate file @@ -277,9 +283,9 @@ For our AWS system, the compiler we define is ``gcc@11.4.0``. For the 4. 
Add a Software Section ~~~~~~~~~~~~~~~~~~~~~~~~~ -Finally, we define the ``compute_software_section()``, where at minimum we must define -the ``default-compiler`` for Ramble. This is trivial for the single compiler that we -have, ``gcc@11.4.0``. +Here we define the ``compute_software_section()``, where at minimum we must define the +``default-compiler`` for Ramble. This is trivial for the single compiler that we have, +``gcc@11.4.0``. :: @@ -307,21 +313,21 @@ have, ``gcc@11.4.0``. 5. Add Software Definitions ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Here, we define the ``compute_packages_section()`` function, where you can include any -package that you would like the package manager, such as spack, to find on the system, -meaning it will not build that package from source and use your system package instead. -For each package that you include, you need to define its spec ``name@version`` and the -system path ``prefix`` to the package. Additionally for spack, you need to set -``buildable: False`` to use the package as an external. +Finally, we define the ``compute_packages_section()`` function, where you can include +any package that you would like the package manager, such as Spack, to find on the +system, meaning it will not build that package from source and use your system package +instead. For each package that you include, you need to define its spec ``name@version`` +and the system path ``prefix`` to the package. Additionally for Spack, you need to set +``buildable: False`` to tell Spack not to build that package. At minimum, we recommend you define externals for ``cmake`` and ``mpi`` (users also -typically define externals for math libraries like ``blas`` and ``lapack``). This is -because certain packages (e.g. ``cmake``) can take a long time to build, and packages -such as ``mpi``, ``blas``, and ``lapack`` can influence runtime performance -significantly. Additionally, for systems with accelerators, define externals for CUDA -and ROCm runtime libraries (see an example for a `CUDA system +typically define externals for other libraries, e.g. math libraries like ``blas`` and +``lapack``). This is because certain packages (e.g. ``cmake``) can take a long time to +build, and packages such as ``mpi``, ``blas``, and ``lapack`` can influence runtime +performance significantly. Additionally, for systems with accelerators, define externals +for CUDA and ROCm runtime libraries (see externals examples for a `CUDA system `_, -and a `ROCm system +or a `ROCm system `_). Also, see :ref:`adding-sys-packages`, for help on how to search for the packages available on your system. @@ -329,7 +335,8 @@ available on your system. .. note:: For ``mpi``, you need to define ``"mpi": {"buildable": False},`` as a virtual - package, and then define your MPI package as we have for ``openmpi``. + package, and then define your MPI package as we have for the ``openmpi`` package. + This is to ensure Spack uses our MPI, and does not try to build another MPI package. :: @@ -469,5 +476,5 @@ modules: ... prepend_path("PATH","/usr/tce/packages/gcc/gcc-12.1.1/bin") -Therefore, I can add my ``prefix`` as ``/usr/tce/packages/gcc/gcc-12.1.1/`` and my spec -as ``gcc@12.1.1``. +Therefore, the ``prefix`` is ``/usr/tce/packages/gcc/gcc-12.1.1/`` and the spec is +``gcc@12.1.1``. 
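As a minimal sketch of that last step (assuming the path reported by ``module display``
is the package's install root), the discovered ``spec`` and ``prefix`` can be recorded as
one more external entry in ``compute_packages_section()``; if the package is a compiler,
the same ``spec``/``prefix`` pair would instead feed a ``compiler_def()`` entry in
``compute_compilers_section()``:

::

    # Inside the "packages" dictionary returned by compute_packages_section():
    "gcc": {
        "externals": [
            {"spec": "gcc@12.1.1", "prefix": "/usr/tce/packages/gcc/gcc-12.1.1/"}
        ],
        "buildable": False,
    },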
From cc4f8fdda36f3b30cb16cbfe031b841e829814ae Mon Sep 17 00:00:00 2001 From: Stephanie Brink Date: Tue, 23 Sep 2025 22:09:25 -0700 Subject: [PATCH 23/23] fix spelling --- docs/add-a-system-config.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/add-a-system-config.rst b/docs/add-a-system-config.rst index 0a1726659..a19152d93 100644 --- a/docs/add-a-system-config.rst +++ b/docs/add-a-system-config.rst @@ -248,7 +248,7 @@ For our AWS system, the compiler we define is ``gcc@11.4.0``. For the 1. ``spec`` - Similar to package specs, ``name@version``. GCC in particular also needs the ``languages`` variant, where the list of languages depends on the available ``exes`` (e.g. do not include "fortran" if ``gfortran`` is not available). If you are - **not** using GCC or Spack as your package manager, ``languages`` is unecessary. + **not** using GCC or Spack as your package manager, ``languages`` is unnecessary. 2. ``prefix`` - Prefix to the compiler binary directory, e.g. ``/usr/`` for ``/usr/bin/gcc`` 3. ``exes`` - Dictionary to map ``c``, ``cxx``, and ``fortran`` to the appropriate file