
Commit 5362ed2

Merge pull request #167 from tharittk/cuda-aware
Feat: restructuring of communication methods (and buffered communication for CUDA-Aware MPI)
2 parents 489dfa6 + 02efdbb commit 5362ed2

File tree

13 files changed: +799, -232 lines changed

Makefile

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ lint:
 	flake8 pylops_mpi/ tests/ examples/ tutorials/
 
 tests:
-	mpiexec -n $(NUM_PROCESSES) pytest tests/ --with-mpi
+	export TEST_CUPY_PYLOPS=0 && mpiexec -n $(NUM_PROCESSES) pytest tests/ --with-mpi
 
 # assuming NUM_PROCESSES <= number of gpus available
 tests_gpu:
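The change above pins ``TEST_CUPY_PYLOPS=0`` so the plain ``tests`` target never picks up the CuPy backend, while ``tests_gpu`` presumably leaves it enabled. As a rough, hypothetical sketch of the switch this variable drives (the actual test-suite logic may differ), a test module could select its array backend like this:

    # Hypothetical sketch of a backend switch driven by TEST_CUPY_PYLOPS;
    # the real test-suite mechanism may differ from this.
    import os

    if int(os.environ.get("TEST_CUPY_PYLOPS", 0)) == 1:
        import cupy as xp  # GPU test run (tests_gpu target)
    else:
        import numpy as xp  # CPU test run (tests target)

    print("testing with backend:", xp.__name__)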

docs/source/gpu.rst

Lines changed: 36 additions & 4 deletions
@@ -11,7 +11,7 @@ This library must be installed *before* PyLops-mpi is installed.
 
 .. note::
 
-   Set environment variable ``CUPY_PYLOPS=0`` to force PyLops to ignore the ``cupy`` backend.
+   Set the environment variable ``CUPY_PYLOPS=0`` to force PyLops to ignore the ``cupy`` backend.
    This can be also used if a previous (or faulty) version of ``cupy`` is installed in your system,
    otherwise you will get an error when importing PyLops.
 
@@ -22,6 +22,14 @@ can handle both scenarios. Note that, since most operators in PyLops-mpi are thi
 some of the operators in PyLops that lack a GPU implementation cannot be used also in PyLops-mpi when working with
 cupy arrays.
 
+.. note::
+
+   By default when using ``cupy`` arrays, PyLops-MPI will try to use methods in MPI4Py that communicate memory buffers.
+   However, this requires a CUDA-Aware MPI installation. If your MPI installation is not CUDA-Aware, set the
+   environment variable ``PYLOPS_MPI_CUDA_AWARE=0`` to force PyLops-MPI to use methods in MPI4Py that communicate
+   general Python objects (this will incur a loss of performance!).
+
+
 Moreover, PyLops-MPI also supports the Nvidia's Collective Communication Library (NCCL) for highly-optimized
 collective operations, such as AllReduce, AllGather, etc. This allows PyLops-MPI users to leverage the
 proprietary technology like NVLink that might be available in their infrastructure for fast data communication.
@@ -30,13 +38,35 @@ proprietary technology like NVLink that might be available in their infrastructu
 
 Set environment variable ``NCCL_PYLOPS_MPI=0`` to explicitly force PyLops-MPI to ignore the ``NCCL`` backend.
 However, this is optional as users may opt-out for NCCL by skip passing `cupy.cuda.nccl.NcclCommunicator` to
-the :class:`pylops_mpi.DistributedArray`
+the :class:`pylops_mpi.DistributedArray`.
+
+In summary:
+
+.. list-table::
+   :widths: 50 25 25
+   :header-rows: 1
+
+   * - Operation model
+     - Enabled with
+     - Disabled with
+   * - NumPy + MPI
+     - Default
+     - Cannot be disabled
+   * - CuPy + MPI
+     - ``PYLOPS_MPI_CUDA_AWARE=0``
+     - ``PYLOPS_MPI_CUDA_AWARE=1`` (default)
+   * - CuPy + CUDA-Aware MPI
+     - ``PYLOPS_MPI_CUDA_AWARE=1`` (default)
+     - ``PYLOPS_MPI_CUDA_AWARE=0``
+   * - CuPy + NCCL
+     - ``NCCL_PYLOPS_MPI=1`` (default)
+     - ``NCCL_PYLOPS_MPI=0``
 
 Example
 -------
 
 Finally, let's briefly look at an example. First we write a code snippet using
-``numpy`` arrays which PyLops-mpi will run on your CPU:
+``numpy`` arrays which PyLops-MPI will run on your CPU:
 
 .. code-block:: python
 
@@ -157,6 +187,8 @@ GPU+MPI, and GPU+NCCL):
     - ✅
     - ✅
     - ✅
+    - ✅
+    - ✅
   * - :class:`pylops_mpi.basicoperators.MPISecondDerivative`
     - ✅
     - ✅
@@ -184,4 +216,4 @@ GPU+MPI, and GPU+NCCL):
   * - :class:`pylops_mpi.optimization.basic.cgls`
     - ✅
     - ✅
-    - ✅
+    - ✅
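The note added above describes an import-time switch. As a minimal sketch, assuming ``PYLOPS_MPI_CUDA_AWARE`` is read when ``pylops_mpi`` is imported (as ``CUPY_PYLOPS`` is in PyLops), a script forced onto object-based MPI communication could look like:

    # Minimal sketch: disable buffered (CUDA-Aware) MPI communication.
    # Assumes the variable is read at import time, so it must be set
    # before pylops_mpi is imported (or exported in the shell).
    import os

    os.environ["PYLOPS_MPI_CUDA_AWARE"] = "0"

    import pylops_mpi  # falls back to communicating general Python objects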

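The last row of the summary table hinges on whether an NCCL communicator is passed to :class:`pylops_mpi.DistributedArray`. A hedged sketch of the two paths follows; the ``initialize_nccl_comm`` helper and the ``base_comm_nccl`` argument are taken from the PyLops-MPI NCCL tutorials and may differ across versions:

    # Hedged sketch: opting in to (or out of) NCCL by passing, or not
    # passing, an NCCL communicator to DistributedArray. Helper/argument
    # names follow the PyLops-MPI NCCL tutorials and may vary by version.
    import pylops_mpi

    # CuPy + MPI (or CUDA-Aware MPI): simply do not pass an NCCL communicator
    x = pylops_mpi.DistributedArray(global_shape=128, engine="cupy")

    # CuPy + NCCL: pass an NCCL communicator explicitly
    from pylops_mpi.utils._nccl import initialize_nccl_comm

    nccl_comm = initialize_nccl_comm()
    y = pylops_mpi.DistributedArray(global_shape=128,
                                    base_comm_nccl=nccl_comm,
                                    engine="cupy")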
docs/source/installation.rst

Lines changed: 38 additions & 6 deletions
@@ -15,7 +15,13 @@ The minimal set of dependencies for the PyLops-MPI project is:
 * `MPI4py <https://mpi4py.readthedocs.io/en/stable/>`_
 * `PyLops <https://pylops.readthedocs.io/en/stable/>`_
 
-Additionally, to use the NCCL engine, the following additional
+Additionally, to use the CUDA-aware MPI engine, the following additional
+dependencies are required:
+
+* `CuPy <https://cupy.dev/>`_
+* CUDA-aware MPI
+
+Similarly, to use the NCCL engine, the following additional
 dependencies are required:
 
 * `CuPy <https://cupy.dev/>`_
@@ -27,12 +33,18 @@ if this is not possible, some of the dependencies must be installed prior to ins
 
 Download and Install MPI
 ========================
-Visit the official MPI website to download an appropriate MPI implementation for your system.
-Follow the installation instructions provided by the MPI vendor.
+Visit the official website of your MPI vendor of choice to download an appropriate MPI
+implementation for your system:
+
+* `Open MPI <https://docs.open-mpi.org/>`_
+* `MPICH <https://www.mpich.org/>`_
+* `Intel MPI <https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html>`_
+* ...
 
-* `Open MPI <https://www.open-mpi.org/software/ompi/v1.10/>`_
-* `MPICH <https://www.mpich.org/downloads/>`_
-* `Intel MPI <https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.10j8fx>`_
+Alternatively, the conda-forge community provides ready-to-use binary packages for four MPI implementations
+(see `MPI4Py documentation <https://mpi4py.readthedocs.io/en/stable/install.html#conda-packages>`_ for more
+details). In this case, you can defer the installation to the stage when the conda environment for your project
+is created - see below for more details.
 
 Verify MPI Installation
 =======================
@@ -42,6 +54,17 @@ After installing MPI, verify its installation by opening a terminal and running
 
 >> mpiexec --version
 
+Install CUDA-Aware MPI (optional)
+=================================
+To be able to achieve the best performance when using PyLops-MPI with CuPy arrays, a CUDA-Aware version of
+MPI must be installed.
+
+For `Open MPI`, the conda-forge package has built-in CUDA support, as long as a pre-installed CUDA is detected.
+Run the following `commands <https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/cuda.html#how-do-i-verify-that-open-mpi-has-been-built-with-cuda-support>`_
+for diagnostics.
+
+For the other MPI implementations, refer to their specific documentation.
+
 Install NCCL (optional)
 =======================
 To obtain highly-optimized performance on GPU clusters, PyLops-MPI also supports the Nvidia's collective communication calls
@@ -103,6 +126,15 @@ For a ``conda`` environment, run
 This will create and activate an environment called ``pylops_mpi``, with all
 required and optional dependencies.
 
+If you want to also install MPI as part of the creation process of the conda environment,
+modify the ``environment-dev.yml`` file by adding ``openmpi``\``mpich``\``impi_rt``\``msmpi``
+just above ``mpi4py``. Note that only ``openmpi`` provides a CUDA-Aware MPI installation.
+
+If you want to leverage CUDA-Aware MPI but prefer to use another MPI installation, you must
+either switch to a `Pip`-based installation (see below), or move ``mpi4py`` into the ``pip``
+section of the ``environment-dev.yml`` file and export the variable ``MPICC`` pointing to
+the path of your CUDA-Aware MPI installation.
+
 If you want to enable `NCCL <https://developer.nvidia.com/nccl>`_ in PyLops-MPI, run this instead
 
 .. code-block:: bash
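The "Install CUDA-Aware MPI" section added above points to the Open MPI diagnostics page, which checks the ``mpi_built_with_cuda_support`` MCA value via ``ompi_info``. A small Python wrapper around that documented check (assuming ``ompi_info`` is on the ``PATH``) might be:

    # Sketch: verify that Open MPI was built with CUDA support, following
    # the Open MPI documentation linked in the diff above.
    import subprocess

    info = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in info.splitlines():
        if "mpi_built_with_cuda_support:value" in line:
            print(line)  # expect '...:value:true' on a CUDA-Aware build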

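Similarly, the ``MPICC`` route described for pip-based installs could be scripted as follows; the compiler-wrapper path is hypothetical, and mpi4py's source build honors ``MPICC`` per its installation documentation (``--no-binary`` forces a build from source rather than a prebuilt wheel):

    # Hypothetical sketch: build mpi4py from source against a specific
    # CUDA-Aware MPI by pointing MPICC at its compiler wrapper.
    import os
    import subprocess
    import sys

    env = dict(os.environ, MPICC="/opt/cuda-aware-mpi/bin/mpicc")  # hypothetical path
    subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "--no-cache-dir", "--no-binary=mpi4py", "mpi4py"],
        env=env, check=True,
    )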