
Commit b480961

Update paper.md
1 parent 3fcc294 commit b480961


joss/paper.md

Lines changed: 28 additions & 42 deletions
@@ -37,29 +37,25 @@ changes to their single-node PyLops code.

# Statement of need

-As scientific datasets grow and demand higher resolution, need for distributed computing and matrix-free linear algebra becomes essential.
-Models and datasets now often exceed a single machine's memory, making efficient, accurate computation challenging. However,
-many linear operators in scientific inverse problems can be decomposed into a series of computational blocks that, though
-resource-intensive, are well-suited for parallelization, highlighting the need for a distributed approach.
+As scientific datasets grow, the need for distributed computing and matrix-free linear algebra becomes crucial.
+Models and datasets often exceed a single machine's memory, making efficient computation challenging. Many linear operators
+in scientific inverse problems can be decomposed into a series of computational blocks that are well-suited for parallelization,
+emphasizing the need for a distributed approach.

When addressing distributed inverse problems, we identify three distinct families of problems:

-- **1. Fully distributed models and data**: Both model and data are distributed across nodes with minimal communication during
-modeling, mainly occurring in the solver for dot products or regularization (e.g., Laplacian). Each node handles a portion
-of the model and data when applying the operator and its adjoint.
+- **1. Fully distributed models and data**: Both the model and data are split across nodes with minimal communication, mainly
+in the solver for dot products or regularization. Each node processes its own portion of the model and data.

-- **2. Distributed data, model available on all nodes**: In this case, data is distributed across nodes while the model is
-available across all. Communication occurs during the adjoint pass when models produced by each node need
-to be summed together, and in the solver for dot products on the data vector.
+- **2. Distributed data, model available on all nodes**: Data is distributed across nodes, but the model is available on all.
+Communication happens during the adjoint pass to sum models and in the solver for data vector operations.

-- **3. Model and data available on all nodes**: Here, communication is limited to the operator, with nodes having identical
-copies of the data and model master. All nodes perform computations in the forward and adjoint passes of the operator, requiring
-no communication in the solver.
+- **3. Model and data available on all nodes**: All nodes have identical copies of the data and model. Communication is limited
+to operator calculations, with no communication needed in the solver.

MPI for Python (mpi4py [@Dalcin:2021]) provides Python bindings for the MPI standard, allowing applications to leverage multiple
-processors across workstations, clusters, and supercomputers. Recent updates (version 3.0 and above) have simplified usage and improved
-data communication efficiency between nodes. Projects like mpi4py-fft [@Mortensen:2019], mcdc [@Morgan:2024], and mpi4jax [@mpi4jax] utilize mpi4py
-to expand their distributed computing capabilities, improving efficiency and scalability.
+processors across workstations, clusters, and supercomputers. Projects like mpi4py-fft [@Mortensen:2019], mcdc [@Morgan:2024], and mpi4jax [@mpi4jax]
+utilize mpi4py to expand their distributed computing capabilities, improving efficiency and scalability.

PyLops-MPI, built on top of PyLops [@Ravasi:2020], leverages mpi4py to address large-scale problems in a distributed and parallel manner. It provides an
intuitive API for scattering and broadcasting data and models across nodes, allowing various mathematical operations (e.g., summation, subtraction, norms)
@@ -80,37 +76,27 @@ The main components of the library include:

## DistributedArray

-The `pylops_mpi.DistributedArray` class serves as the fundamental array class used throughout the library, enabling
-partitioning of large NumPy [@Harris:2020] or CuPy [@cupy] arrays into smaller local arrays distributed across different ranks
-and supporting broadcasting these arrays to multiple processes.
-
-The DistributedArray supports two partition types via the **partition** attribute: `Partition.SCATTER` distributes data across all ranks
-with user-defined load, while `Partition.BROADCAST` creates a copy of the data for all ranks. Furthermore, various basic mathematical functions
-can be applied to DistributedArray objects, including addition (+), subtraction (-), multiplication (*), dot product (@), conjugate (Conj), vector norms, and
-deep copying (Copy). Additionally, users can stack `pylops_mpi.DistributedArray` objects using the `pylops_mpi.StackedDistributedArray` class for further mathematical
-operations.
-
-## MPILinearOperator and MPIStackedLinearOperator
-
-`pylops_mpi.MPILinearOperator` is the base class for creating MPI linear operators that perform matrix-vector products on DistributedArray objects.
-
-`pylops_mpi.MPIStackedLinearOperator` represents a second level of abstraction in the creation of MPI-powered linear operators; allowing users to stack MPILinearOperator objects,
-enabling execution in a distributed manner and supporting matrix-vector products with both DistributedArray and StackedDistributedArray.
+The `pylops_mpi.DistributedArray` class serves as the fundamental array class, enabling partitioning of large
+NumPy [@Harris:2020] or CuPy [@cupy] arrays into smaller local arrays distributed across different ranks and supporting
+broadcasting these arrays to multiple processes. The DistributedArray provides two partition types: `Partition.SCATTER` and
+`Partition.BROADCAST`. It also supports basic math operations such as addition (+), multiplication (*), dot-product (@), and
+more. Additionally, DistributedArray objects can be stacked using `pylops_mpi.StackedDistributedArray` for further operations.
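
As a concrete illustration of this interface, the minimal sketch below creates scattered and broadcast arrays and combines them with the overloaded operators mentioned above. The constructor arguments (`global_shape`, `partition`), the `local_shape` attribute, and the `[:]` assignment to the local portion follow the PyLops-MPI documentation as understood here and may differ slightly in detail; run under `mpiexec`.

```python
# Minimal DistributedArray sketch (API details assumed, see lead-in above):
# scatter a global vector across ranks, fill each local chunk, and combine
# arrays with the overloaded +, * and @ operators.
import numpy as np
import pylops_mpi

# global vector of 100 samples, split across all MPI ranks
x = pylops_mpi.DistributedArray(global_shape=100,
                                partition=pylops_mpi.Partition.SCATTER)
x[:] = np.ones(x.local_shape)          # each rank fills only its own chunk

# broadcast partition: every rank holds a full copy of the array
w = pylops_mpi.DistributedArray(global_shape=100,
                                partition=pylops_mpi.Partition.BROADCAST)
w[:] = 2.0 * np.ones(w.local_shape)

y = x + x        # element-wise addition across ranks
z = x * x        # element-wise multiplication
d = x @ x        # distributed dot product (reduced across ranks)
```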

## HStack, VStack, BlockDiag Operators

-One of PyLops' main features is the ability to create combinations of linear operators easily through three main design patterns:
-i) horizontal stacking, ii) vertical stacking, and iii) diagonal stacking. PyLops-MPI offers distributed versions of these operators. Specifically,
-`pylops_mpi.MPIBlockDiag` enables multiple PyLops operators to run in parallel across different processes, each working on a portion of the model and
-data (see family 1). In contrast, `pylops_mpi.MPIVStack` runs multiple operators in parallel on the entire model in forward mode; its adjoint applies to different
-portions of the data vector, with individual outputs being sum-reduced (see family 2). Finally, `pylops_mpi.MPIHStack` is the adjoint of `pylops_mpi.MPIVStack`.
+`pylops_mpi.MPILinearOperator` and `pylops_mpi.MPIStackedLinearOperator` serve as the foundation for creating new MPI operators.
+All existing operators subclass one of these classes.
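
The subclassing pattern can be sketched roughly as follows. This is an illustrative example only: it mirrors the PyLops `_matvec`/`_rmatvec` convention, while the exact `MPILinearOperator` constructor arguments (`shape`, `dtype`) and the hypothetical `MPIScale` operator are assumptions rather than part of the documented PyLops-MPI API.

```python
# Hypothetical operator scaling a DistributedArray by a constant (sketch).
# Constructor arguments and internal conventions are assumed; consult the
# PyLops-MPI documentation for the authoritative interface.
import numpy as np
import pylops_mpi


class MPIScale(pylops_mpi.MPILinearOperator):
    def __init__(self, n, alpha, dtype=np.float64):
        self.alpha = alpha
        super().__init__(shape=(n, n), dtype=dtype)

    def _matvec(self, x):
        # forward pass: scale the local portion held by each rank
        y = x.copy()
        y[:] = self.alpha * x.local_array
        return y

    def _rmatvec(self, x):
        # adjoint pass: a real diagonal scaling is self-adjoint
        return self._matvec(x)
```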
+
+PyLops enables easy combinations of linear operators via i) horizontal, ii) vertical, and iii) diagonal stacking. PyLops-MPI provides
+distributed versions of these, like `pylops_mpi.MPIBlockDiag`, which runs multiple operators in parallel on separate portions of the model
+and data (family 1). `pylops_mpi.MPIVStack` applies multiple operators in parallel to the whole model, with its adjoint summing different
+parts of the data vector (family 2). `pylops_mpi.MPIHStack` is the adjoint of MPIVStack.
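
For example, a family-1 setup might look like the sketch below, where each rank wraps its own PyLops operator in `pylops_mpi.MPIBlockDiag` and applies it to its chunk of a scattered `DistributedArray`. Argument names (`ops`, `global_shape`) follow the documented API as understood here and may differ slightly.

```python
# Sketch of a distributed block-diagonal operator (family 1): each MPI rank
# applies its own PyLops operator to its local chunk of the model.
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi

nlocal = 10
nranks = MPI.COMM_WORLD.Get_size()

# local operator owned by this rank (here a simple diagonal scaling)
Dop = pylops.Diagonal(np.full(nlocal, 2.0))
BDiag = pylops_mpi.MPIBlockDiag(ops=[Dop])

# scattered model vector whose global size matches the stacked operator
x = pylops_mpi.DistributedArray(global_shape=nlocal * nranks)
x[:] = np.ones(x.local_shape)

y = BDiag @ x       # forward: fully parallel, no inter-rank communication
xadj = BDiag.H @ y  # adjoint: also applied block by block
```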

## Halo Exchange

-PyLops-MPI Linear Operators typically use halo exchange to transfer portions of the model and data between consecutive ranks. While users should ensure
-consistent local data shapes across ranks for matrix-vector products without external communication, this may not always be feasible. When local shapes
-differ, the operator performs a halo exchange, transferring boundary data cells (or "ghost cells") to and from neighboring processes. This alignment of
-model and data vector shapes at each rank allows efficient local computations without explicit inter-process communication, minimizing communication overhead.
+PyLops-MPI Linear Operators use halo exchange to transfer model and data portions between ranks. Users should ensure consistent local data shapes to avoid extra communication during
+matrix-vector products. If shapes differ, the operator exchanges boundary data ("ghost cells") between neighboring processes, aligning shapes for efficient local computations
+and minimizing overhead.
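
The ghost-cell idea itself can be demonstrated with plain mpi4py. The sketch below is a generic illustration of the technique, using periodic neighbours for simplicity; it is not PyLops-MPI's internal implementation.

```python
# Generic halo (ghost-cell) exchange with mpi4py: every rank trades its
# boundary samples with its neighbours so local computations can proceed
# without further communication.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.arange(5, dtype=np.float64) + 5 * rank   # this rank's chunk
prev_rank = (rank - 1) % size                       # periodic neighbours
next_rank = (rank + 1) % size

# send the right boundary forward while receiving the left ghost cell, then
# send the left boundary backward while receiving the right ghost cell
ghost_left = comm.sendrecv(local[-1], dest=next_rank, source=prev_rank)
ghost_right = comm.sendrecv(local[0], dest=prev_rank, source=next_rank)

# extended chunk with one ghost cell on each side
padded = np.concatenate([[ghost_left], local, [ghost_right]])
print(rank, padded)
```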

## MPI-powered Solvers

@@ -123,7 +109,7 @@ We briefly discuss three use cases in geophysical inverse problems that correspo
Specifically:

- *1. Seismic Post-Stack Inversion* represents an effective approach to quantitative characterization of the
-subsurface [@Ravasi:2021] from seismic data. In 3D applications, both the model and data are three-dimensional (2 spatial coordinates and depth/time). PyLops-MPI addresses this by
+subsurface [@Ravasi:2021] from seismic data. In 3D applications, both the model and data are three-dimensional (2 spatial coordinates and depth/time). PyLops-MPI addresses this by
distributing one spatial axis across different ranks, enabling matrix-vector products and inversions at each rank, which are then gathered to obtain the inverted model.
Communication typically occurs due to the introduction of regularization terms that promote smooth or blocky solutions.
