# Statement of need

As scientific datasets grow, the need for distributed computing and matrix-free linear algebra becomes crucial. Models and datasets
often exceed a single machine's memory, making efficient computation challenging. Many linear operators in scientific inverse problems
can be decomposed into a series of computational blocks that are well-suited for parallelization, emphasizing the need for a distributed approach.

When addressing distributed inverse problems, we identify three distinct families of problems:

- **1. Fully distributed models and data**: Both the model and data are split across nodes with minimal communication, mainly
in the solver for dot products or regularization (e.g., a Laplacian). Each node processes its own portion of the model and data.

- **2. Distributed data, model available on all nodes**: Data is distributed across nodes, but the model is available on all.
Communication happens during the adjoint pass, when the models produced by each node are summed, and in the solver for operations on the data vector.

- **3. Model and data available on all nodes**: All nodes have identical copies of the data and model. Communication is limited
to the operator, with no communication needed in the solver.

MPI for Python (mpi4py [@Dalcin:2021]) provides Python bindings for the MPI standard, allowing applications to leverage multiple
processors across workstations, clusters, and supercomputers. Projects like mpi4py-fft [@Mortensen:2019], mcdc [@Morgan:2024], and mpi4jax [@mpi4jax]
utilize mpi4py to expand their distributed computing capabilities, improving efficiency and scalability.

PyLops-MPI, built on top of PyLops [@Ravasi:2020], leverages mpi4py to address large-scale problems in a distributed and parallel manner. It provides an
intuitive API for scattering and broadcasting data and models across nodes, allowing various mathematical operations (e.g., summation, subtraction, norms) to be performed.

The main components of the library include:

## DistributedArray

The `pylops_mpi.DistributedArray` class serves as the fundamental array class of the library, enabling partitioning of large
NumPy [@Harris:2020] or CuPy [@cupy] arrays into smaller local arrays distributed across different ranks and supporting
broadcasting these arrays to multiple processes. The DistributedArray supports two partition types via the **partition** attribute:
`Partition.SCATTER` distributes data across all ranks with user-defined load, while `Partition.BROADCAST` creates a copy of the data on all ranks.
It also supports basic mathematical operations such as addition (+), subtraction (-), multiplication (*), dot product (@), conjugation,
vector norms, and deep copying. Additionally, DistributedArray objects can be stacked using the `pylops_mpi.StackedDistributedArray` class for further operations.

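As a brief illustration of this API, the minimal sketch below creates a scattered array, fills its local portion, and applies a few of the
operations listed above; argument and attribute names such as `global_shape`, `local_shape`, and `norm()` are recalled from the PyLops-MPI
documentation and should be treated as assumptions to be checked against the current release.

```python
# Minimal sketch of DistributedArray usage; run with e.g. `mpirun -n 4 python script.py`.
# Names such as global_shape, local_shape, and norm() are assumptions, not verbatim from this paper.
import numpy as np
from mpi4py import MPI
import pylops_mpi

comm = MPI.COMM_WORLD
n = 100  # global length of the array

# SCATTER: each rank owns a portion of the global array
x = pylops_mpi.DistributedArray(global_shape=n, partition=pylops_mpi.Partition.SCATTER)
x[:] = np.full(x.local_shape, float(comm.Get_rank()))

# BROADCAST: every rank holds an identical copy
w = pylops_mpi.DistributedArray(global_shape=n, partition=pylops_mpi.Partition.BROADCAST)
w[:] = np.ones(n)

# Element-wise operations and global reductions
y = x + x          # addition of two distributed arrays
z = x * x          # element-wise multiplication
nrm = x.norm()     # vector norm, reduced across all ranks
```
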
## HStack, VStack, BlockDiag Operators

`pylops_mpi.MPILinearOperator` and `pylops_mpi.MPIStackedLinearOperator` serve as the foundation for creating MPI-powered linear operators
that perform matrix-vector products with DistributedArray and StackedDistributedArray objects; all existing operators subclass one of these two classes.

PyLops enables easy combinations of linear operators via i) horizontal, ii) vertical, and iii) diagonal stacking. PyLops-MPI provides
distributed versions of these patterns: `pylops_mpi.MPIBlockDiag` runs multiple operators in parallel across different processes, each working on a
separate portion of the model and data (family 1); `pylops_mpi.MPIVStack` applies multiple operators in parallel to the whole model, with its adjoint
summing the outputs obtained from different portions of the data vector (family 2); finally, `pylops_mpi.MPIHStack` is the adjoint of `pylops_mpi.MPIVStack`.

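To make the block-diagonal pattern concrete, the following sketch wraps one PyLops operator per rank in `pylops_mpi.MPIBlockDiag` and applies the
forward and adjoint products; the choice of `pylops.SecondDerivative` is purely illustrative, and the DistributedArray attribute names carry the same
assumptions as in the previous sketch.

```python
# Hedged sketch of MPIBlockDiag (family 1): one PyLops operator per rank,
# each acting only on that rank's local chunk of the distributed model.
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi

comm = MPI.COMM_WORLD
nlocal = 50                          # samples owned by each rank
nglobal = nlocal * comm.Get_size()   # global model size

# Local operator (illustrative): second derivative over this rank's chunk
Dop = pylops.SecondDerivative(nlocal, dtype="float64")

# Distributed block-diagonal operator: every rank applies its own operator in parallel
BDiag = pylops_mpi.MPIBlockDiag(ops=[Dop])

# Distributed model: each rank fills its own portion
x = pylops_mpi.DistributedArray(global_shape=nglobal, partition=pylops_mpi.Partition.SCATTER)
x[:] = np.linspace(0.0, 1.0, nlocal)

y = BDiag @ x       # forward: purely local matrix-vector products
xadj = BDiag.H @ y  # adjoint: again fully local for this family of problems
```
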
## Halo Exchange

PyLops-MPI linear operators use halo exchange to transfer portions of the model and data between consecutive ranks. Users should ensure consistent
local data shapes across ranks so that matrix-vector products can be performed without extra communication. When the local shapes differ, the operator
performs a halo exchange, transferring boundary data cells ("ghost cells") to and from neighboring processes; this aligns the model and data vector
shapes at each rank, allowing efficient local computations while minimizing communication overhead.

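The idea behind ghost cells can be sketched with plain mpi4py as follows; this is a conceptual illustration of the pattern described above,
not PyLops-MPI's internal implementation.

```python
# Conceptual ghost-cell (halo) exchange with mpi4py for a 1D distributed model.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nlocal = 10
local = np.full(nlocal, float(rank))   # this rank's chunk of the model
padded = np.zeros(nlocal + 2)          # room for one ghost cell per side
padded[1:-1] = local

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send the last local sample to the right neighbour, receive its first sample
recv_right = np.zeros(1)
comm.Sendrecv(sendbuf=local[-1:], dest=right, recvbuf=recv_right, source=right)
# Send the first local sample to the left neighbour, receive its last sample
recv_left = np.zeros(1)
comm.Sendrecv(sendbuf=local[:1], dest=left, recvbuf=recv_left, source=left)

padded[0] = recv_left[0] if left != MPI.PROC_NULL else local[0]
padded[-1] = recv_right[0] if right != MPI.PROC_NULL else local[-1]

# The local stencil computation now proceeds without further communication,
# e.g. a centred second difference that needs one neighbouring sample per side.
second_deriv = padded[2:] - 2 * padded[1:-1] + padded[:-2]
```
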
## MPI-powered Solvers

We briefly discuss three use cases in geophysical inverse problems that correspond to the three families of problems described above. Specifically:

- *1. Seismic Post-Stack Inversion* represents an effective approach to quantitative characterization of the subsurface [@Ravasi:2021] from seismic data.
In 3D applications, both the model and data are three-dimensional (two spatial coordinates and depth/time). PyLops-MPI addresses this by distributing one
spatial axis across different ranks, enabling matrix-vector products and inversions at each rank, which are then gathered to obtain the inverted model.
Communication typically occurs due to the introduction of regularization terms that promote smooth or blocky solutions.
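
A schematic of how this distribution might look in code is given below; it is a rough sketch only, and the specific names used
(`pylops.avo.poststack.PoststackLinearModelling`, `pylops.utils.wavelets.ricker`, a CGLS solver exposed as `pylops_mpi.cgls`, and the
DistributedArray attributes) are assumptions drawn from the PyLops/PyLops-MPI documentation rather than the verbatim workflow of this paper.

```python
# Rough sketch of a distributed 3D post-stack inversion: one spatial axis is split
# across ranks and each rank models/inverts its own slab (family 1). Operator and
# solver names are assumptions, not verified against the current PyLops-MPI API.
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi
from pylops.utils.wavelets import ricker

comm = MPI.COMM_WORLD
nt0, nx, ny_local = 100, 20, 5                 # time/depth, x, and per-rank y samples
wav = ricker(np.arange(41) * 0.004, f0=15)[0]  # source wavelet (illustrative)

# Local post-stack modelling operator acting on this rank's y-slab of the model
PPop = pylops.avo.poststack.PoststackLinearModelling(wav, nt0=nt0, spatdims=(nx, ny_local))

# Block-diagonal stack: every rank runs its own modelling operator in parallel
BDiag = pylops_mpi.MPIBlockDiag(ops=[PPop])

# Distributed model (flattened acoustic impedance) and synthetic data
nglobal = nt0 * nx * ny_local * comm.Get_size()
m = pylops_mpi.DistributedArray(global_shape=nglobal, partition=pylops_mpi.Partition.SCATTER)
m[:] = np.random.normal(0.0, 1.0, nt0 * nx * ny_local)
d = BDiag @ m

# Distributed inversion with an MPI-powered CGLS solver (assumed entry point)
m0 = pylops_mpi.DistributedArray(global_shape=nglobal, partition=pylops_mpi.Partition.SCATTER)
m0[:] = 0.0
minv = pylops_mpi.cgls(BDiag, d, x0=m0, niter=10)[0]
```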