* Made changes to paper.md
* Update paper.md
* More cleanup
* Small changes to bib file
* Minor changes to paper.md
---------
Co-authored-by: mrava87 <matteoravasi@gmail.com>
- name: Cluster Innovation Center, University of Delhi, Delhi, India.
  index: 1
- name: Earth Science and Engineering, Physical Sciences and Engineering (PSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
  index: 2
bibliography: paper.bib

# Summary

Linear algebra and inverse problem theory are fundamental to many algorithms in signal processing, image processing, geophysics, and remote sensing. This paper introduces PyLops-MPI, an extension of the PyLops framework designed to enable distributed computing in the solution of large-scale inverse problems in Python. By leveraging the Message Passing Interface (MPI) standard, this library effectively harnesses the computational power of multiple nodes, allowing users to scale their inverse problems efficiently with minimal changes compared to their single-node PyLops code.

# Statement of need

As scientific datasets grow and the demand for higher resolution increases, the use of distributed computing in matrix-free linear algebra becomes crucial. Models and datasets can easily exceed the memory of a single machine, making it difficult to perform computations efficiently and accurately at the same time. Nevertheless, many linear operators in scientific inverse problems can be decomposed into a series of computational blocks that are well-suited for parallelization.

When addressing distributed inverse problems, we identify three distinct families of problems:

- **1. Fully distributed models and data**: Both model and data are split across nodes, with each node processing its own portion of the model and data. This leads to minimal communication, which occurs mainly when performing dot products in the solver or when applying regularization terms.

- **2. Distributed data, model available on all nodes**: Data is distributed across nodes, whilst the model is available on all nodes. Communication happens during the adjoint pass, when the models produced by each node are summed, and in the solver when performing operations on the data vector.

- **3. Model and data available on all nodes**: All nodes have identical copies of the data and model. Communication only happens within the operator, with no communication needed in the solver.


MPI for Python (mpi4py) [@Dalcin:2021] provides Python bindings for the MPI standard, allowing Python applications to leverage multiple processors on workstations, clusters, and supercomputers. Projects such as mpi4py-fft [@Mortensen:2019], mcdc [@Morgan:2024], and mpi4jax [@mpi4jax] utilize mpi4py to provide distributed computing capabilities. Similarly, PyLops-MPI, which is built on top of PyLops [@Ravasi:2020], leverages mpi4py to solve large-scale problems in a distributed fashion. Its intuitive API provides functionalities to scatter and broadcast data and model vectors across nodes and to perform various mathematical operations on them (e.g., summation, subtraction, norms) in a distributed manner. Additionally, a suite of MPI-powered linear operators and solvers is offered, and its flexible design eases the integration of custom operators and solvers. PyLops-MPI thus enables users to solve complex inverse problems without the risk of data leaks or the need to manage MPI requirements themselves.

# Software Framework

PyLops-MPI is designed to tackle large-scale linear inverse problems that are difficult to solve using a single process (due to either extremely high computational cost or memory requirements).

The main components of the library are:

## DistributedArray

The `pylops_mpi.DistributedArray` class serves as the fundamental array class of the library, enabling both partitioning and broadcasting of large NumPy [@Harris:2020] or CuPy [@cupy] arrays across multiple processes. It also supports basic mathematical operations such as addition (+), multiplication (*), and dot product (@). Finally, multiple DistributedArray objects can be stacked using `pylops_mpi.StackedDistributedArray` for further operations.

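For illustration, the following minimal sketch creates two scattered `DistributedArray` objects, fills their local portions, and applies element-wise and reduction operations; the array size is arbitrary and the script is assumed to be launched with `mpiexec -n <nranks> python script.py`:

```python
from mpi4py import MPI
from pylops_mpi import DistributedArray, Partition

comm = MPI.COMM_WORLD

# Array of 100 samples, scattered (partitioned) across the available ranks
x = DistributedArray(global_shape=100, partition=Partition.SCATTER)
x.local_array[:] = 1.0  # each rank fills only the portion it owns

y = DistributedArray(global_shape=100, partition=Partition.SCATTER)
y.local_array[:] = 2.0 * (comm.rank + 1)

z = x + y   # element-wise addition, computed locally on each rank
d = x @ y   # dot product, reduced across all ranks into a scalar

# Collectively gather the full result (only advisable for small arrays)
zglobal = z.asarray()
if comm.rank == 0:
    print(zglobal, d)
```
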
## HStack, VStack, BlockDiag Operators

PyLops facilitates the combination of multiple linear operators via horizontal, vertical, or diagonal stacking. PyLops-MPI provides distributed versions of these operations. For example, `pylops_mpi.MPIBlockDiag` applies different operators in parallel on separate portions of the model and data. `pylops_mpi.MPIVStack` applies multiple operators in parallel to the whole model; its adjoint applies the adjoint of each individual operator to portions of the data vector and sum-reduces the individual outputs. Finally, `pylops_mpi.MPIHStack` is the adjoint of `pylops_mpi.MPIVStack`.

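As a sketch of how such distributed operators are used, the snippet below wraps a simple PyLops operator on each rank into an `MPIBlockDiag`, so that every rank applies its own block to the portion of a distributed model it owns (family 1 above). The choice of `pylops.Diagonal` and all sizes are illustrative assumptions, and the operator is assumed to support the same `@`/`.H` conventions as PyLops:

```python
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi
from pylops_mpi import DistributedArray, Partition

comm = MPI.COMM_WORLD
n = 10  # number of model/data samples owned by each rank (illustrative)

# Each rank contributes one PyLops operator acting only on its local chunk
Dop = pylops.Diagonal(np.full(n, comm.rank + 1.0))
BDiag = pylops_mpi.MPIBlockDiag(ops=[Dop])

# Distributed model vector: n samples per rank
x = DistributedArray(global_shape=n * comm.size, partition=Partition.SCATTER)
x.local_array[:] = 1.0

y = BDiag @ x       # forward pass, applied locally on every rank
xadj = BDiag.H @ y  # adjoint pass, also fully local for a block-diagonal operator
```
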
## Halo Exchange

PyLops-MPI operators use halo exchange to transfer portions of the model and data between ranks. Users should ensure consistent local data shapes across ranks to avoid extra communication during matrix-vector products. If the shapes differ, the operator exchanges boundary data ("ghost cells") between neighboring processes, aligning the shapes to enable efficient local computations and minimize communication overhead.

## MPI-powered Solvers

PyLops-MPI offers a small subset of PyLops linear solvers, which can deal with **DistributedArray** and **StackedDistributedArray** objects. These solvers rely on the mathematical operations implemented in these classes and on the forward and adjoint passes of the operator itself; they are available in the `pylops_mpi.optimization` submodule.

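Continuing the block-diagonal sketch above, a distributed least-squares inversion may look as follows; the `cgls` import path and its signature are assumed to mirror the PyLops solver of the same name:

```python
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi
from pylops_mpi import DistributedArray, Partition
from pylops_mpi.optimization.basic import cgls

comm = MPI.COMM_WORLD
n = 10  # samples per rank (illustrative)

# Block-diagonal operator: one invertible diagonal block per rank
Dop = pylops.Diagonal(np.arange(1.0, n + 1.0))
BDiag = pylops_mpi.MPIBlockDiag(ops=[Dop])

# Distributed "observed" data and zero starting guess
y = DistributedArray(global_shape=n * comm.size, partition=Partition.SCATTER)
y.local_array[:] = 1.0
x0 = DistributedArray(global_shape=n * comm.size, partition=Partition.SCATTER)
x0.local_array[:] = 0.0

# Distributed CGLS: dot products and norms are computed collectively
xinv = cgls(BDiag, y, x0=x0, niter=10)[0]

xinv_global = xinv.asarray()  # collective gather of the estimated model
if comm.rank == 0:
    print(xinv_global)
```
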
# Use Cases

Here we present three use cases in geophysical inverse problems that correspond to the three families of problems described above:

- *Seismic Post-Stack Inversion* can be used to characterize the subsurface [@Ravasi:2021] from seismic data. In 3D applications, where both the model and data are three-dimensional arrays, PyLops-MPI distributes one spatial axis across different ranks; each rank therefore processes a subset of the entire data and model. Communication occurs because of the regularization terms introduced to promote smooth or blocky solutions.

- *Least-Squares Migration (LSM)* explains seismic data via a Born modelling engine to produce high-resolution images of the subsurface reflectivity [@Nemeth:1999]. PyLops-MPI distributes the available sources across different MPI ranks, and each rank applies the Born modelling operator to a subset of sources with the broadcasted reflectivity.

- *Multi-Dimensional Deconvolution (MDD)* is a powerful technique used to estimate seismic datasets without overburden effects [@Ravasi:2022]. PyLops-MPI tackles this large-scale inverse problem by distributing the kernel of the Multi-Dimensional Deconvolution operator across ranks, allowing each process to apply a batched matrix-vector multiplication to a portion of the input model and data.

As similar patterns are likely to emerge in inverse problems across other disciplines, we expect a broad adoption of the PyLops-MPI framework in other scientific fields.

# Acknowledgements

The PyLops team acknowledges the support from Google Summer of Code and the NumFOCUS organization, which have been vital to the development of PyLops-MPI.