Commit ee22336

Made changes to paper.md (#109)
* Made changes to paper.md
* Update paper.md
* More cleanup
* Small changes to bib file
* Minor changes to paper.md

---------

Co-authored-by: mrava87 <matteoravasi@gmail.com>
1 parent 830fc08 commit ee22336

2 files changed (+48, −114 lines)


joss/paper.bib

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 @article{Ravasi:2020,
-author = {M. Ravasi and I. Vasconcelos},
+author = {Ravasi, Matteo and Vasconcelos, Ivan},
 title = "{PyLops - A linear-operator Python library for scalable algebra and optimization}",
 journal = {SoftwareX},
 year = 2020,
@@ -28,7 +28,7 @@ @article{Mortensen:2019
 volume = {4},
 number = {36},
 pages = {1340},
-author = {Mikael Mortensen and Lisandro Dalcin and David Elliot Keyes},
+author = {Mortensen, Mikael and Dalcin, Lisandro and Keyes, David Elliot},
 title = {mpi4py-fft: Parallel Fast Fourier Transforms with MPI for Python},
 journal = {Journal of Open Source Software}
 }
@@ -128,7 +128,7 @@ @article{Ravasi:2022
 DOI={10.1109/tgrs.2022.3179626},
 journal={IEEE Transactions on Geoscience and Remote Sensing},
 publisher={Institute of Electrical and Electronics Engineers (IEEE)},
-author={Ravasi, Matteo and Selvan, Tamin and Luiken, Nick},
+author={Ravasi, Matteo and Selvan, Tamil and Luiken, Nick},
 year={2022},
 pages={1–14}
 }

joss/paper.md

Lines changed: 45 additions & 111 deletions
@@ -15,7 +15,7 @@ authors:
 orcid: 0000-0002-0741-6602
 affiliation: 3
 affiliations:
-- name: Computer Science and Engineering, Cluster Innovation Center, University of Delhi, Delhi, India.
+- name: Cluster Innovation Center, University of Delhi, Delhi, India.
 index: 1
 - name: Earth Science and Engineering, Physical Sciences and Engineering (PSE), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia.
 index: 2
@@ -27,149 +27,83 @@ bibliography: paper.bib

 # Summary

-Linear algebra operations and inverse problems represent the cornerstone of numerous algorithms in fields such as image
-processing, geophysics, signal processing, and remote sensing. This paper presents PyLops-MPI, an extension of the PyLops
-framework, specifically designed to enable distributed computing in the solution of large-scale inverse problems in Python.
-PyLops-MPI provides functionalities to parallelize any combination of PyLops operators with different reduction patterns, drop-in
-replacements for several PyLops operator that require changes in their inner working to be amenable to distributed computing,
-and distributed solvers. By leveraging the Message Passing Interface (MPI) standard, the presented library can effectively unleash
-the computational power of multiple nodes (or ranks), enabling users to efficiently scale their inversion problems with minimal
-code modifications compared to their single-node equivalent PyLops codes.
+Linear algebra and inverse problem theory are fundamental to many algorithms in signal processing, image processing, geophysics, and remote sensing. This paper introduces PyLops-MPI, an extension of the PyLops framework designed to enable distributed computing in the solution of large-scale inverse problems in Python. By leveraging the Message Passing Interface (MPI) standard, this library effectively
+harnesses the computational power of multiple nodes, allowing users to scale their inverse problems efficiently with minimal
+changes compared to their single-node PyLops code.

 # Statement of need

-As scientific datasets grow and the demand for higher resolution increases, the need for distributed computing alongside
-matrix-free linear algebra becomes more critical. Nowadays, it is in fact common for the size of models and datasets to exceed
-the memory capacity of a single machine—making it difficult to perform computations efficiently and accurately at the same time.
-However, many linear operators used in scientific inverse problems can be usually decomposed in a series of computational blocks that, although being resource-intensive, can be effectively parallelized; this further emphasizes the necessity for a distributed computing approach to inverse problems.
+As scientific datasets grow and the demand for higher resolution increases, the use of distributed computing in matrix-free linear algebra becomes crucial.
+Models and datasets can in fact easily exceed the memory of a single machine, making it difficult to perform computations efficiently and accurately at the same time. Nevertheless, many linear operators
+in scientific inverse problems can be decomposed into a series of computational blocks that are well-suited for parallelization.

 When addressing distributed inverse problems, we identify three distinct families of problems:

-- **1. Fully distributed models and data**: both model and data are distributed across nodes, with minimal
-communication during the modeling process. Communication mainly occurs in the solver when dot
-products or regularization terms (i.e., Laplacian) are applied. In this scenario, each node can easily
-handle a portion of the model and data when applying the modelling operator and its adjoint.
+- **1. Fully distributed models and data**: Both model and data are split across nodes, with each node processing its own portion of the model and data. This leads to minimal
+communication, mainly when performing dot products in the solver or in the regularization terms.

-- **2. Distributed data, model available on all nodes**: in this case, data is distributed across nodes while the model is
-available on all nodes. Communication is required during the adjoint pass when models produced by each node need
-to be summed together, and in the solver when performing dot products on the data vector.
+- **2. Distributed data, model available on all nodes**: Data is distributed across nodes, whilst the model is available on all nodes.
+Communication happens during the adjoint pass to sum models and in the solver for data vector operations.

-- **3. Model and data available on all nodes**: here, communication is confined to the operator, with nodes possessing the same copies
-of data and model master. All nodes then perform certain computations in the forward and adjoint pass of the operator and no
-communication is required in the solver.
+- **3. Model and data available on all nodes**: All nodes have identical copies of the data and model. Communication only happens within
+the operator, with no communication needed in the solver.

-MPI for Python (also known as mpi4py [@Dalcin:2021] provides Python bindings for the MPI standard, and allows Python applications to exploit the power of
-multiple processors on workstations, clusters and supercomputers. Recent updates to mpi4py (version 3.0 and above) have simplified its usage, enabling more efficient data communication between nodes and processes. Some projects in the Python ecosystem, such as
-mpi4py-fft [@Mortensen:2019], mcdc [@Morgan:2024], and mpi4jax [@mpi4jax], utilize mpi4py to extend their capabilities to distributed computing
-ultimately improving the efficiency and scalability of the exposed operations.
-
-PyLops-MPI is built on top of PyLops [@Ravasi:2020] and utilizes mpi4py to enable the solution of
-large scale problems in a distributed and parallelized manner. PyLops-MPI offers an intuitive API that allows users to
-easily scatter and broadcast data and models across different nodes or processors, enabling to perform various mathematical operations on them (e.g., summmation, subtraction, norms) in a distributed manner. Moreover, it provides a suite of MPI-powered linear operators and linear solver and is designed in a flexible way, allowing any user to easily add custom operators and solvers tailored to their specific needs.
-
-In summary, what sets PyLops-MPI apart from other libraries is its ease of use in creating MPI Operators, facilitating efficient
-integration between mpi4py and PyLops. This enables users to solve large-scale, complex inverse problems without the
-risk of data leaks or the need to manage MPI requirements themselves.
+MPI for Python (mpi4py [@Dalcin:2021]) provides Python bindings for the MPI standard, allowing applications to leverage multiple
+processors. Projects like mpi4py-fft [@Mortensen:2019], mcdc [@Morgan:2024], and mpi4jax [@mpi4jax]
+utilize mpi4py to provide distributed computing capabilities. Similarly, PyLops-MPI, which is built on top of PyLops [@Ravasi:2020], leverages mpi4py to solve large-scale problems in a distributed fashion.
+Its intuitive API provides functionalities to scatter and broadcast data and model vectors across nodes and allows various mathematical operations (e.g., summation, subtraction, norms)
+to be performed. Additionally, a suite of MPI-powered linear operators and solvers is offered, and its flexible design eases the integration of custom operators and solvers.
+PyLops-MPI enables users to solve complex inverse problems without concerns about data leaks or MPI management.

 # Software Framework

-PyLops-MPI introduces MPI support to PyLops by providing an efficient API for solving linear problems through
-parallelization using the mpi4py library. This library is designed to tackle large-scale linear inverse problems that
-are difficult to solve using a single process (due to either extremely high computational cost of memory requirements).
+PyLops-MPI is designed to tackle large-scale linear inverse problems that are difficult to solve using a single process
+(due to either extremely high computational cost or memory requirements).

 ![Software Framework representation of the ``PyLops-MPI`` API.](figs/software_framework.png)

-The main components of the library include:
+The main components of the library are:

 ## DistributedArray

-The `pylops_mpi.DistributedArray` class serves as the fundamental array class used throughout the library. It enables
-the partitioning of large NumPy [@Harris:2020] or CuPy [@cupy] arrays into smaller local arrays, which can
-be distributed across different ranks. Additionally, it allows for broadcasting the NumPy or CuPy array to multiple processes.
-
-The DistributedArray supports two types of partitions through the **partition** attribute: `Partition.SCATTER`
-distributes the data across all ranks, allowing users to specify how much load each rank should handle, while `Partition.BROADCAST`
-creates a copy of the data and distributes it to all ranks, ensuring that the data is replicated on each rank and kept consistent
-across the entire duration of a code.
-
-Furthermore, various basic mathematical functions can be applied to one or more DistributedArray objects:
-
-- Add (+) / Subtract (-): Adds or subtracts two DistributedArrays.
-- Multiply (*): Multiplies two DistributedArrays.
-- Dot-product (@): Calculates the dot product by flattening the arrays, resulting in a scalar value.
-- Conj: Computes the conjugate of the DistributedArray.
-- Norms: Calculates the vector norm along any specified axis.
-- Copy: Creates a deep copy of the DistributedArray.
-
-PyLops-MPI also provides a way to stack a series of `pylops_mpi.DistributedArray` objects using the
-`pylops_mpi.StackedDistributedArray` class and perform mathematical operations on them.
-
-## MPILinearOperator
-
-`pylops_mpi.MPILinearOperator` is the base class for all MPI linear operators, allowing users to create new operators
-performing matrix-vector products on `pylops_mpi.DistributedArray` objects, which can be later coupled with any PyLops-MPI linear solver. To create a
-new MPILinearOperator, users need to subclass the `pylops_mpi.MPILinearOperator` parent class and specify the **shape** and **dtype**.
-The **_matvec** and **_rmatvec** methods should also be implemented for the forward and adjoint operations.
-
-## MPIStackedLinearOperator
-
-`pylops_mpi.MPIStackedLinearOperator` represents a second level of abstraction in the creation of MPI-powered linear operators; it
-allows users to create a stack of MPILinearOperator objects, where the different operators are invoked sequentially but each operator runs
-in a distributed fashion. The `pylops_mpi.MPIStackedLinearOperator` has the ability to perform matrix-vector products with
-both `pylops_mpi.DistributedArray` and `pylops_mpi.StackedDistributedArray`. Similar to `pylops_mpi.MPILinearOperator`,
-users need to subclass the `pylops_mpi.MPIStackedLinearOperator` parent class and specify the **shape** and **dtype**.
-The **_matvec** method should be implemented for the forward pass, while the **_rmatvec** method should be used for
-the adjoint pass.
+The `pylops_mpi.DistributedArray` class serves as the fundamental array class, enabling both partitioning and broadcasting of large
+NumPy [@Harris:2020] or CuPy [@cupy] arrays across multiple processes. It also supports basic math operations such as addition (+), multiplication (*), and dot-product (@). Finally, multiple DistributedArray objects can be stacked using `pylops_mpi.StackedDistributedArray` for further operations.
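[Editor's illustrative sketch] To make the pattern described in this section more concrete, the snippet below shows how a model vector might be scattered across ranks and how a custom operator could be defined following the subclassing recipe above (set **shape**/**dtype**, implement **_matvec**/**_rmatvec**). Constructor and attribute names such as `global_shape`, `local_shape`, `local_array`, and `partition`, as well as operator overloading (`@`, `.H`) mirroring PyLops, are assumptions based on the description above rather than verbatim PyLops-MPI signatures.

```python
# Hedged sketch only: run with e.g. `mpiexec -n 4 python example.py`.
# Names such as global_shape, local_shape, local_array are assumptions
# based on the description in this section.
import numpy as np
import pylops_mpi

n = 100  # global length of the model vector

# Scatter a model vector across ranks and fill the local portion at each rank
x = pylops_mpi.DistributedArray(global_shape=n,
                                partition=pylops_mpi.Partition.SCATTER)
x[:] = np.ones(x.local_shape)

# Distributed math as listed above: element-wise ops and a dot product
y = x + x
alpha = x @ x  # reduces to a single scalar across all ranks


class MPIDiagonal(pylops_mpi.MPILinearOperator):
    """Toy diagonal operator following the subclassing recipe above."""

    def __init__(self, diag, shape, dtype=np.float64):
        super().__init__(shape=shape, dtype=dtype)
        self.diag = diag  # local portion of the diagonal owned by this rank

    def _matvec(self, x):
        y = pylops_mpi.DistributedArray(global_shape=self.shape[0],
                                        partition=x.partition)
        y[:] = self.diag * x.local_array
        return y

    def _rmatvec(self, x):
        y = pylops_mpi.DistributedArray(global_shape=self.shape[1],
                                        partition=x.partition)
        y[:] = np.conj(self.diag) * x.local_array
        return y


Dop = MPIDiagonal(np.full(x.local_shape, 2.0), shape=(n, n))
y = Dop @ x  # forward product; the adjoint would be applied via Dop.H @ y
```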

 ## HStack, VStack, BlockDiag Operators

-One of the main features of PyLops is the ability to create any combination of linear operators in a simple and expressive manner. Three key
-design patterns to combine linear operators are: i) horizontal stacking, ii) vertical stacking, iii) diagonal stacking. PyLops-MPI follows the
-same rationale and provides distributed versions of such operators. More specifically, `pylops_mpi.MPIBlockDiag` allows one to run multiple
-PyLops operators in parallel over different processes, each acting on a portion of the model and data (see family 1). On the other hand,
-`pylops_mpi.MPIVStack` allows one to run multiple PyLops operators in parallel, each acting on the entire model in forward mode;
-the adjoint of such operators is instead applied to different portions of the data vector and the individual outputs are then sum-reduced (see family 2).
-Finally, `pylops_mpi.MPIHStack` is simply the adjoint of `pylops_mpi.MPIVStack`.
+PyLops facilitates the combination of multiple linear operators via horizontal, vertical, or diagonal stacking. PyLops-MPI provides
+distributed versions of these operations: e.g., `pylops_mpi.MPIBlockDiag` applies different operators in parallel on separate portions of the model
+and data. `pylops_mpi.MPIVStack` applies multiple operators in parallel to the whole model, while its adjoint applies the adjoint of each individual operator to portions of the data vector and sums the individual outputs. `pylops_mpi.MPIHStack` is the adjoint of MPIVStack.
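[Editor's illustrative sketch] The block-diagonal pattern (family 1) could look roughly like the example below, where each rank wraps one ordinary PyLops operator and the stacked operator applies them all in parallel. The `ops=` keyword, the default scatter partition, and the `.H` adjoint shorthand are assumptions, not taken from this paper.

```python
# Hedged sketch of MPIBlockDiag (family 1): each rank owns a regular PyLops
# operator that acts only on its local portion of the model and data.
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi

rank = MPI.COMM_WORLD.Get_rank()
nranks = MPI.COMM_WORLD.Get_size()
ny, nx = 5, 10  # size of the block owned by each rank

# Every rank contributes a (different) dense operator to the block diagonal
A = np.random.default_rng(rank).normal(size=(ny, nx))
BDiag = pylops_mpi.MPIBlockDiag(ops=[pylops.MatrixMult(A)])

# The model is scattered so that each rank holds its own nx samples
x = pylops_mpi.DistributedArray(global_shape=nx * nranks)
x[:] = np.ones(x.local_shape)

y = BDiag @ x       # forward: all blocks applied in parallel
xadj = BDiag.H @ y  # adjoint, also without inter-rank communication
```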

 ## Halo Exchange

-PyLops-MPI Linear Operators typically utilize halo exchange to facilitate interchange of portions of the model/data betweem consecutive ranks.
-Whilst users are encouraged toensure that the local data shapes at each rank are consistent to enable matrix-vector products without requiring
-external communication, in some cases this is not possible; instead, if the local shapes of the data and model at each rank do not match during these
-operations, the operator itself is tasked to perform a halo exchange, transferring boundary data cells (commonly referred to as "ghost
-cells") to/from neighboring processes. This ensures that the model vector and data vector shapes at each rank
-are correctly aligned for the operation. Consequently, this data transfer enables efficient local computations without
-the need for explicit inter-process communication, thereby avoiding heavy communication overhead.
+PyLops-MPI uses halo exchange to transfer portions of the model and data between ranks. Users should ensure consistent local data shapes to avoid extra communication during matrix-vector products. If shapes differ, the operator exchanges boundary data ("ghost cells") between neighboring processes, aligning shapes to enable efficient local computations and minimize overhead.

 ## MPI-powered Solvers

-PyLops-MPI offers a small subset of PyLops linear solvers, which can deal with **DistributedArray** and **StackedDistributedArray** objects. Internally, the solvers
-use the different mathematical operations implemented for such classes alongside calling the forward and adjoint passes of the operator itself. The
-currently available solvers can be found within the submodule `pylops_mpi.optimization`.
+PyLops-MPI offers a small subset of PyLops linear solvers, which can deal with **DistributedArray** and **StackedDistributedArray** objects. These solvers utilize
+the mathematical operations implemented in these classes and call the operator's forward and adjoint passes.
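[Editor's illustrative sketch] A distributed inversion with one of these solvers might be invoked roughly as below; the import path and return signature of `cgls` inside `pylops_mpi.optimization` are assumptions and may differ from the actual API.

```python
# Hedged sketch of a distributed inversion with an MPI-powered CGLS solver.
# The exact location/signature of `cgls` within pylops_mpi.optimization is
# assumed here and may differ.
import numpy as np
from mpi4py import MPI
import pylops
import pylops_mpi
from pylops_mpi.optimization.basic import cgls

rank = MPI.COMM_WORLD.Get_rank()
nranks = MPI.COMM_WORLD.Get_size()
n = 10  # local model size per rank

# Block-diagonal operator: each rank scales its local samples by 2
Op = pylops_mpi.MPIBlockDiag(ops=[pylops.Diagonal(np.full(n, 2.0))])

# Distributed "observed" data and zero starting guess
y = pylops_mpi.DistributedArray(global_shape=n * nranks)
y[:] = 2.0 * np.arange(n)
x0 = pylops_mpi.DistributedArray(global_shape=n * nranks)
x0[:] = np.zeros(x0.local_shape)

# Distributed CGLS: dot products and norms are reduced across all ranks
xinv = cgls(Op, y, x0=x0, niter=10)[0]  # assumed to return the model first
print(rank, xinv.local_array[:3])       # each rank holds its own portion
```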

 # Use Cases

-In the following, we briefly discuss three different use cases in geophysical inverse problems that fall within the 3 different families of problems previously
-discussed. More specifically:
+Here we present three use cases in geophysical inverse problems that correspond to the previously mentioned families of problems:
+
+- *Seismic Post-Stack Inversion* can be used to characterize the
+subsurface [@Ravasi:2021] from seismic data. In 3D applications, when both the model and data are three-dimensional arrays, PyLops-MPI distributes one spatial axis across different ranks. Each rank therefore processes a subset of the entire data and model.
+Communication occurs due to the introduction of regularization terms that promote smooth or blocky solutions.
+
+- *Least-Squares Migration (LSM)* explains seismic data via a Born modelling engine to produce high-resolution images of the subsurface
+reflectivity [@Nemeth:1999]. PyLops-MPI distributes the available sources across different MPI ranks, and
+each rank applies the Born modeling operator for a subset of sources with the broadcasted reflectivity.

-- *1. Seismic Post-Stack Inversion* represents an effective approach to quantitative characterization of the
-subsurface [@Ravasi:2021] from seismic data. In 3D applications, both the model and data
-are three-dimensional (2 spatial coordinates and depth/time). PyLops-MPI
-solves this problem by distributing one of the spatial axes across different ranks, allowing matrix-vector products and
-inversions to take place at each rank, which are later gathered to obtain the inverted model. Usually communication happens
-because of the introducing of regularization terms that ensure the solution to be smooth or blocky.
+- *Multi-Dimensional Deconvolution (MDD)* is a powerful technique used to estimate seismic datasets without overburden effects [@Ravasi:2022].
+PyLops-MPI tackles this large-scale inverse problem by distributing the kernel of the
+Multi-Dimensional Deconvolution operator across ranks, allowing each process to apply a batched matrix-vector multiplication to a portion of the input model and data.

-- *2. Least-Squares Migration (LSM)* is the process of explaining seismic data via a Born modelling engine to produce an image of the subsurface
-reflectivity [@Nemeth:1999]. PyLops-MPI tackles this problem by distributing all of the available sources across different MPI ranks.
-Each rank applies the expensive Born modeling operator for a subset of sources with the broadcasted reflectivity.
-The resulting data is therefore scattered across the different ranks, and inversion is again perfromed using one of MPI-Powered solvers to produce the desired subsurface reflectivity image.
+As similar patterns are likely to emerge in inverse problems across other disciplines, we expect a broad adoption of the PyLops-MPI framework in other scientific fields.

-- *3. Multi-Dimensional Deconvolution (MDD)* is a powerful technique used at various stages of the seismic processing
-value chain to create datasets deprived of overburden effects [@Ravasi:2022]. PyLops-MPI addresses this large-scale inverse problem by
-splitting the kernel of the so-called Multi-Dimensional Deconvolution (MDC) operator across ranks, such that each process can perform a portion of the
-batched matrix-vector (or matrix-matrix) multiplication required by such an operator. Here, both the model and data are available on all ranks for the entire inverse process.
+# Acknowledgements

-Finally, we anticipate that similar patterns can be found in many other inverse problems in different disciplines and therefore we foresee a wide adoption of
-the PyLops-MPI frameworks in other scientific fields.
+The PyLops team acknowledges the support from Google Summer of Code and the NumFOCUS organization, which have been vital to the development of PyLops-MPI.

 # References
