3 | 3 | Benchmark Utility in PyLops-MPI |
4 | 4 | =============================== |
5 | 5 | PyLops-MPI users can conveniently benchmark the performance of their code with a simple decorator.
6 | | - |
7 | | -This tutorial demonstrates how to use the :py:func:`pylops_mpi.utils.benchmark` and |
8 | | -:py:func:`pylops_mpi.utils.mark` utility methods in PyLops-MPI. These utilities support various |
| 6 | +:py:func:`pylops_mpi.utils.benchmark` and :py:func:`pylops_mpi.utils.mark` support various |
9 | 7 | function calling patterns that may arise when benchmarking distributed code. |
10 | 8 |
11 | 9 | - :py:func:`pylops_mpi.utils.benchmark` is a **decorator** used to time the execution of entire functions. |
12 | 10 | - :py:func:`pylops_mpi.utils.mark` is a **function** used inside decorated functions to insert fine-grained time measurements. |
13 | 11 |
14 | | -Basic Setup |
15 | | ------------ |
16 | | - |
17 | | -We start by importing the required modules and setting up some parameters of our simple program. |
18 | | - |
19 | | -.. code-block:: python |
20 | | -
21 | | - import sys |
22 | | - import logging |
23 | | - import numpy as np |
24 | | - from mpi4py import MPI |
25 | | - from pylops_mpi import DistributedArray, Partition |
26 | | -
27 | | - from pylops_mpi.utils.benchmark import benchmark, mark |
| 12 | +.. note:: |
| 13 | +   This benchmark utility is enabled by default, i.e., if a user decorates a function with :py:func:`benchmark`, the function will go through
| 14 | +   the time measurements, adding overhead. Users can turn off benchmarking while leaving the decorator in place with:
28 | 15 |
29 | | - np.random.seed(42) |
30 | | - rank = MPI.COMM_WORLD.Get_rank() |
| 16 | + .. code-block:: bash |
31 | 17 |
32 | | - par = {'global_shape': (500, 501), |
33 | | - 'partition': Partition.SCATTER, 'dtype': np.float64, |
34 | | - 'axis': 1} |
| 18 | +      $ export BENCH_PYLOPS_MPI=0
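| | +
| | +      # or disable it for a single run (illustrative; some MPI launchers require environment variables to be forwarded explicitly):
| | +      $ BENCH_PYLOPS_MPI=0 mpiexec -n 2 python my_script.py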
35 | 19 |
36 | | -Benchmarking a Simple Function |
37 | | ------------------------------- |
38 | | - |
39 | | -We define a simple function and decorate it with :py:func:`benchmark`. |
| 20 | +Usage can be as simple as:
40 | 21 |
41 | 22 | .. code-block:: python |
42 | 23 |
43 | 24 | @benchmark |
44 | | - def inner_func(par): |
45 | | - dist_arr = DistributedArray(global_shape=par['global_shape'], |
46 | | - partition=par['partition'], |
47 | | - dtype=par['dtype'], axis=par['axis']) |
48 | | - # may perform computation here |
49 | | - dist_arr.dot(dist_arr) |
50 | | -
51 | | -Calling the function will result in the elapsed runtime being printed to standard output. |
52 | | - |
53 | | -.. code-block:: python |
54 | | -
55 | | - inner_func(par) |
56 | | -
57 | | -You can also customize the label of the printout using the ``description`` parameter: |
| 25 | +    def function_to_time():
| 26 | +        ...  # your computation
58 | 27 |
59 | | -.. code-block:: python |
60 | | -
61 | | - @benchmark(description="printout_name") |
62 | | - def my_func(...): |
63 | | - ... |
64 | | -
65 | | -Fine-grained Time Measurements |
66 | | ------------------------------- |
67 | | - |
68 | | -To gain more insight into the runtime of specific code regions, use :py:func:`mark` within |
69 | | -a decorated function. This allows insertion of labeled time checkpoints. |
| 28 | +The result is printed to the standard output. The label used in the printout can be customized via the ``description`` parameter, as in the sketch below.
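| | +
| | +A minimal sketch of a custom printout label (the function name is illustrative):
| | +
| | +.. code-block:: python
| | +
| | +    @benchmark(description="printout_name")
| | +    def another_func_to_time():
| | +        ...  # your computation
| | +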
| 29 | +For fine-grained time measurements, :py:func:`pylops_mpi.utils.mark` can be inserted inside a benchmarked function to record labeled checkpoints:
70 | 30 |
71 | 31 | .. code-block:: python |
72 | 32 |
73 | 33 | @benchmark |
74 | | - def inner_func_with_mark(par): |
75 | | - mark("Begin array constructor") |
76 | | - dist_arr = DistributedArray(global_shape=par['global_shape'], |
77 | | - partition=par['partition'], |
78 | | - dtype=par['dtype'], axis=par['axis']) |
79 | | - mark("Begin dot") |
80 | | - dist_arr.dot(dist_arr) |
81 | | - mark("Finish dot") |
82 | | -
83 | | -The output will now contain timestamped entries for each marked location, along with the total time |
84 | | -from the outer decorator (marked with ``[decorator]`` in the output). |
85 | | - |
86 | | -.. code-block:: python |
87 | | -
88 | | - inner_func_with_mark(par) |
89 | | -
90 | | -Nested Function Benchmarking |
91 | | ----------------------------- |
92 | | - |
93 | | -You can nest benchmarked functions to track execution times across layers of function calls. |
94 | | -Below, we define an :py:func:`outerfunc_with_mark` that calls :py:func:`inner_func_with_mark` defined earlier. |
95 | | - |
96 | | -.. code-block:: python |
97 | | -
|
98 | | - @benchmark |
99 | | - def outer_func_with_mark(par): |
100 | | - mark("Outer func start") |
101 | | - inner_func_with_mark(par) |
102 | | - dist_arr = DistributedArray(global_shape=par['global_shape'], |
103 | | - partition=par['partition'], |
104 | | - dtype=par['dtype'], axis=par['axis']) |
105 | | - dist_arr + dist_arr |
106 | | - mark("Outer func ends") |
107 | | -
108 | | -Calling the function prints the full call tree with indentation, capturing both outer and nested timing. |
109 | | - |
110 | | -.. code-block:: python |
111 | | -
112 | | - outer_func_with_mark(par) |
113 | | -
114 | | -Logging Benchmark Output |
115 | | ------------------------- |
116 | | - |
117 | | -To store benchmarking results in a file, pass a custom :py:class:`logging.Logger` instance |
118 | | -to the :py:func:`benchmark` decorator. Below is a utility function that constructs such a logger. |
119 | | - |
120 | | -.. code-block:: python |
121 | | -
122 | | - def make_logger(save_file=False, file_path=''): |
123 | | - logger = logging.getLogger(__name__) |
124 | | - logging.basicConfig(filename=file_path if save_file else None, |
125 | | - filemode='w', level=logging.INFO, force=True) |
126 | | - logger.propagate = False |
127 | | - if save_file: |
128 | | - handler = logging.FileHandler(file_path, mode='w') |
129 | | - else: |
130 | | - handler = logging.StreamHandler(sys.stdout) |
131 | | - logger.addHandler(handler) |
132 | | - return logger |
133 | | -
134 | | -Use this logger when decorating your function: |
135 | | - |
136 | | -.. code-block:: python |
137 | | -
138 | | - save_file = True |
139 | | - file_path = "benchmark.log" |
140 | | - logger = make_logger(save_file, file_path) |
141 | | -
142 | | - @benchmark(logger=logger) |
143 | | - def inner_func_with_logger(par): |
144 | | - dist_arr = DistributedArray(global_shape=par['global_shape'], |
145 | | - partition=par['partition'], |
146 | | - dtype=par['dtype'], axis=par['axis']) |
147 | | - # may perform computation here |
148 | | - dist_arr.dot(dist_arr) |
149 | | -
150 | | -Run the function to generate output written directly to ``benchmark.log``. |
151 | | - |
152 | | -.. code-block:: python |
153 | | -
154 | | - inner_func_with_logger(par) |
155 | | -
156 | | -Final Notes |
157 | | ------------ |
158 | | - |
159 | | -This tutorial demonstrated how to benchmark distributed PyLops-MPI operations using both |
160 | | -coarse and fine-grained instrumentation tools. These utilities help track and debug |
161 | | -performance bottlenecks in parallel workloads. |
162 | | - |
| 34 | +    def function_to_time():
| 35 | +        ...  # computation you want to exclude from the marked region
| 36 | +        mark("Begin Region")
| 37 | +        ...  # computation measured between the two marks
| 38 | +        mark("Finish Region")
| 39 | +
| 40 | +You can also nest benchmarked functions to track execution times across layers of function calls; the output is indented to reflect the call tree, as in the sketch below.
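| | +
| | +A minimal sketch of nested benchmarking (function names are illustrative; ``mark`` labels are free-form strings):
| | +
| | +.. code-block:: python
| | +
| | +    @benchmark
| | +    def inner_func():
| | +        ...  # some computation
| | +
| | +    @benchmark
| | +    def outer_func():
| | +        mark("Outer func start")
| | +        inner_func()  # the nested call is timed and reported within the outer output
| | +        mark("Outer func ends")
| | +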
| 41 | +Additionally, the results can be exported to a text file by passing a :py:class:`logging.Logger` to the decorator, as sketched below. For complete and runnable examples, visit :ref:`sphx_glr_tutorials_benchmarking.py`.
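| | +
| | +A minimal sketch of the logger-based export (:py:func:`benchmark` accepts a ``logger`` argument; the file and function names are illustrative):
| | +
| | +.. code-block:: python
| | +
| | +    import logging
| | +
| | +    # file-backed logger: benchmark output goes to benchmark.log instead of stdout
| | +    logger = logging.getLogger(__name__)
| | +    logger.addHandler(logging.FileHandler("benchmark.log", mode="w"))
| | +    logger.setLevel(logging.INFO)
| | +    logger.propagate = False  # avoid duplicating output through the root logger
| | +
| | +    @benchmark(logger=logger)
| | +    def function_to_time_with_logger():
| | +        ...  # your computation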