
Conversation

@tharittk
Collaborator

@tharittk tharittk commented Jul 11, 2025

This PR introduces a benchmark decorator that enables users to place mark(label) calls anywhere inside a function to record execution time at specific code points. The timing marks are collected and (as of now) printed in a human-readable format after the function completes.

The design is inspired by C++-style instrumentation, where developers can inject performance markers directly into the code.
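
As a quick illustration, a minimal usage sketch (the import path and function names are assumed from this PR's description and may differ in the final code):

import time

from benchmark import benchmark, mark  # module added in this PR; import path may differ

@benchmark
def forward_model():
    mark("setup done")
    time.sleep(0.1)   # stand-in for the actual computation
    mark("matvec done")
    time.sleep(0.05)
    mark("rmatvec done")

forward_model()  # the collected timing marks are printed after the call returns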

Changes made:

  • benchmark.py: a benchmark decorator for lightweight instrumentation

  • [Removed] kirchhoff_bench.py: a sample script demonstrating how to use the decorator; it belongs in the other repo and has therefore been removed

Ongoing work:

  • New/updated documentation illustrating how to use benchmark

  • Formatted printing and logging

  • Take a closer look at the CuPy / NCCL synchronization issue pointed out in the comments

@mrava87
Contributor

mrava87 commented Jul 14, 2025

@tharittk I’ll take a look at this soon. Regarding the 2 points of ongoing work, I think these kinds of extensive benchmark tests do not belong in the main repo; they are better suited for the one we created to hold test code… here we should add some minimal tests to verify the correctness of the benchmark decorator and the mark functionality

Contributor

@mrava87 mrava87 left a comment

@tharittk good start! I think we will probably need a few iterations here to get something that is functional and looks nice, and I am playing with it myself to get a better feel 😄

Here are some initial comments as they come:

  • we definitely should not have benchmark scripts in the main repo, so Kirchhoff_bench ultimately has to be removed from here and put in a separate repo
  • we should try to add a new page to the documentation called Benchmarking where we can provide an explanation of this decorator, how to use it, and show a small example (a bit like we have in the GPU Support and Adding New Operators pages)
  • I tried to add a benchmark decorator to nccl_allreduce, but what I see is that the messages from benchmark all come before those from mark... is this expected/intended?
  • In general I am never in favour of re-inventing the wheel... so I have been wondering if we really need to write code to print/write to file, and after a few tests I think going the route of using logging is probably better in the long run (we get lots of things for free, including writing to file). A little example:

In kirchhoff_bench:

import logging

# log to the console...
logging.basicConfig(level=logging.INFO, force=True)
# ...or to a file instead
# logging.basicConfig(filename='kirch_log.log', level=logging.INFO, force=True)

numba_logger = logging.getLogger('numba')
numba_logger.setLevel(logging.WARNING)  # silence numba (or any other library whose logs one doesn't want to see)

and in benchmark:

import logging

logger = logging.getLogger(__name__)

...
# print("".join(output)) becomes
logger.info("".join(output))

Contributor

@mrava87 mrava87 left a comment

@tharittk well done, I like the new formatting, with the messages from nested benchmark calls appearing in the right order 😄
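
For reference, a minimal sketch of the nested pattern being discussed, with hypothetical function names (both the outer and the inner function carry the decorator, and the import path is assumed as before):

from benchmark import benchmark, mark

@benchmark
def inner_step():
    mark("inner start")
    ...  # stand-in for the inner computation
    mark("inner end")

@benchmark
def outer_pipeline():
    mark("outer start")
    inner_step()  # nested benchmarked call
    mark("outer end")

outer_pipeline()  # marks from inner_step appear within outer_pipeline's report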

I left a few additional comments/thoughts for you... once you have addressed the remaining points that you have in the PR main comment, I'll do a final review

Contributor

@mrava87 mrava87 left a comment

Great work @tharittk ! I just left a few minor comments, but I think we have converged to a final solution 😄

Try to handle the two remaining tasks and then we should be ready to merge this!

Contributor

@mrava87 mrava87 left a comment

@tharittk very good, I like the tutorial 😄 I just left some minor comments related to typos and unclear text...

I just noticed though that you have not yet handled this

# TODO (tharitt): later move to env file or something
ENABLE_BENCHMARK = True

which is why I was asking for a page like https://github.com/tharittk/pylops-mpi/blob/benchmark_instrument/docs/source/adding.rst rather than a tutorial... note that I would maybe call it BENCH_PYLOPS_MPI (it is always good to have the library name in there so the scope of the env variable is not ambiguous).

I think it is fine to have the tutorial, but we definitely need a page where we can replicate a bit of the same info and, more importantly, explain how one can turn benchmarking on/off. This, however, brings me to my last, more general question: as we progressed I kind of lost track of the original goal of this (lol), which is timing our communication methods in _nccl. At this point we have not added any decorator to them, so how do you envision doing it? Does a developer decorate them manually when doing some benchmarking and then take the decorators away? I guess that is a doable approach, although a bit clunky... the other way could be to decorate all of them, since we would have the ENABLE_BENCHMARK / BENCH_PYLOPS_MPI variable, generally set to 0 (False), that benchmark checks to decide whether to activate or not; the way we ended up creating benchmark/mark, it would always log and print the time once added, and that is not what we want in a normal run.

Any thoughts?

@tharittk
Collaborator Author

tharittk commented Jul 27, 2025

I prefer not to pre-decorate the functions in the package, for these reasons:

  • If the user intends to benchmark only their customized functions, the fact that we pre-decorated these "internal" functions may lead to cluttered output.
  • With the current implementation of PyLops-MPI, I think most of the decoration would go either in DistributedArray, e.g. _all_reduce and _allgather, or in the operators like MPIVStack (_matvec and _rmatvec). As you said, it is a bit clunky.
  • Also, I think the approach above is better than decorating the NCCL calls in _nccl.py, because it lets us see benchmark results when we run with MPI as well (for our purpose of comparing MPI and NCCL, this is more convenient and gives the same end-to-end timing). In fact, the current code would not work if we decorated _nccl.py, because it would create a circular dependency (_nccl_sync() is imported by benchmark.py, and we would then also be importing benchmark.py into _nccl.py).

So I would say we can keep the environment variable BENCH_PYLOPS_MPI but default it to 1. This gives some convenience: users don't have to go back and remove their decorations and can simply set the variable to 0 when they don't want the benchmarking.
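
A minimal sketch of how benchmark.py could honour such a variable, assuming the name BENCH_PYLOPS_MPI and a default of 1 when unset (the actual implementation may differ):

import os
from functools import wraps

# benchmarking is enabled unless BENCH_PYLOPS_MPI is explicitly set to 0
ENABLE_BENCHMARK = os.environ.get("BENCH_PYLOPS_MPI", "1") != "0"

def benchmark(func):
    if not ENABLE_BENCHMARK:
        return func  # no-op: the function runs undecorated and nothing is logged

    @wraps(func)
    def wrapper(*args, **kwargs):
        # ... collect timing marks around the call and log them ...
        return func(*args, **kwargs)

    return wrapper

With this in place, a run launched with BENCH_PYLOPS_MPI=0 skips all instrumentation without having to touch the decorated code.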

Contributor

@mrava87 mrava87 left a comment

@tharittk I totally agree with your judgement, let's go this way... and add the BENCH_PYLOPS_MPI variable that, when set to 0, will skip benchmarking... as you say, if unset it defaults to 1. However, you do need to modify benchmark.py to handle the new env variable, which does not seem to be done yet.

But I am not super happy with the benchmarking.rst file yet. I left a detailed comment that hopefully can help you improve it and make it a bit more different 😄

@mrava87 mrava87 merged commit 7f0b01e into PyLops:main Jul 29, 2025
61 checks passed