Conversation

@kenneth59715

Modifications to parallel_backends.py and mpi_child.py allow running as native MPI.
The code will still run as before if mpi_cmd is not ALREADYMPI.
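
For illustration, here is a minimal, hypothetical sketch of how the two code paths might be selected from user code, assuming the ALREADYMPI sentinel described above is passed through MPIBackend's existing mpi_cmd argument (the exact API in this PR may differ):

```python
# Hypothetical sketch based on this PR's description; the actual API may differ.
from hnn_core import jones_2009_model, simulate_dipole, MPIBackend

net = jones_2009_model()

# Existing behavior (unchanged): the backend spawns its own child job via mpiexec.
# with MPIBackend(n_procs=4, mpi_cmd='mpiexec'):
#     dpls = simulate_dipole(net, tstop=170.0, n_trials=1)

# Proposed behavior: when this script itself has already been launched under MPI
# (e.g. `mpiexec ... nrniv -python -mpi script.py`), passing the 'ALREADYMPI'
# sentinel tells the backend to reuse the existing MPI ranks instead of
# spawning a new child process.
with MPIBackend(n_procs=4, mpi_cmd='ALREADYMPI'):
    dpls = simulate_dipole(net, tstop=170.0, n_trials=1)
```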

@asoplata changed the title from "added option to run as native MPI" to "[WIP] added option to run as native MPI" on Feb 20, 2025
@asoplata
Collaborator

Thank you for your contribution, Kenneth!

Please note that we are planning on merging a PR that will make changes to some of the same code lines that you have made changes to (#871). We will try to get these merged onto master tomorrow.

After we have merged those changes, would you mind rebasing your commit onto the latest version of the master branch? Alternatively, if you would like, I can try rebasing your branch myself. Either way, this will make it significantly easier for us to review your code changes.

I will note that we take changes to the MPI API seriously, so this PR may end up taking a good bit of time and effort, particularly for testing.

Finally, can I ask, what OS are you using? Are you primarily planning on using this from HPC environments?

Thanks,
Austin

@asoplata
Collaborator

Hey Kenneth, we have merged a different PR's changes to the same files that you have made changes to. Please rebase your PR onto the current master branch. Alternatively, if you would like me to do it, I can do the rebasing; just let me know.

@kenneth59715
Author

kenneth59715 commented Feb 20, 2025 via email

@asoplata force-pushed the nativempi branch 2 times, most recently from 4fa1d5b to 38e8ea5 on February 28, 2025 at 16:35
Co-authored-by: Austin E. Soplata <me@asoplata.com>
@asoplata
Collaborator

Apologies for the multiple force-pushes: the first one had all the code changes, but I realized it also changed the authorship, so I had to push a few more times to make it clear that you were the primary author of the commits.

I have rebased and rewritten the history of your nativempi branch. If you have any uncommitted or unpushed work, please make a backup before you pull from your origin remote! Once you've done that and pulled, everyone should be on the same page.

This PR is going to require a lot of testing and a fair number of changes. The first order of business: are you satisfied with the rebase, i.e., do the changes now reflect all of the changes that you wanted to make?

@asoplata
Collaborator

I will also ask: can you please provide a script illustrating this use-case? In particular, how do you set up a "native MPI" "host" (I don't know the right word for it) so that the new code path is used? I have never used MPI without using mpiexec, and we also need this knowledge for the documentation of how to use this functionality (this will not be merged without documentation explaining how to use it). Thank you

@asoplata self-assigned this Feb 28, 2025
@asoplata added the enhancement, parallelization, and high difficulty labels on Feb 28, 2025
@kenneth59715
Author

kenneth59715 commented Feb 28, 2025 via email

@kenneth59715
Author

kenneth59715 commented Feb 28, 2025 via email

@asoplata
Collaborator

> Austin, Sorry, I haven't used github collaboratively. Is the new version of the nativempi branch in your repo? Should I pull that into my fork? Kenneth

I will send you an email so we can separately discuss how to handle the git changes; I am happy to provide assistance with this.

> Austin,
> The benchmark code would be invoked with mpiexec and the --alreadympi flag:
> mpiexec -np 26 nrniv -python -mpi -nobanner 'benchmark_hnn_core_runtime.py' '--alreadympi'
> I think I attached a modified benchmark_hnn_core_runtime.py to one of the emails.
> Kenneth

Okay, thanks, knowing how to properly call the script you gave via email is super helpful. I will try to test it on our HPC soon.

To be clear, part of the reason I was asking for an example use-case is that, for us to add a feature like this, the changes need to include documentation for both how and why you would want to use this feature (in addition to tests of the code changes). At minimum, we would want additions to our code-documentation website (i.e., https://jonescompneurolab.github.io/hnn-core/stable/index.html ), such as on https://github.com/jonescompneurolab/hnn-core/blob/master/doc/parallel.md , and example code at https://github.com/jonescompneurolab/hnn-core/tree/master/examples . I'm happy to help convert the use-cases you provide into documentation for how to use the feature.

However, I don't yet fully grasp the motivation behind these changes; I have some, but limited, experience with MPI in HPC environments. Would you mind expounding on the benefits of this feature? For example, is it faster than using child mpiexec calls (I'm sure the benchmark script is intended for that question), does it make it easier or even possible to run HNN across multiple nodes simultaneously, etc.? Any description of the benefits you provide will be very helpful, and we will want to incorporate it into the documentation itself.
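
For reference, below is a minimal, hypothetical sketch of what a driver script launched with the mpiexec command quoted above might look like. The argument handling and simulation details are illustrative assumptions, not the contents of the actual benchmark_hnn_core_runtime.py shared by email:

```python
# Hypothetical driver sketch; the real benchmark_hnn_core_runtime.py was only
# shared by email and may differ substantially.
import argparse

from hnn_core import jones_2009_model, simulate_dipole, MPIBackend

parser = argparse.ArgumentParser()
parser.add_argument('--alreadympi', action='store_true',
                    help='this script was launched by an external mpiexec/nrniv call')
# parse_known_args() ignores any extra arguments that nrniv itself passes along.
args, _ = parser.parse_known_args()

# 'ALREADYMPI' is the sentinel proposed in this PR; otherwise use the default path.
mpi_cmd = 'ALREADYMPI' if args.alreadympi else 'mpiexec'

net = jones_2009_model()
with MPIBackend(n_procs=26, mpi_cmd=mpi_cmd):
    dpls = simulate_dipole(net, tstop=170.0, n_trials=1)
```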

@kenneth59715
Author

kenneth59715 commented Feb 28, 2025 via email

@asoplata
Collaborator

asoplata commented Mar 4, 2025

Hello Kenneth,

I spoke with my HNN colleagues yesterday, and in addition to the motivations you explained, they also informed me that the current way we spawn child processes may actually be disallowed in some HPC environments due to security policy. Indeed, this is part of the justification for an older, un-merged PR by @rythorpe here: #506 (we may want to merge at least some of the code from that PR into this one). Enabling parallel HNN-core to run on a wider range of HPC environments, and making it easier to use in custom MPI environments, is certainly enough motivation to work on merging this PR.

To be frank, I'm swamped right now and don't have enough attention-bandwidth to give this issue the focus it deserves; however, this will change in about a month, after we finish preparing for and running our online HNN Workshop in early April. I should be able to dive into this after that.

A very incomplete list of the work left to do for this PR includes:

  • At minimum, any code changes need to pass a local run of make test on an installed development version of hnn-core. See our Contributing Page for how to set that up, and this section for running the tests.
    • Currently, the existing pytest tests pass, but this PR fails our ruff checks for style. Once you have installed the development version and packages for hnn-core, you can run ruff check hnn_core from the repository's top-level directory to see which errors are causing the problems. These need to be fixed, since they will cause our automated GitHub testing to fail as well.
  • The use of the term "alreadympi" should be replaced with something more descriptive. Similarly, we could use a good term for the non-alreadympi case, which will help us communicate about these two distinct code paths.
  • The commented-out uses of kill_proc_name('nrniv') should be re-enabled, BUT only in the non-alreadympi case. We still want to use that functionality in certain cases, or at least have a way to control it (maybe a no_kill_processes flag or something similar?).
  • All new functions like mpi_child.py::runit need to have comprehensive docstring documentation. The name should also be improved.
  • Tests! If we're going to support invoking hnn-core's MPI usage this way, then there need to be tests created inside our hnn_core/tests directory that explicitly run from inside such an external MPI invocation. I'm not sure how to do this, but maybe the easiest way is to start an MPI "host-thing" that executes some Python where the Python calls pytest directly, like https://docs.pytest.org/en/6.2.x/usage.html#calling-pytest-from-python-code (a rough sketch of this idea appears after this list).
  • Documentation: as I said in an earlier comment, once the API for this PR is settled, we need to add guidance to our documentation on how to use this functionality, including why you would want to use one type of MPI invocation vs our pre-existing mpiexec use.
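
As a rough illustration of the testing idea above, here is a minimal, hypothetical entry point that an external MPI launch could execute, which in turn calls pytest from Python code as described in the linked pytest docs. The file name and test selection are placeholders:

```python
# run_mpi_tests.py -- hypothetical entry point; the file name and the test
# selection below are placeholders, not part of the existing test suite.
# It would be launched under an external MPI call, e.g.:
#   mpiexec -np 2 nrniv -python -mpi -nobanner run_mpi_tests.py
import sys

import pytest

# Call pytest from Python code; '-x' stops at the first failure.
exit_code = pytest.main(['-x', 'hnn_core/tests/test_parallel_backends.py'])
sys.exit(int(exit_code))
```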

@kenneth59715
Author

kenneth59715 commented Mar 5, 2025 via email

@asoplata
Collaborator

asoplata commented Mar 7, 2025

Congratulations on the retirement, Kenneth!!
