
MPI_Allgatherv in mpi_allgather function #186

Open
rohanbabbar04 wants to merge 9 commits into main from allgatherv

Conversation

@rohanbabbar04
Collaborator

Closes #169

  • Use MPI_Allgatherv in mpi_allgather.

@rohanbabbar04
Collaborator Author

I made a change to work around something that was bugging me while implementing MPI_Allgatherv: it works for contiguous arrays only.

In Fredholm, y1 after transpose() becomes a non-contiguous array.

```python
y1 = (
    ncp.matmul(x.transpose(0, 2, 1).conj(), self.G)
    .transpose(0, 2, 1)
    .conj()
)
```

This can be simplified to `y1 = ncp.matmul(self.G.transpose(0, 2, 1).conj(), x)`, whose result is contiguous as well.
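For context, a minimal NumPy sketch of the contiguity behavior at play (`ncp` in the snippet above would be NumPy or CuPy; shown here with NumPy): transposing produces a strided view, while `matmul` materializes a fresh, contiguous output.

```python
import numpy as np

x = np.ones((3, 4, 5))

# transpose() returns a view with permuted strides, not a new buffer,
# so the result is generally not C-contiguous.
xt = x.transpose(0, 2, 1)
print(xt.flags["C_CONTIGUOUS"])  # False

# matmul allocates a fresh output buffer, so its result is C-contiguous
# even when its inputs are strided views.
y = np.matmul(xt, np.ones((3, 4, 4)))
print(y.flags["C_CONTIGUOUS"])  # True
```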

@mrava87
Contributor

mrava87 commented Feb 17, 2026

Thanks @rohanbabbar04

When I suggested we should investigate Allgatherv, I did not foresee the issue with non-contiguous arrays.

What you propose in the Fredholm code is a bit problematic, though. We used that transpose pattern for a real reason: G is usually much bigger than x, so by transposing x twice (before and after G is applied) we effectively do far fewer operations than by transposing G... We do the same in PyLops' original operator, so changing this would require some proper benchmarking to see whether the combination of changes you suggest is actually beneficial 😉
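For the record, the two formulations agree numerically, since (xᴴ G)ᴴ = Gᴴ x; the debate is only about which operand gets transposed and conjugated. A minimal sketch with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
nsl, nx, ny, nz = 2, 6, 4, 3  # made-up sizes; in practice G is far larger than x
G = rng.standard_normal((nsl, nx, ny)) + 1j * rng.standard_normal((nsl, nx, ny))
x = rng.standard_normal((nsl, nx, nz)) + 1j * rng.standard_normal((nsl, nx, nz))

# Current pattern: transpose/conjugate the (small) x twice, leave G untouched.
y1 = np.matmul(x.transpose(0, 2, 1).conj(), G).transpose(0, 2, 1).conj()

# Proposed pattern: transpose/conjugate the (large) G once instead.
y2 = np.matmul(G.transpose(0, 2, 1).conj(), x)

print(np.allclose(y1, y2))  # True
```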

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Feb 18, 2026

Ah, yes! In that case we should keep it as before, because when G >> x, transposing and conjugating G would put pressure on memory.

I also think we should keep both Allgather and Allgatherv: whenever the shapes vary across ranks we use Allgatherv; otherwise the preferred method should be Allgather.
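To make the proposed dispatch concrete, a hypothetical sketch (`plan_allgather` and its signature are made up for illustration, not the actual `mpi_allgather` API): pick the fixed-size collective only when every rank contributes the same number of elements, and force contiguity up front since MPI buffer collectives require it.

```python
import numpy as np

def plan_allgather(send_buf, all_sizes):
    # Hypothetical helper: MPI buffer collectives need a contiguous send
    # buffer; ascontiguousarray copies only if the input is non-contiguous.
    send_buf = np.ascontiguousarray(send_buf)
    # Equal sizes on every rank -> Allgather; otherwise -> Allgatherv.
    collective = "Allgather" if len(set(all_sizes)) == 1 else "Allgatherv"
    return send_buf, collective

buf, which = plan_allgather(np.ones((3, 4)).T, [12, 12, 12])
print(which, buf.flags["C_CONTIGUOUS"])  # Allgather True

_, which2 = plan_allgather(np.ones(5), [5, 4, 4])
print(which2)  # Allgatherv
```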

Contributor

@mrava87 mrava87 left a comment


@rohanbabbar04 this looks good!

Just one thing I am a bit unsure about... you mentioned the issue with Fredholm being that the array is non-contiguous and that Allgatherv requires contiguity... but we don't seem to check this in mpi_allgather... so what would happen if we have Fredholm with G arrays that differ in G.shape[0] across ranks? I suspect the code will crash?

You could probably test it first by doing something like

```python
_F = np.arange(par["nsl"] * par["nx"] * par["ny"]).reshape(
```

```python
nsl = par["nsl"] + 1 if rank == 0 else par["nsl"]
```

and then use `nsl` instead of `par["nsl"]` in the rest of the test... this ensures rank 0 has a different size than the other ranks, and we should hit Allgatherv with non-contiguous arrays...

If I am right, fix the code by:

  • either adding a check of contiguity in mpi_allgather, or
  • adding a parameter to mpi_allgather (and the methods that call it) to let the user force allgather even when the arrays don't have the same shape, with the usual padding/unrolling

@rohanbabbar04
Collaborator Author

Yup, what I did was use send_buf.copy() in the allgather, which resolves this issue, but we should change it to ncp.ascontiguousarray(send_buf) so that it only copies when the array is non-contiguous.
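A quick NumPy illustration of why `ascontiguousarray` beats an unconditional `copy()` here (`ncp` stands in for NumPy/CuPy): it is a no-op for already-contiguous input and only copies strided views.

```python
import numpy as np

a = np.ones((4, 3))
# Already C-contiguous: ascontiguousarray returns the very same object,
# so no data is copied.
print(np.ascontiguousarray(a) is a)  # True

b = a.T  # strided view, not C-contiguous
bc = np.ascontiguousarray(b)
print(b.flags["C_CONTIGUOUS"], bc.flags["C_CONTIGUOUS"])  # False True
```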

@rohanbabbar04 rohanbabbar04 requested a review from mrava87 March 13, 2026 19:31
@mrava87
Contributor

mrava87 commented Mar 13, 2026

Alright! I see this for AllGatherv, but what about AllGather: is it also needed? In our current codebase, the Fredholm implementation is likely to create a non-contiguous array, as you stated in one of your comments, right? But we did not do any copy/ascontiguousarray; was that because _prepare_allgather_inputs was effectively doing it?

@rohanbabbar04
Collaborator Author

Yes, you are right; it was already handled by the _prepare_allgather_inputs function.

@mrava87
Contributor

mrava87 commented Mar 15, 2026

@rohanbabbar04 Alright, then I think we agree. Resolve the conflicts and get the CI back up and running, and then we should be almost ready to merge. I also added a few minor extra comments below...

@mrava87
Contributor

mrava87 commented Mar 15, 2026

@tharittk, can you also please have a look at this PR, since it changes a few things you did? I would like to have your input 😄

@mrava87 mrava87 requested a review from tharittk March 15, 2026 19:47
Contributor

@mrava87 mrava87 left a comment


@rohanbabbar04 I left a few more comments and am waiting for @tharittk's input 😄 almost there!

Collaborator

@tharittk tharittk left a comment


Looks good @rohanbabbar04 !

I have a somewhat similar feeling to @mrava87 in that I would much prefer to keep the array unrolling in one place. Currently, your proposed mpi_allgather has two unrolling paths (for the variable-size and non-variable-size cases), and half of the previous _unroll_allgather_recv (the mpi part) becomes obsolete.

Hmm, I think it would be really nice to have them in one place... Maybe extend the original _unroll_allgather_recv to handle both padded and non-padded input?
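One possible shape for such a unified helper (purely a sketch with hypothetical names and signature; the real _unroll_allgather_recv may differ): a single function that splits the flat receive buffer into per-rank chunks, trimming padding when the fixed-size Allgather path was used.

```python
import numpy as np

def unroll_recv(recv, sizes, padded_size=None):
    # Hypothetical unified unroll: `sizes` holds each rank's true element
    # count. With padded_size set (Allgather path), every rank occupies
    # padded_size slots and the tail padding is trimmed; without it
    # (Allgatherv path), chunks are packed back to back.
    chunks, offset = [], 0
    for size in sizes:
        chunks.append(recv[offset : offset + size])
        offset += padded_size if padded_size is not None else size
    return np.concatenate(chunks)

# Allgather path: two ranks padded to 3 slots each (trailing zero is padding).
print(unroll_recv(np.array([1, 2, 3, 4, 5, 0]), [3, 2], padded_size=3))
# Allgatherv path: same data, packed without padding.
print(unroll_recv(np.array([1, 2, 3, 4, 5]), [3, 2]))
```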

@rohanbabbar04
Collaborator Author

Ok, I made some minor changes to _unroll_gather_recv to handle the new logic.
I also introduced a new method that prepares the gathered data for MPI before it goes into Allgather/Allgatherv.

@rohanbabbar04 rohanbabbar04 requested a review from tharittk March 19, 2026 15:14
@rohanbabbar04
Collaborator Author

@mrava87, I think once this is merged we can make a new release, as the fix for the issue with numpy>=2.4 has also not been released yet.

@mrava87
Contributor

mrava87 commented Mar 20, 2026

@rohanbabbar04 agreed! Let me have a final look at this over the weekend (and check with @hongyx11 why the CuPy CI seems to have stopped working).
