Description
Thank you for taking the time to submit an issue!
Background information
`MPI_Comm_split_type` creates communicators with overlapping membership when ranks' core bindings straddle resource domains (e.g., L3 cache). For example, in the run below, ranks 2 and 5 each span two L3 domains.

(Program calling `MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_L3CACHE, 0, info, &newcomm);` on a Genoa machine with 8 cores per L3.)
```
shell$ mpirun -np 8 --map-by ppr:8:numa:pe=3 --report-bindings ./a.out
[electra016:1374129] Rank 0 bound to package[0][core:0-2]
[electra016:1374129] Rank 1 bound to package[0][core:3-5]
[electra016:1374129] Rank 2 bound to package[0][core:6-8]
[electra016:1374129] Rank 3 bound to package[0][core:9-11]
[electra016:1374129] Rank 4 bound to package[0][core:12-14]
[electra016:1374129] Rank 5 bound to package[0][core:15-17]
[electra016:1374129] Rank 6 bound to package[0][core:18-20]
[electra016:1374129] Rank 7 bound to package[0][core:21-23]
```
```
Hello --- my rank: 0, my comm_size: 8
Hello --- my rank: 1, my comm_size: 8
Hello --- my rank: 7, my comm_size: 8
Hello --- my rank: 6, my comm_size: 8
Hello --- my rank: 5, my comm_size: 8
Hello --- my rank: 4, my comm_size: 8
Hello --- my rank: 3, my comm_size: 8
Hello --- my rank: 2, my comm_size: 8
From split comm: my rank: 0, my split_comm_size: 3
From split comm: my rank: 2, my split_comm_size: 6
From split comm: my rank: 4, my split_comm_size: 4
From split comm: my rank: 6, my split_comm_size: 3
From split comm: my rank: 1, my split_comm_size: 3
From split comm: my rank: 3, my split_comm_size: 4
From split comm: my rank: 5, my split_comm_size: 6
From split comm: my rank: 7, my split_comm_size: 3
```
As the output shows, only two ranks report a comm_size of 6, which is impossible for a disjoint split: a communicator of size 6 would have to be reported by six ranks. Although the program doesn't print the membership of each communicator, here is what it works out to be:

```
comm(0): 0, 1, 2
comm(1): 0, 1, 2
comm(2): 0, 1, 2, 3, 4, 5
comm(3): 2, 3, 4, 5
comm(4): 2, 3, 4, 5
comm(5): 2, 3, 4, 5, 6, 7
comm(6): 5, 6, 7
comm(7): 5, 6, 7
```
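One way to confirm those membership lists empirically is to translate each split-communicator rank back to its `MPI_COMM_WORLD` rank with `MPI_Group_translate_ranks`. A hedged sketch (standard MPI group calls; the 64-entry buffers are an assumption sized for this 8-rank reproducer):

```c
#include <stdio.h>
#include "mpi.h"

/* Print, for the calling rank, the MPI_COMM_WORLD ranks in its split comm. */
static void print_members(MPI_Comm newcomm)
{
    MPI_Group world_group, new_group;
    int n, world_rank;

    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Comm_group(newcomm, &new_group);
    MPI_Group_size(new_group, &n);

    int local[64], world[64];   /* assumes <= 64 ranks per split comm */
    for (int i = 0; i < n; i++)
        local[i] = i;
    MPI_Group_translate_ranks(new_group, n, local, world_group, world);

    printf("comm(%d):", world_rank);
    for (int i = 0; i < n; i++)
        printf(" %d", world[i]);
    printf("\n");

    MPI_Group_free(&new_group);
    MPI_Group_free(&world_group);
}
```

Calling `print_members(newcomm)` after the split in the reproducer below should reproduce the overlapping lists above on an affected build.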
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
I tested with v5.0.x and v4.1.6.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From source (5.0.x)
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status
.
Please describe the system on which you are running
- Operating system/version:
- Computer hardware:
- Network type:
Details of the problem
Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.
Details in the background section. Here is an example program:
```c
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, size, newcomm_size;
    int status = 0;
    MPI_Comm newcomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Get the number of MPI processes and this process's rank: */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello --- my rank: %d, my comm_size: %d\n", rank, size);

    /* Split MPI_COMM_WORLD by L3 cache domain: */
    MPI_Info_create(&info);
    status = MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_L3CACHE, 0, info, &newcomm);
    if (status != MPI_SUCCESS) {
        printf("Error in comm split %d\n", status);
    }

    MPI_Comm_size(newcomm, &newcomm_size);
    printf("From split comm: my rank: %d, my split_comm_size: %d\n", rank, newcomm_size);

    MPI_Comm_free(&newcomm);
    MPI_Info_free(&info);
    MPI_Finalize();
    return status;
}
```
====================
Note: If you include verbatim output (or a code block), please use a GitHub Markdown code block like below:
```
shell$ mpirun -np 8 --map-by ppr:8:numa:pe=3 --report-bindings ./a.out
```
(on a Genoa machine with 3 CCDs per NUMA domain and 8 cores per CCD)