MPI_Comm_split_type not creating disjoint subgroups for certain cases #12812

@mshanthagit

Background information

MPI_Comm_split_type creates groups with overlapping ranks when a rank is bound to cores that span more than one resource domain (for example, L3). In the run below, ranks 2 and 5 each span two L3 domains: with 8 cores per L3 (cores 0-7, 8-15, 16-23), rank 2 is bound to cores 6-8 and rank 5 to cores 15-17.
The program calls MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_L3CACHE, 0, info, &newcomm) on a Genoa machine with 8 cores per L3.

mpirun -np 8 --map-by ppr:8:numa:pe=3 --report-bindings ./a.out
[electra016:1374129] Rank 0 bound to package[0][core:0-2]
[electra016:1374129] Rank 1 bound to package[0][core:3-5]
[electra016:1374129] Rank 2 bound to package[0][core:6-8]
[electra016:1374129] Rank 3 bound to package[0][core:9-11]
[electra016:1374129] Rank 4 bound to package[0][core:12-14]
[electra016:1374129] Rank 5 bound to package[0][core:15-17]
[electra016:1374129] Rank 6 bound to package[0][core:18-20]
[electra016:1374129] Rank 7 bound to package[0][core:21-23]
Hello --- my rank: 0, my comm_size: 8
Hello --- my rank: 1, my comm_size: 8
Hello --- my rank: 7, my comm_size: 8
Hello --- my rank: 6, my comm_size: 8
Hello --- my rank: 5, my comm_size: 8
Hello --- my rank: 4, my comm_size: 8
Hello --- my rank: 3, my comm_size: 8
Hello --- my rank: 2, my comm_size: 8
From split comm: my rank: 0, my split_comm_size: 3
From split comm: my rank: 2, my split_comm_size: 6
From split comm: my rank: 4, my split_comm_size: 4
From split comm: my rank: 6, my split_comm_size: 3
From split comm: my rank: 1, my split_comm_size: 3
From split comm: my rank: 3, my split_comm_size: 4
From split comm: my rank: 5, my split_comm_size: 6
From split comm: my rank: 7, my split_comm_size: 3

As the output shows, only two ranks (2 and 5) report a comm_size of 6, whereas every member of a six-rank communicator should report the same size. The program does not print the ranks within each communicator, but this is what they would be:

comm(0): 0, 1, 2
comm(1): 0, 1, 2
comm(2): 0, 1, 2, 3, 4, 5
comm(3): 2, 3, 4, 5
comm(4): 2, 3, 4, 5
comm(5): 2, 3, 4, 5, 6, 7
comm(6): 5, 6, 7
comm(7): 5, 6, 7

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

I tested with 5.0.x and 4.1.6

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From source (5.0.x)

Details of the problem

Details in the background section. Here is an example program:

#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"


int main (int argc, char **argv) {

   MPI_Init(&argc, &argv);

   int rank, size, newcomm_size;
   int status = 0;

   MPI_Comm newcomm;
   MPI_Info info;

   // Get the number of MPI processes and this process's rank:
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   printf("Hello --- my rank: %d, my comm_size: %d\n", rank, size);

   MPI_Info_create(&info);

   // Split by shared L3 cache (Open MPI-specific split type):
   status = MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_L3CACHE, 0, info, &newcomm);
   MPI_Info_free(&info);

   if (status != MPI_SUCCESS) {
      // newcomm is not valid here, so stop instead of querying it:
      printf("Error in comm split %d\n", status);
      MPI_Abort(MPI_COMM_WORLD, status);
   }

   MPI_Comm_size(newcomm, &newcomm_size);
   printf("From split comm: my rank: %d, my split_comm_size: %d\n", rank, newcomm_size);

   MPI_Comm_free(&newcomm);
   MPI_Finalize();

   return status;
}

Run on a Genoa machine with 3 CCDs per NUMA domain and 8 cores per CCD:

shell$ mpirun -np 8 --map-by ppr:8:numa:pe=3 --report-bindings ./a.out
