Memory leak with CUDA-aware OpenMPI without UCX

Hello!

### What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
4.1.7a1

### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Taken from NVIDIA's HPC SDK (more details in the logs)

### Please describe the system on which you are running
* Operating system/version:
Linux, Ubuntu 22.04
* Computer hardware:
 + 2 A100_40GB_PCIE
* Network type:
Not sure (any tips how to extract this information?)

## Details of the problem
A simple reproducer calls cudaMalloc, MPI_Bcast, and cudaFree in a loop and checks the free device memory with cudaMemGetInfo(). I expect the same amount of free device memory at every iteration, but in practice it decreases, and after enough iterations the test fails in cudaMalloc because the device runs out of memory (hence I call it a memory leak).
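For readers who do not want to open the attachment, the loop described above can be sketched roughly as follows. This is not the attached `repro_test.cu`: the element type (`double`), the iteration count, the rank-to-device mapping, and the helper `lost_bytes` are assumptions made for illustration only. `COUNT` mirrors the `-DCOUNT=...` define from the compile line.

```cuda
// Illustrative sketch of a cudaMalloc - MPI_Bcast - cudaFree loop.
// NOT the attached reproducer; element type and iteration count are assumed.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <mpi.h>

#ifndef COUNT
#define COUNT 20971520  // elements per broadcast, as in the compile line
#endif

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,    \
                    __LINE__, cudaGetErrorString(err_));              \
            MPI_Abort(MPI_COMM_WORLD, 1);                             \
        }                                                             \
    } while (0)

// Hypothetical helper: free device memory lost relative to a baseline,
// in bytes (negative if more memory is free than at the baseline).
static long long lost_bytes(size_t baseline_free, size_t current_free) {
    return (long long)baseline_free - (long long)current_free;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Assumption: all ranks run on one node, one GPU per rank (round-robin).
    int ndev = 0;
    CUDA_CHECK(cudaGetDeviceCount(&ndev));
    CUDA_CHECK(cudaSetDevice(rank % ndev));

    const int iterations = 100;  // assumed; long enough to see a trend
    size_t baseline_free = 0;

    for (int it = 0; it < iterations; ++it) {
        double* buf = nullptr;
        CUDA_CHECK(cudaMalloc((void**)&buf, COUNT * sizeof(double)));
        CUDA_CHECK(cudaMemset(buf, 0, COUNT * sizeof(double)));

        // CUDA-aware MPI: the device pointer is passed directly.
        MPI_Bcast(buf, COUNT, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        CUDA_CHECK(cudaFree(buf));

        // Query free device memory after the buffer has been released.
        size_t free_b = 0, total_b = 0;
        CUDA_CHECK(cudaMemGetInfo(&free_b, &total_b));
        if (it == 0) baseline_free = free_b;  // first iteration is the baseline

        printf("rank %d iter %d: free = %zu bytes, lost since iter 0 = %lld bytes\n",
               rank, it, free_b, lost_bytes(baseline_free, free_b));
    }

    MPI_Finalize();
    return 0;
}
```

If the behavior described above is present, the "lost since iter 0" value printed by each rank keeps growing instead of staying near zero.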

The reproducer is compiled and run with:
```shell
nvcc -O3 -DUSE_MPI -DUSE_OPENMPI -DCOUNT=20971520 repro_test.cu -o repro_test.out -lmpi

mpirun --allow-run-as-root --mca pml ^ucx --mca osc ^ucx --mca coll ^hcoll,ucc --mca btl ^uct \
    --mca mpi_show_mca_params enviro --mca mpi_show_mca_params_file mca_params_myfile_enviro_dbg9.txt \
    --mca orte_base_help_aggregate 0 \
    --mca btl_base_verbose 100 --mca mtl_base_verbose 100 \
    -np 2 repro_test.out
```

Note: the same reproducer also fails with UCX (which is explicitly turned off in the command above), but there I know UCX-specific workarounds, and the issue is likely the same as https://github.com/open-mpi/ompi/issues/12849. For the non-UCX case, I am not sure whether that issue is relevant at all.

Reproducer as *.txt:
[repro_test.txt](https://github.com/user-attachments/files/18056391/repro_test.txt)

Example output (also contains the output of `ompi_info --parsable --config`):
[log_non_ucx2.log](https://github.com/user-attachments/files/18056392/log_non_ucx2.log)

Any suggestions?

Thanks,
Kirill

