What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.1.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
spack installation
Please describe the system on which you are running
- Operating system/version: CentOS 8
- Computer hardware: AMD Epyc 7532 processors (32 cores per CPU, 2.4 GHz)
- Network type: N.A.
Details of the problem
This issue occurs on a machine used by E3SM (e3sm.org):
https://e3sm.org/model/running-e3sm/supported-machines/chrysalis-anl
Modules used: gcc/9.2.0-ugetvbp openmpi/4.1.3-sxfyy4k
A simple MPI program is built with GCC's AddressSanitizer support enabled:
module load gcc/9.2.0-ugetvbp openmpi/4.1.3-sxfyy4k
cat <<EOF > test_mpi.c
#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -fsanitize=address -static-libasan test_mpi.c
A SLURM job runs the resulting MPI executable via srun. The output shows a heap-use-after-free error detected by AddressSanitizer:
=================================================================
==268787==ERROR: AddressSanitizer: heap-use-after-free on address 0x61900001f480 at pc 0x000000448c68 bp 0x7fffffffb4e0 sp 0x7fffffffac90
READ of size 2 at 0x61900001f480 thread T0
#0 0x448c67 in __interceptor_strlen /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:354
#1 0x155552d0b9a4 in opal_pmix_base_partial_commit_packed base/pmix_base_fns.c:405
#2 0x155552d0d423 in s2_put /tmp/svcbuilder/spack-stage-openmpi-4.1.3-sxfyy4knvddpewshfcc45heice7tzs7f/spack-src/opal/mca/pmix/s2/pmix_s2.c:548
#3 0x155555029640 in mca_pml_base_pml_selected base/pml_base_select.c:323
#4 0x155555029640 in mca_pml_base_pml_selected base/pml_base_select.c:318
#5 0x155555029c90 in mca_pml_base_select base/pml_base_select.c:284
#6 0x1555550736ac in ompi_mpi_init runtime/ompi_mpi_init.c:647
#7 0x155554ea80de in PMPI_Init /tmp/svcbuilder/spack-stage-openmpi-4.1.3-sxfyy4knvddpewshfcc45heice7tzs7f/spack-src/ompi/mpi/c/profile/pinit.c:67
#8 0x528ced in main (/gpfs/fs1/home/wuda/ASAN/a.out+0x528ced)
#9 0x155553e92492 in __libc_start_main (/usr/lib64/libc.so.6+0x23492)
#10 0x4069bd in _start (/gpfs/fs1/home/wuda/ASAN/a.out+0x4069bd)
0x61900001f480 is located 0 bytes inside of 1036-byte region [0x61900001f480,0x61900001f88c)
freed by thread T0 here:
#0 0x4eb7ce in __interceptor_realloc /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/asan/asan_malloc_linux.cc:163
#1 0x155552d0b994 in opal_pmix_base_partial_commit_packed base/pmix_base_fns.c:404
previously allocated by thread T0 here:
#0 0x4eb5ae in __interceptor_calloc /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x155552d0a604 in pmi_encode base/pmix_base_fns.c:705
SUMMARY: AddressSanitizer: heap-use-after-free /tmp/svcbuilder/spack-stage-gcc-9.2.0-ugetvbp5jl5kgy7jwjloyf73vnhhw7db/spack-src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:354 in __interceptor_strlen
Shadow bytes around the buggy address:
0x0c327fffbe40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbe50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbe60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbe70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c327fffbe80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c327fffbe90:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbea0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbeb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbec0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbed0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c327fffbee0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==268787==ABORTING
srun: error: chr-0502: task 0: Exited with exit code 1
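Reading the report above: the 1036-byte region was allocated in pmi_encode, freed by a realloc call at base/pmix_base_fns.c:404, and then read by strlen one line later at base/pmix_base_fns.c:405. That sequence is typical of code that keeps using the old pointer after realloc has moved the block. The sketch below is not the Open MPI source; grow_and_measure and its parameters are hypothetical names chosen only to illustrate the pattern and one safe way to write it, assuming the new capacity is large enough to keep the string's terminating NUL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT Open MPI code: grow a heap-allocated string
 * buffer and report the string length afterwards.
 *
 * Pattern that AddressSanitizer reports as heap-use-after-free:
 *
 *     char *grown = realloc(*buf, new_cap);
 *     size_t n = strlen(*buf);    // *buf may already have been freed
 *
 * Safe pattern: after a successful realloc, use only the returned pointer.
 *
 * Assumption: new_cap is larger than the current string length, so the
 * terminating NUL survives the reallocation. */
static int grow_and_measure(char **buf, size_t new_cap, size_t *len_out)
{
    char *grown = realloc(*buf, new_cap);
    if (grown == NULL) {
        /* On failure the original block is untouched and still owned
         * by the caller. */
        return -1;
    }
    *buf = grown;                 /* drop the stale address first */
    *len_out = strlen(*buf);      /* read through the new pointer only */
    return 0;
}
```

A caller would pass the address of its buffer pointer, for example grow_and_measure(&s, 2048, &n), and afterwards use only s as updated by the call.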
I am not sure whether this issue is still reproducible with the latest
Open MPI 5.0, because the related function opal_pmix_base_partial_commit_packed
shown in the stack trace was removed by PR #7202.