Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.1.7
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From the source tarball openmpi-4.1.7.tar.bz2
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status
N/A (not built from a git clone)
Please describe the system on which you are running
- Operating system/version: Rocky Linux 8
- Computer hardware: Intel x2695
- Network type: InfiniBand
Details of the problem
I am attempting to build Open MPI 4.1.7 with CUDA 11.8 support using the following configure command:
./configure '--prefix=/lscratch/43997378/openmpi/4.1.7/CUDA-11.8/pmix-3.2.3/ucx-1.17.0/gcc-8.5.0' \
    '--enable-shared' '--enable-static' '--without-verbs' '--without-mxm' \
    '--enable-orterun-prefix-by-default' '--enable-mpi-cxx' \
    '--with-libevent=/usr/local/libevent/libevent-2.1.12/gcc-11.3.0' \
    '--with-ucx=/usr/local/ucx/1.17.0-nocuda-mofed4.9-6/gcc-8.5.0' \
    '--with-pmix=/usr/local/apps/PMIx/pmix-3.2.3' \
    '--with-slurm' \
    '--with-cuda=/usr/local/CUDA/11.8.0' \
    'CC=/usr/local/GCC/8.5.0/bin/gcc' \
    'CXX=/usr/local/GCC/8.5.0/bin/g++' \
    'CXXFLAGS=-fabi-version=13 -fabi-compat-version=2 -fpermissive' \
    'FC=/usr/local/GCC/8.5.0/bin/gfortran' \
    'CPPFLAGS= -I/usr/local/libevent/libevent-2.1.12/gcc-11.3.0/include' \
    --cache-file=/dev/null --srcdir=. --disable-option-checking
Configure completes, but running make then fails with the following error:
make[2]: Entering directory '/usr/local/src/openmpi/openmpi-4.1.7/opal/mca/common/cuda'
CC common_cuda.lo
LN_S libmca_common_cuda.la
common_cuda.c: In function ‘mca_common_cuda_get_primary_context’:
common_cuda.c:1825:21: error: ‘cudaFunctionTable_t’ {aka ‘struct cudaFunctionTable’} has no member named ‘cuDevicePrimaryCtxGetState’
result = cuFunc.cuDevicePrimaryCtxGetState(dev_id, &flags, &active);
^
common_cuda.c:1831:24: error: ‘cudaFunctionTable_t’ {aka ‘struct cudaFunctionTable’} has no member named ‘cuDevicePrimaryCtxRetain’
result = cuFunc.cuDevicePrimaryCtxRetain(pctx, dev_id);
^
make[2]: *** [Makefile:1948: common_cuda.lo] Error 1
make[2]: Leaving directory '/usr/local/src/openmpi/openmpi-4.1.7/opal/mca/common/cuda'
make[1]: *** [Makefile:2387: all-recursive] Error 1
make[1]: Leaving directory '/usr/local/src/openmpi/openmpi-4.1.7/opal'
make: *** [Makefile:1905: all-recursive] Error 1
Please note that this does not happen with v4.1.6, so something in opal/mca/common/cuda/common_cuda.c appears to have changed between 4.1.6 and 4.1.7.
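Both errors mean the compiler could not find `cuDevicePrimaryCtxGetState` and `cuDevicePrimaryCtxRetain` in the definition of `struct cudaFunctionTable` it actually saw while compiling common_cuda.c. One way this class of error can arise (an assumption, not confirmed for 4.1.7) is a table member that is declared only under a preprocessor condition that is false for a given build, or never declared at all, while the code that calls it is unguarded. The minimal C sketch below models that pattern. `DEMO_HAVE_PRIMARY_CTX`, `demo_function_table_t`, and all `demo_*` helpers are hypothetical, and the signatures are simplified (plain `int` instead of `CUresult`/`CUdevice`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature macro (not Open MPI's). In a real build a configure
 * check would decide whether to define it; deleting this line drops the
 * guarded member and every use of it together. */
#define DEMO_HAVE_PRIMARY_CTX 1

/* Hypothetical stand-ins for driver entry points. The real CUDA functions
 * return CUresult and take CUdevice; plain int is used here for brevity. */
static int demo_cu_init(unsigned int flags)
{
    (void)flags;
    return 0; /* 0 plays the role of CUDA_SUCCESS */
}

#if defined(DEMO_HAVE_PRIMARY_CTX)
static int demo_primary_ctx_get_state(int dev, unsigned int *flags, int *active)
{
    (void)dev;
    *flags = 0;
    *active = 1; /* pretend the primary context is already active */
    return 0;
}
#endif

/* A function-pointer table in the style of cudaFunctionTable_t: callers
 * write table->member(...), so the compiler accepts only members that the
 * struct declares under the current preprocessor settings. */
typedef struct demo_function_table {
    int (*cuInit)(unsigned int flags);
#if defined(DEMO_HAVE_PRIMARY_CTX)
    int (*cuDevicePrimaryCtxGetState)(int dev, unsigned int *flags, int *active);
#endif
} demo_function_table_t;

/* Fill the table; each conditional member needs the same guard here. */
static void demo_table_init(demo_function_table_t *table)
{
    assert(table != NULL);
    table->cuInit = demo_cu_init;
#if defined(DEMO_HAVE_PRIMARY_CTX)
    table->cuDevicePrimaryCtxGetState = demo_primary_ctx_get_state;
#endif
}

/* Returns 1 and sets *active when the primary-context query is compiled in
 * and succeeds, 0 otherwise. Without the guard around the member access,
 * building with the macro undefined would fail with
 * "has no member named 'cuDevicePrimaryCtxGetState'". */
static int demo_query_primary_ctx(const demo_function_table_t *table, int dev, int *active)
{
    assert(table != NULL && active != NULL);
#if defined(DEMO_HAVE_PRIMARY_CTX)
    unsigned int flags = 0;
    if (table->cuDevicePrimaryCtxGetState == NULL) {
        return 0;
    }
    return table->cuDevicePrimaryCtxGetState(dev, &flags, active) == 0;
#else
    (void)table;
    (void)dev;
    *active = 0;
    return 0;
#endif
}
```

With the macro defined, `demo_query_primary_ctx` returns 1 and sets `active` to 1. Deleting the `#define` still compiles, because the member, its initialization, and its only use all share the same guard; an unguarded `table->cuDevicePrimaryCtxGetState(...)` would reproduce the error shown in the build log.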
Has anyone else seen this?
Thanks, David