compile v4.1.7 with CUDA support broken #13005

@davidhoover

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v4.1.7

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From tarball openmpi-4.1.7.tar.bz2

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Rocky Linux 8
  • Computer hardware: Intel x2695
  • Network type: InfiniBand

Details of the problem

I am attempting to compile Open MPI v4.1.7 with CUDA v11.8 support, configured like this:

./configure '--prefix=/lscratch/43997378/openmpi/4.1.7/CUDA-11.8/pmix-3.2.3/ucx-1.17.0/gcc-8.5.0' '--enable-shared' '--enable-static' '--without-verbs' '--without-mxm' '--enable-orterun-prefix-by-default' '--enable-mpi-cxx' '--with-libevent=/usr/local/libevent/libevent-2.1.12/gcc-11.3.0' '--with-ucx=/usr/local/ucx/1.17.0-nocuda-mofed4.9-6/gcc-8.5.0' '--with-pmix=/usr/local/apps/PMIx/pmix-3.2.3' '--with-slurm' '--with-cuda=/usr/local/CUDA/11.8.0' 'CC=/usr/local/GCC/8.5.0/bin/gcc' 'CXX=/usr/local/GCC/8.5.0/bin/g++' 'CXXFLAGS=-fabi-version=13 -fabi-compat-version=2 -fpermissive' 'FC=/usr/local/GCC/8.5.0/bin/gfortran' 'CPPFLAGS=-I/usr/local/libevent/libevent-2.1.12/gcc-11.3.0/include' --cache-file=/dev/null --srcdir=. --disable-option-checking

This results in the following error:

make[2]: Entering directory '/usr/local/src/openmpi/openmpi-4.1.7/opal/mca/common/cuda'
  CC       common_cuda.lo
  LN_S     libmca_common_cuda.la
common_cuda.c: In function ‘mca_common_cuda_get_primary_context’:
common_cuda.c:1825:21: error: ‘cudaFunctionTable_t’ {aka ‘struct cudaFunctionTable’} has no member named ‘cuDevicePrimaryCtxGetState’
     result =  cuFunc.cuDevicePrimaryCtxGetState(dev_id, &flags, &active);
                     ^
common_cuda.c:1831:24: error: ‘cudaFunctionTable_t’ {aka ‘struct cudaFunctionTable’} has no member named ‘cuDevicePrimaryCtxRetain’
         result = cuFunc.cuDevicePrimaryCtxRetain(pctx, dev_id);
                        ^
make[2]: *** [Makefile:1948: common_cuda.lo] Error 1
make[2]: Leaving directory '/usr/local/src/openmpi/openmpi-4.1.7/opal/mca/common/cuda'
make[1]: *** [Makefile:2387: all-recursive] Error 1
make[1]: Leaving directory '/usr/local/src/openmpi/openmpi-4.1.7/opal'
make: *** [Makefile:1905: all-recursive] Error 1

Please note that this does not happen with v4.1.6, so something must have changed in opal/mca/common/cuda/common_cuda.c between v4.1.6 and v4.1.7.
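
For anyone who wants to compare the two releases, here is a minimal sketch of how the difference could be narrowed down. It counts how often the two symbols from the error message appear in each version of common_cuda.c and then prints a unified diff of the file. The SRC_ROOT default is taken from the build log above; the location of an unpacked openmpi-4.1.6 tree next to it is an assumption, and count_symbol is a hypothetical helper written only for this illustration.

```shell
#!/bin/sh
# Assumption: both tarballs are unpacked side by side under SRC_ROOT.
# Only the 4.1.7 path appears in the build log; the 4.1.6 path is a guess.
SRC_ROOT="${SRC_ROOT:-/usr/local/src/openmpi}"
REL_PATH="opal/mca/common/cuda/common_cuda.c"

# Hypothetical helper: print how many lines of file $1 mention symbol $2.
# Prints 0 if the file does not exist.
count_symbol() {
    if [ -f "$1" ]; then
        # grep -c prints 0 and exits non-zero when nothing matches; ignore that.
        grep -c -- "$2" "$1" || true
    else
        echo 0
    fi
}

# Report the symbols named in the compiler errors for each release.
for ver in 4.1.6 4.1.7; do
    for sym in cuDevicePrimaryCtxGetState cuDevicePrimaryCtxRetain; do
        file="$SRC_ROOT/openmpi-$ver/$REL_PATH"
        printf '%s %s %s\n' "$ver" "$sym" "$(count_symbol "$file" "$sym")"
    done
done

# Show the full change to the file if both trees are present.
old="$SRC_ROOT/openmpi-4.1.6/$REL_PATH"
new="$SRC_ROOT/openmpi-4.1.7/$REL_PATH"
if [ -f "$old" ] && [ -f "$new" ]; then
    # diff exits non-zero when the files differ; that is expected here.
    diff -u "$old" "$new" || true
fi
```

If the symbols show up only in the 4.1.7 file, the next step would be to check whether the matching members were also added to the cudaFunctionTable struct, and whether that declaration sits inside a preprocessor guard that this configuration does not enable.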

Has anyone else seen this?

Thanks, David
