Description
System Info
platform: x86_64, Ubuntu 24.04
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
Python version: Python 3.13.3
CPU: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
GPU: NVIDIA RTX 5060
CUDA Toolkit: 13.0.88
CMake: 3.28.3
GCC: 13.3.0
bitsandbytes: current HEAD c3b8de2
Reproduction
When building bitsandbytes on Ubuntu 24.04 with CUDA 13.0.88 and CMake 3.23.0 or later, the configure step selects the Maxwell, Pascal, and Volta architectures, which CUDA 13 no longer supports.
The build then fails with the following compilation error:
$ git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
$ cd bitsandbytes
$ cmake -DCOMPUTE_BACKEND=cuda -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc -S .
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring bitsandbytes (Backend: cuda)
-- The CUDA compiler identification is NVIDIA 13.0.88
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CUDA Version: 130 (13.0.88)
-- CUDA Compiler: /usr/local/cuda-13.0/bin/nvcc
-- CMAKE_CUDA_COMPILER_VERSION: 13.0.88
-- CMAKE_VERSION: 3.28.3
-- CUDA Capabilities Available: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Capabilities Selected: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Targets: 50-real;52-real;53-real;60-real;61-real;62-real;70-real;72-real;75-real;80-real;86-real;87-real;89-real;90
-- CUDA NVCC Flags: --use_fast_math
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/dhsung/temp/bitsandbytes
$ make
[ 14%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
[ 28%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
[ 42%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
[ 57%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o
nvcc fatal : Unsupported gpu architecture 'compute_50'
make[2]: *** [CMakeFiles/bitsandbytes.dir/build.make:119: CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/bitsandbytes.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
Expected behavior
I checked the current CMakeLists.txt file; the relevant lines are at https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/CMakeLists.txt#L123-L124
With CMake 3.23 or later, the CUDA architectures should be determined automatically from the detected CUDA version.
In this case (CUDA 13), Maxwell (5.x), Pascal (6.x), and Volta (7.0/7.2) should be excluded because CUDA 13 no longer supports them.
However, CMake still selected these architectures, which produced the unsupported entries
CUDA Capabilities Selected: 50;52;53;60;61;62;70;72;
CUDA Targets: 50-real;52-real;53-real;60-real;61-real;62-real;70-real;72-real;
in the configure output.
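The expected selection can be illustrated with a short Python sketch. It is not the project's CMake logic: the names `select_capabilities` and `to_cuda_targets` and the minimum-capability table are assumptions made for illustration, and the CUDA 13 floor of 75 is taken from this report. The target formatting mirrors the configure log, where every entry gets a `-real` suffix except the last one.

```python
# Hypothetical helper names; this only illustrates the filtering described above.

# Lowest compute capability still accepted per CUDA major version.
# Only the CUDA 13 entry comes from this report (5.x, 6.x, 7.0/7.2 dropped).
MIN_CAPABILITY_BY_CUDA_MAJOR = {13: 75}


def select_capabilities(available: str, cuda_version: str) -> list[int]:
    """Drop capabilities below the floor for the given CUDA version."""
    major = int(cuda_version.split(".")[0])
    floor = MIN_CAPABILITY_BY_CUDA_MAJOR.get(major, 0)
    caps = sorted({int(c) for c in available.split(";") if c})
    return [c for c in caps if c >= floor]


def to_cuda_targets(caps: list[int]) -> str:
    """Format targets as in the configure log: '-real' on all but the last."""
    if not caps:
        return ""
    parts = [f"{c}-real" for c in caps[:-1]] + [str(caps[-1])]
    return ";".join(parts)


available = "50;52;53;60;61;62;70;72;75;80;86;87;89;90"
selected = select_capabilities(available, "13.0.88")
print(";".join(map(str, selected)))  # 75;80;86;87;89;90
print(to_cuda_targets(selected))     # 75-real;80-real;86-real;87-real;89-real;90
```

This sketch only removes unsupported entries from the list CMake reports as available. The newer 100 through 121 entries shown in the expected output would also require that list to contain them, which is not attempted here.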
Expected build
$ git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
$ cd bitsandbytes
$ cmake -DCOMPUTE_BACKEND=cuda -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc -S .
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring bitsandbytes (Backend: cuda)
-- The CUDA compiler identification is NVIDIA 13.0.88
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CUDA Version: 130 (13.0.88)
-- CUDA Compiler: /usr/local/cuda-13.0/bin/nvcc
-- CMAKE_CUDA_COMPILER_VERSION: 13.0.88
-- CMAKE_VERSION: 3.28.3
-- CUDA Capabilities Available: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Capabilities Selected: 75;80;86;87;89;90;100;103;110;120;121
-- CUDA Targets: 75-real;80-real;86-real;87-real;89-real;90-real;100-real;103-real;110-real;120-real;121
-- CUDA NVCC Flags: --use_fast_math
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/dhsung/temp/bitsandbytes
$ make
[ 14%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
[ 28%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
[ 42%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
[ 57%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o
[ 71%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/kernels.cu.o
[ 85%] Linking CUDA device code CMakeFiles/bitsandbytes.dir/cmake_device_link.o
[100%] Linking CXX shared library bitsandbytes/libbitsandbytes_cuda130.so
[100%] Built target bitsandbytes
As a workaround, CMakeLists.txt provides the -DCOMPUTE_CAPABILITY= build option, so I can still build bitsandbytes on CUDA 13 by listing the architectures explicitly.
The command below is the one I currently use to build bitsandbytes.
$ git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git
$ cd bitsandbytes
$ cmake -DCOMPUTE_BACKEND=cuda -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc -DCOMPUTE_CAPABILITY="75;80;86;87;89;90;100;103;110;120;121" -S .
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring bitsandbytes (Backend: cuda)
-- The CUDA compiler identification is NVIDIA 13.0.88
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CUDA Version: 130 (13.0.88)
-- CUDA Compiler: /usr/local/cuda-13.0/bin/nvcc
-- CMAKE_CUDA_COMPILER_VERSION: 13.0.88
-- CMAKE_VERSION: 3.28.3
-- CUDA Capabilities Available: 50;52;53;60;61;62;70;72;75;80;86;87;89;90
-- CUDA Capabilities Selected: 75;80;86;87;89;90;100;103;110;120;121
-- CUDA Targets: 75-real;80-real;86-real;87-real;89-real;90-real;100-real;103-real;110-real;120-real;121
-- CUDA NVCC Flags: --use_fast_math
-- Configuring done (5.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/dhsung/temp/bitsandbytes
$ make
[ 14%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
[ 28%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
[ 42%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
[ 57%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o
[ 71%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/kernels.cu.o
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi1EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi0EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi1EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi0EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi1EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi0EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi1EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi0EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi1EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z16kInt8VectorQuantI6__halfLi1024ELi0EEvPT_PaPffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi1EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z12kgetRowStatsI6__halfLi1024ELi0EEvPT_Pffii is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
[ 85%] Linking CUDA device code CMakeFiles/bitsandbytes.dir/cmake_device_link.o
[100%] Linking CXX shared library bitsandbytes/libbitsandbytes_cuda130.so
[100%] Built target bitsandbytes
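The capability list passed to -DCOMPUTE_CAPABILITY above was typed by hand. One way to derive it instead would be to parse the output of `nvcc --list-gpu-arch`, which prints one `compute_XX` name per line. The Python sketch below only does the parsing; the function name `capabilities_from_nvcc` and the sample text are illustrative assumptions, and the real output of a given toolkit may differ.

```python
import re


def capabilities_from_nvcc(listing: str) -> str:
    """Turn 'compute_XX' lines into a ';'-separated COMPUTE_CAPABILITY value."""
    # Match plain numeric architectures only; suffixed variants such as
    # 'compute_90a' are skipped in this sketch.
    caps = {int(m) for m in re.findall(r"^compute_(\d+)$", listing, re.MULTILINE)}
    return ";".join(str(c) for c in sorted(caps))


# Sample text for illustration, not captured from a real toolkit.
sample = "compute_75\ncompute_80\ncompute_86\ncompute_90a\ncompute_120\n"
value = capabilities_from_nvcc(sample)
print(f'-DCOMPUTE_CAPABILITY="{value}"')  # -DCOMPUTE_CAPABILITY="75;80;86;120"
```

On a real system the listing could be obtained with `subprocess.run(["nvcc", "--list-gpu-arch"], capture_output=True, text=True).stdout` and passed to the function unchanged.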