Support cuda 12.8.0 and SBSA wheels #677
Open
This pull request includes several updates to the `publish.yaml` workflow file to enhance compatibility and support for additional architectures, along with minor version updates and copyright changes.

Enhancements to the workflow file:
- `.github/workflows/publish.yaml`: Updated the matrix to include `ubuntu-22.04-arm` for `os` and `aarch64` for `arch`, adjusted the `torch-version` and `cuda-version` lists, and added exclusions to prevent incompatible combinations.
- `.github/workflows/publish.yaml`: Added a new environment variable `MATRIX_ARCH` to capture the architecture.
- `.github/workflows/publish.yaml`: Updated the actions for setting up swap space and installing CUDA to use the latest versions.
- `.github/workflows/publish.yaml`: Modified the logic for determining the `TORCH_CUDA_VERSION` to include new versions and updated the hard-coded nightly build URLs.

Version updates:
- `mamba_ssm/__init__.py`: Incremented the version from `2.2.4` to `2.2.5`.

Copyright updates:
- `tensor_parallel.py`, `block.py`, `mamba2.py`, `mamba2_simple.py`, `mha.py`, `mlp.py`, `ssd_minimal.py`, `k_activations.py`, `layer_norm.py`, `layernorm_gated.py`, `selective_state_update.py`, `ssd_bmm.py`, `ssd_chunk_scan.py`, `ssd_chunk_state.py`, `ssd_combined.py`, and `ssd_state_passing.py`.

Additional changes in `setup.py`:
- `setup.py`: Added new imports, reorganized the code to include new functions for determining the build target and platform, and updated the logic for handling CUDA and ROCm builds.

Why this PR, @tridao?
- GitHub now offers Linux arm64 hosted runners for free in public repositories (public preview): https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/
- Windows Arm runners are on the roadmap for Q2 2025: github/roadmap#1098
- Ubuntu 20.04 is deprecated as of today.
- Devices: DIGITS, Jetson Thor, and CUDA Arm laptops are coming.
- NVIDIA is merging SBSA and ARM64 together.
- I added CUDA 12.8.0 and Arm runner support in https://github.com/Jimver/cuda-toolkit/releases/tag/v0.2.21
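
For reviewers who want to reason about which wheel jobs the new matrix produces, here is a minimal Python sketch of how a matrix with exclusions expands. It is an illustration only: the axis values and exclusion rules below are assumptions chosen for the example, not the actual lists in `.github/workflows/publish.yaml`, and `expand_matrix` is a hypothetical helper that only mimics the partial-match behavior of `exclude` in GitHub Actions.

```python
from itertools import product

# Hypothetical axis values for illustration; the real lists live in
# .github/workflows/publish.yaml and may differ.
MATRIX = {
    "os": ["ubuntu-22.04", "ubuntu-22.04-arm"],
    "arch": ["x86_64", "aarch64"],
    "cuda-version": ["11.8.0", "12.8.0"],
}

# Hypothetical exclusion rules. A rule removes every combination that
# matches all of the keys it lists (a partial match is enough).
EXCLUDE = [
    {"os": "ubuntu-22.04", "arch": "aarch64"},      # x86_64 runner, Arm build
    {"os": "ubuntu-22.04-arm", "arch": "x86_64"},   # Arm runner, x86_64 build
    {"arch": "aarch64", "cuda-version": "11.8.0"},  # assumed unsupported pair
]


def expand_matrix(matrix, exclude):
    """Return the job combinations left after applying the exclusion rules."""
    keys = list(matrix)
    jobs = [dict(zip(keys, values)) for values in product(*matrix.values())]

    def is_excluded(job):
        # A job is dropped if any rule matches it on every key the rule names.
        return any(
            all(job.get(key) == value for key, value in rule.items())
            for rule in exclude
        )

    return [job for job in jobs if not is_excluded(job)]


if __name__ == "__main__":
    for job in expand_matrix(MATRIX, EXCLUDE):
        print(job)
```

With these assumed values, only combinations where the runner and target architecture agree survive, which is the intent of the exclusions described above.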