Release v0.9.1rc1 · vllm-project/vllm-ascend

This is the first release candidate of v0.9.1 for vLLM Ascend. Please follow the official documentation to get started.

Experimental

Atlas 300I series is experimentally supported in this release (functional tests passed with Qwen2.5-7b-instruct, Qwen2.5-0.5b, Qwen3-0.6B, Qwen3-4B, and Qwen3-8B). #1333
Support EAGLE-3 for speculative decoding. #1032

After careful consideration, the above features will NOT be included in the v0.9.1-dev branch (the v0.9.1 final release), taking into account the v0.9.1 release quality and the rapid iteration of these features. We will improve them from 0.9.2rc1 onward.

Core

Ascend PyTorch adapter (torch_npu) has been upgraded to 2.5.1.post1.dev20250528. Don’t forget to update it in your environment. #1235
Support Atlas 300I series container image. You can get it from quay.io.
Fix token-wise padding mechanism to make multi-card graph mode work. #1300
Upgrade vLLM to 0.9.1. #1165
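
Since the torch_npu upgrade above has to be applied manually, a quick check of the installed version can help catch a stale environment. The following is a minimal sketch, not part of vLLM Ascend: the helper names `installed_version` and `check_version` are hypothetical, and the distribution name `torch-npu` is an assumption about how the adapter is registered in your environment.

```python
from importlib import metadata
from typing import Optional

# Version string taken from the release notes above.
EXPECTED_TORCH_NPU = "2.5.1.post1.dev20250528"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_version(installed: Optional[str],
                  expected: str = EXPECTED_TORCH_NPU) -> str:
    """Classify the installed version as 'missing', 'ok', or 'mismatch'."""
    if installed is None:
        return "missing"
    return "ok" if installed == expected else "mismatch"


if __name__ == "__main__":
    # Assumption: the adapter is installed under the name "torch-npu".
    found = installed_version("torch-npu")
    print(f"torch_npu installed: {found}, status: {check_version(found)}")
```

The comparison is an exact string match on purpose, because the notes pin a specific development build rather than a version range.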

Other Improvements

Initial support for Chunked Prefill with MLA. #1172
An example of best practices to run DeepSeek with ETP has been added. #1101
Performance improvements for DeepSeek using the TorchAir graph. #1098, #1131
Support the speculative decoding feature with AscendScheduler. #943
Improve VocabParallelEmbedding custom op performance. It will be enabled in the next release. #796
Fixed a device discovery and setup bug when running vLLM Ascend on Ray. #884
DeepSeek with MC2 (Merged Compute and Communication) now works properly. #1268
Fixed log2phy NoneType bug with static EPLB feature. #1186
Improved performance for DeepSeek with DBO enabled. #997, #1135
Refactored AscendFusedMoE. #1229
Added an initial user stories page (including LLaMA-Factory, TRL, verl, MindIE Turbo, and GPUStack). #1224
Added a unit test framework. #1201

Known Issues

In some cases, the vLLM process may crash with a GatherV3 error when aclgraph is enabled. We are working on this issue and will fix it in the next release. #1038
The prefix cache feature does not work when the Ascend Scheduler is used without chunked prefill enabled. This will be fixed in the next release. #1350

Full Changelog

v0.9.0rc2...v0.9.1rc1

New Contributors

@farawayboat made their first contribution in #1333
@yzim made their first contribution in #1159
@chenwaner made their first contribution in #1098
@wangyanhui-cmss made their first contribution in #1184
@songshanhu07 made their first contribution in #1186
@yuancaoyaoHW made their first contribution in #1032
