v0.9.1rc1
Pre-release
This is the first release candidate of v0.9.1 for vLLM Ascend. Please follow the official doc to get started.
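As a quick sanity check after setup, the standard vLLM offline-inference API works unchanged on Ascend. The sketch below is illustrative only: it assumes vLLM and vLLM Ascend are already installed per the official doc, and the model name is simply one of the Qwen models exercised in this release.

```python
# Minimal offline-inference sketch (assumes vLLM + vLLM Ascend are installed
# as described in the official doc; the model choice is illustrative).
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Qwen3-0.6B is one of the models functionally tested in this release candidate.
llm = LLM(model="Qwen/Qwen3-0.6B")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```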
Experimental
- Atlas 300I series is experimentally supported in this release (functional tests passed with Qwen2.5-7b-instruct/Qwen2.5-0.5b/Qwen3-0.6B/Qwen3-4B/Qwen3-8B). #1333
- Support EAGLE-3 for speculative decoding (see the sketch after this section's note). #1032
After careful consideration, the above features will NOT be included in the v0.9.1-dev branch (the v0.9.1 final release), taking into account the v0.9.1 release quality and the rapid iteration of these features. We will improve this from v0.9.2rc1 onward.
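For reference, the sketch below shows how EAGLE-3 speculative decoding is typically enabled through upstream vLLM's `speculative_config` engine argument. Treat it as a hedged example, not a validated recipe for this release: the target model, draft model, and `num_speculative_tokens` value are illustrative assumptions, and parameter names may differ between vLLM versions.

```python
# Hedged sketch: enabling EAGLE-3 speculative decoding via upstream vLLM's
# speculative_config engine argument. The target model, draft model, and
# num_speculative_tokens are illustrative assumptions, not values validated here.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",           # target model (example)
    speculative_config={
        "method": "eagle3",                              # EAGLE-3 drafting method
        "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",  # EAGLE-3 draft head (example)
        "num_speculative_tokens": 5,                     # draft tokens per step (example)
    },
)

out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```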
Core
- Ascend PyTorch adapter (torch_npu) has been upgraded to 2.5.1.post1.dev20250528. Don’t forget to update it in your environment. #1235
- Support Atlas 300I series container image. You can get it from quay.io.
- Fix token-wise padding mechanism to make multi-card graph mode work. #1300
- Upgrade vLLM to 0.9.1. #1165
Other Improvements
- Initial support for Chunked Prefill with MLA. #1172
- An example of best practices to run DeepSeek with ETP has been added. #1101
- Performance improvements for DeepSeek using the TorchAir graph. #1098, #1131
- Supports the speculative decoding feature with AscendScheduler. #943
- Improve VocabParallelEmbedding custom op performance. It will be enabled in the next release. #796
- Fixed a device discovery and setup bug when running vLLM Ascend on Ray (see the sketch after this list). #884
- DeepSeek with MC2 (Merged Compute and Communication) now works properly. #1268
- Fixed a log2phy NoneType bug in the static EPLB feature. #1186
- Improved performance for DeepSeek with DBO enabled. #997, #1135
- Refactored AscendFusedMoE. #1229
- Added an initial user stories page (including LLaMA-Factory/TRL/verl/MindIE Turbo/GPUStack). #1224
- Added a unit test framework. #1201
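Related to the Ray fix above, the sketch below shows the standard way to select Ray as vLLM's distributed executor backend in the offline API. The model name and tensor-parallel degree are illustrative assumptions; consult the official doc for Ascend-specific multi-card configuration.

```python
# Hedged sketch: multi-card inference with Ray as the distributed executor
# backend (relevant to the Ray device-discovery fix above). The model name
# and tensor_parallel_size are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    tensor_parallel_size=2,               # split the model across two NPUs
    distributed_executor_backend="ray",   # use Ray instead of multiprocessing
)

outputs = llm.generate(["Write a haiku about mountains."], SamplingParams(max_tokens=48))
print(outputs[0].outputs[0].text)
```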
Known Issues
- In some cases, the vLLM process may crash with a GatherV3 error when aclgraph is enabled. We are working on this issue and will fix it in the next release. #1038
- The prefix cache feature does not work when the Ascend Scheduler is enabled but chunked prefill is disabled. This will be fixed in the next release. #1350
New Contributors
- @farawayboat made their first contribution in #1333
- @yzim made their first contribution in #1159
- @chenwaner made their first contribution in #1098
- @wangyanhui-cmss made their first contribution in #1184
- @songshanhu07 made their first contribution in #1186
- @yuancaoyaoHW made their first contribution in #1032
Full Changelog: v0.9.0rc2...v0.9.1rc1