v0.9.1rc1

Pre-release

@Yikun Yikun released this 22 Jun 07:08
· 1051 commits to main since this release
c30ddb8

This is the 1st release candidate of v0.9.1 for vLLM Ascend. Please follow the official doc to get started.

Experimental

  • Atlas 300I series is experimentally supported in this release (functional tests passed with Qwen2.5-7b-instruct/Qwen2.5-0.5b/Qwen3-0.6B/Qwen3-4B/Qwen3-8B). #1333
  • Support EAGLE-3 for speculative decoding. #1032

After careful consideration, the above features will NOT be included in the v0.9.1-dev branch (the v0.9.1 final release), taking into account the v0.9.1 release quality and the rapid iteration of these features. We will improve them from 0.9.2rc1 onward.

Core

  • Ascend PyTorch adapter (torch_npu) has been upgraded to 2.5.1.post1.dev20250528. Don’t forget to update it in your environment. #1235
  • A container image for the Atlas 300I series is now available. You can get it from quay.io.
  • Fix token-wise padding mechanism to make multi-card graph mode work. #1300
  • Upgrade vLLM to 0.9.1. #1165
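
Because the torch_npu upgrade above needs to be applied manually, a quick check of the installed version can help. The following is a minimal sketch, not part of vLLM Ascend: it assumes the adapter is installed under the distribution name `torch-npu` (an assumption; adjust it if your environment differs), and the helper names `installed_version` and `matches_expected` are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Version string taken from the release notes above.
EXPECTED_TORCH_NPU = "2.5.1.post1.dev20250528"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(installed: Optional[str],
                     expected: str = EXPECTED_TORCH_NPU) -> bool:
    """Exact string comparison; a missing package never matches."""
    return installed is not None and installed.strip() == expected


if __name__ == "__main__":
    # "torch-npu" is an assumed distribution name, not confirmed by the notes.
    found = installed_version("torch-npu")
    if matches_expected(found):
        print(f"torch_npu {found} matches the version in the release notes")
    else:
        print(f"torch_npu is {found!r}, expected {EXPECTED_TORCH_NPU}")
```

The comparison is deliberately exact, so a build with a local version suffix would be reported as a mismatch and should be checked by hand.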

Other Improvements

  • Initial support for Chunked Prefill with MLA. #1172
  • An example of best practices to run DeepSeek with ETP has been added. #1101
  • Performance improvements for DeepSeek using the TorchAir graph. #1098, #1131
  • Speculative decoding is now supported with AscendScheduler. #943
  • Improve VocabParallelEmbedding custom op performance. It will be enabled in the next release. #796
  • Fixed a device discovery and setup bug when running vLLM Ascend on Ray. #884
  • DeepSeek with MC2 (Merged Compute and Communication) now works properly. #1268
  • Fixed log2phy NoneType bug with static EPLB feature. #1186
  • Improved performance for DeepSeek with DBO enabled. #997, #1135
  • Refactored AscendFusedMoE. #1229
  • Added an initial user stories page (including LLaMA-Factory/TRL/verl/MindIE Turbo/GPUStack). #1224
  • Added a unit test framework. #1201

Known Issues

  • In some cases, the vLLM process may crash with a GatherV3 error when aclgraph is enabled. We are working on this issue and will fix it in the next release. #1038
  • The prefix cache feature does not work when the Ascend Scheduler is used without chunked prefill enabled. This will be fixed in the next release. #1350

Full Changelog

v0.9.0rc2...v0.9.1rc1