
v0.9.2rc1

Pre-release


@wangxiyuan wangxiyuan released this 11 Jul 09:51
· 916 commits to main since this release
b5b7e0e

This is the 1st release candidate of v0.9.2 for vLLM Ascend. Please follow the official doc to get started. From this release, the V1 engine is enabled by default, so there is no need to set VLLM_USE_V1=1 any more. This is also the last release to support the V0 engine; V0 code will be cleaned up in the future.
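Since V1 is now the default, a server can be launched without any engine flag. A minimal sketch, assuming the standard `vllm serve` CLI; the model name is illustrative:

```shell
# V1 is the default engine in this release; exporting VLLM_USE_V1=1 is no longer needed.
vllm serve Qwen/Qwen3-8B
```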

Highlights

  • Pooling models work with the V1 engine now. You can try it with the Qwen3 embedding model. #1359
  • The performance on Atlas 300I series has been improved. #1591
  • aclgraph mode works with MoE models now. Currently, only Qwen3 MoE is well tested. #1381

Core

  • Ascend PyTorch adapter (torch_npu) has been upgraded to 2.5.1.post1.dev20250619. Don’t forget to update it in your environment. #1347
  • The GatherV3 error has been fixed with aclgraph mode. #1416
  • W8A8 quantization works on Atlas 300I series now. #1560
  • Fixed an accuracy problem when deploying models with parallel parameters. #1678
  • The pre-built wheel package now requires a lower glibc version, so users can install it directly via pip install vllm-ascend. #1582

Other

  • The official docs have been updated for a better reading experience. For example, more deployment tutorials have been added, and the user/developer docs have been refreshed. More guides are coming soon.
  • Fixed an accuracy problem for DeepSeek V3/R1 models with torchair graph mode in long-sequence predictions. #1331
  • A new env variable VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP has been added. It enables the fused allgather-experts kernel for DeepSeek V3/R1 models. The default value is 0. #1335
  • A new env variable VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION has been added to improve the performance of top-k/top-p sampling. The default value is 0; we'll consider enabling it by default in the future. #1732
  • A batch of bugs has been fixed for the Data Parallelism case. #1273 #1322 #1275 #1478
  • DeepSeek performance has been improved. #1194 #1395 #1380
  • Ascend scheduler works with prefix cache now. #1446
  • DeepSeek models work with prefix cache now. #1498
  • Prompt logprobs are now supported in V1, recovering CEval accuracy. #1483
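As a quick reference, the two new opt-in switches above can be enabled before launching the server. A minimal sketch; both variables default to 0 (disabled), and the model name is illustrative:

```shell
# Opt-in features added in this release (both disabled by default):
export VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP=1    # fused allgather-experts kernel for DeepSeek V3/R1
export VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION=1 # optimized top-k/top-p sampling
vllm serve deepseek-ai/DeepSeek-V3
```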

Known Issues

  • Pipeline parallelism does not work with Ray and graph mode: #1751 #1754

New Contributors

Full Changelog: v0.9.1rc1...v0.9.2rc1