v0.9.2rc1
Pre-release
This is the first release candidate of v0.9.2 for vLLM Ascend. Please follow the official doc to get started. Starting with this release, the V1 engine is enabled by default, so there is no need to set VLLM_USE_V1=1 anymore. This release is also the last version to support the V0 engine; the V0 code will be cleaned up in a future release.
Highlights
- Pooling models now work with the V1 engine. You can try it with the Qwen3 embedding model. #1359
- Performance on the Atlas 300I series has been improved. #1591
- aclgraph mode now works with MoE models. Currently, only Qwen3 MoE is well tested. #1381
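As a minimal sketch of the pooling-model highlight, the snippet below runs offline embedding through vLLM's Python API (LLM with task="embed", then LLM.embed), as available in the vLLM version this release targets. It assumes vLLM and vLLM Ascend are installed on an Ascend host and that the Hugging Face checkpoint Qwen/Qwen3-Embedding-0.6B is the Qwen3 embedding model you want; the model choice and the helper names embed_texts and cosine_similarity are illustrative, not taken from the release.

```python
import math
from typing import List, Sequence

# Hypothetical default: other Qwen3 embedding checkpoints should be used the same way.
DEFAULT_MODEL = "Qwen/Qwen3-Embedding-0.6B"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def embed_texts(texts: List[str], model: str = DEFAULT_MODEL) -> List[List[float]]:
    """Embed texts with a pooling model on the (now default) V1 engine.

    vLLM is imported lazily so cosine_similarity stays usable without it.
    """
    from vllm import LLM  # requires vllm + vllm-ascend on an Ascend host

    llm = LLM(model=model, task="embed")
    outputs = llm.embed(texts)
    # Each result carries its embedding vector under .outputs.embedding.
    return [out.outputs.embedding for out in outputs]
```

On an Ascend machine, embed_texts(["What is the capital of France?", "Paris is the capital of France."]) returns one vector per input, and cosine_similarity on the two vectors scores how close they are. Because V1 is now the default engine, no VLLM_USE_V1 setting is needed.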
Core
- The Ascend PyTorch adapter (torch_npu) has been upgraded to 2.5.1.post1.dev20250619. Don’t forget to update it in your environment. #1347
- The GatherV3 error in aclgraph mode has been fixed. #1416
- W8A8 quantization works on Atlas 300I series now. #1560
- Fixed an accuracy problem when deploying models with parallel parameters. #1678
- The pre-built wheel package now requires a lower glibc version, so users can install it directly with pip install vllm-ascend. #1582
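To double-check the torch_npu upgrade, a small sketch like the following compares the installed distribution version against the one named above, using the standard library's importlib.metadata. It assumes the adapter is installed under the distribution name torch-npu; the helper names installed_version, torch_npu_is_current, and report are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Version named in these release notes for this release candidate.
REQUIRED_TORCH_NPU = "2.5.1.post1.dev20250619"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def torch_npu_is_current(installed: Optional[str],
                         required: str = REQUIRED_TORCH_NPU) -> bool:
    """Exact match: the notes name a single dev build, not a version range."""
    return installed == required


def report(dist_name: str = "torch-npu") -> str:
    """Human-readable status line for the given distribution (assumed name)."""
    found = installed_version(dist_name)
    if found is None:
        return f"{dist_name} is not installed"
    if torch_npu_is_current(found):
        return f"{dist_name} {found} matches {REQUIRED_TORCH_NPU}"
    return f"{dist_name} {found} found; expected {REQUIRED_TORCH_NPU}"
```

Calling print(report()) before starting vLLM Ascend shows whether the environment still needs updating. An exact string comparison is used because the notes name one specific dev build rather than a minimum version.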
Other
- The official doc has been updated for a better reading experience. For example, more deployment tutorials have been added and the user/developer docs have been updated. More guides are coming soon.
- Fixed an accuracy problem for DeepSeek V3/R1 models with torchair graph in long-sequence predictions. #1331
- A new env variable VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP has been added. It enables the fused allgather-experts kernel for DeepSeek V3/R1 models. The default value is 0. #1335
- A new env variable VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION has been added to improve the performance of top-k/top-p sampling. The default value is 0; we'll consider enabling it by default in the future. #1732
- A batch of bugs has been fixed for the data parallelism case. #1273 #1322 #1275 #1478
- DeepSeek performance has been improved. #1194 #1395 #1380
- The Ascend scheduler now works with prefix caching. #1446
- DeepSeek now works with prefix caching. #1498
- Added support for prompt logprobs to recover ceval accuracy in V1. #1483
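Because both new features are controlled through environment variables, one way to opt in is to set them in the environment of the serving process. The sketch below builds such an environment. It assumes that a value of 1 turns each feature on (the notes only state that the default is 0); the helper name ascend_env and the model placeholder in the usage note are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

FUSED_ALLGATHER_EP = "VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP"
TOPK_TOPP_OPT = "VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION"


def ascend_env(fused_allgather_ep: bool = False,
               topk_topp_opt: bool = False,
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with both flags set.

    Assumption: "1" enables a feature and "0" (the documented default)
    leaves it off.
    """
    env = dict(os.environ if base is None else base)
    # Per the release notes, this kernel targets DeepSeek V3/R1 models.
    env[FUSED_ALLGATHER_EP] = "1" if fused_allgather_ep else "0"
    # Top-k/top-p sampling optimization; off by default in this release.
    env[TOPK_TOPP_OPT] = "1" if topk_topp_opt else "0"
    # V1 is the default engine in this release, so VLLM_USE_V1 is not set here.
    return env
```

The result can be passed to a launcher, for example subprocess.run(["vllm", "serve", "<your-deepseek-model>"], env=ascend_env(fused_allgather_ep=True)). Returning a copy rather than mutating os.environ keeps the flags scoped to the child process.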
Known Issues
New Contributors
- @xleoken made their first contribution in #1357
- @lyj-jjj made their first contribution in #1335
- @sharonyunyun made their first contribution in #1194
- @Pr0Wh1teGivee made their first contribution in #1308
- @leo-pony made their first contribution in #1374
- @zeshengzong made their first contribution in #1452
- @GDzhu01 made their first contribution in #1477
- @Agonixiaoxiao made their first contribution in #1531
- @zhanghw0354 made their first contribution in #1476
- @farawayboat made their first contribution in #1591
- @ZhengWG made their first contribution in #1196
- @wm901115nwpu made their first contribution in #1654
Full Changelog: v0.9.1rc1...v0.9.2rc1