
v0.9.2rc1

Pre-release


@wangxiyuan wangxiyuan released this 11 Jul 09:51
· 916 commits to main since this release
b5b7e0e

This is the 1st release candidate of v0.9.2 for vLLM Ascend. Please follow the official doc to get started. From this release, the V1 engine is enabled by default, so there is no need to set VLLM_USE_V1=1 any more. This is also the last release to support the V0 engine; V0 code will be cleaned up in the future.
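Since V1 is now the default, a server can be launched without any engine flag. A minimal sketch, assuming the standard `vllm serve` CLI; the model name is illustrative:

```shell
# V1 is the default engine in this release; exporting VLLM_USE_V1=1 is no longer needed.
vllm serve Qwen/Qwen3-8B
```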

Highlights

  • Pooling models work with the V1 engine now. You can try it with the Qwen3 embedding model. #1359
  • The performance on Atlas 300I series has been improved. #1591
  • aclgraph mode works with MoE models now. Currently, only Qwen3 MoE is well tested. #1381

Core

  • Ascend PyTorch adapter (torch_npu) has been upgraded to 2.5.1.post1.dev20250619. Don’t forget to update it in your environment. #1347
  • The GatherV3 error has been fixed with aclgraph mode. #1416
  • W8A8 quantization works on Atlas 300I series now. #1560
  • Fixed an accuracy problem when deploying models with parallel parameters. #1678
  • The pre-built wheel package now requires a lower glibc version, so users can install it directly via pip install vllm-ascend. #1582

Other

  • The official docs have been updated for a better reading experience. For example, more deployment tutorials have been added, and the user/developer docs have been refreshed. More guides are coming soon.
  • Fixed an accuracy problem for DeepSeek V3/R1 models with torchair graph mode in long-sequence predictions. #1331
  • A new env variable VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP has been added. It enables the fused allgather-experts kernel for DeepSeek V3/R1 models. The default value is 0. #1335
  • A new env variable VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION has been added to improve the performance of top-k/top-p sampling. The default value is 0; we'll consider enabling it by default in the future. #1732
  • A batch of bugs has been fixed for the Data Parallelism case. #1273 #1322 #1275 #1478
  • DeepSeek performance has been improved. #1194 #1395 #1380
  • Ascend scheduler works with prefix cache now. #1446
  • DeepSeek models work with prefix cache now. #1498
  • Prompt logprobs are now supported in V1, recovering CEval accuracy. #1483
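As a quick reference, the two new opt-in switches above can be enabled before launching the server. A minimal sketch; both variables default to 0 (disabled), and the model name is illustrative:

```shell
# Opt-in features added in this release (both disabled by default):
export VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP=1    # fused allgather-experts kernel for DeepSeek V3/R1
export VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION=1 # optimized top-k/top-p sampling
vllm serve deepseek-ai/DeepSeek-V3
```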

Known Issues

  • Pipeline parallelism does not work with Ray and graph mode: #1751 #1754

New Contributors

Full Changelog: v0.9.1rc1...v0.9.2rc1