v0.9.1rc3

Pre-release

Yikun released this 22 Aug 10:48

· 33 commits to v0.9.1-dev since this release

763ed69

This is the third release candidate of v0.9.1 for vLLM Ascend. Please follow the official documentation to get started.

Core

MTP supports V1 scheduler #2371
Add LMhead TP communication groups #1956
Fix the bug where Qwen3 MoE doesn't work with aclgraph #2478
Fix grammar_bitmask IndexError caused by outdated apply_grammar_bitmask method #2314
Remove chunked_prefill_for_mla #2177
Fix bugs and refactor cached mask generation logic #2326
Fix configuration check logic about ascend scheduler #2327
Cancel the verification between deepseek-mtp and non-ascend scheduler in disaggregated-prefill deployment #2368
Fix failure when using the ray distributed backend #2306
Fix incorrect req block length in ascend scheduler #2394
Fix header include issue in rope #2398
Fix mtp config bug #2412
Fix error info and adapt to the attn_metadata refactor #2402
Fix torchair runtime error caused by configuration mismatches and a missing .kv_cache_bytes file #2312
Move with_prefill allreduce from cpu to npu #2230

Docs

Add document for deepseek large EP #2339

Known Issues

Full graph mode support is not yet available for some cases with full_cuda_graph enabled. #2182

Full Changelog: v0.9.1rc2...v0.9.1rc3
