v0.9.1rc3
Pre-release
This is the 3rd release candidate of v0.9.1 for vLLM Ascend. Please follow the official doc to get started.
Core
- MTP supports V1 scheduler #2371
- Add LMhead TP communication groups #1956
- Fix the bug that qwen3 moe doesn't work with aclgraph #2478
- Fix `grammar_bitmask` IndexError caused by outdated `apply_grammar_bitmask` method #2314
- Remove `chunked_prefill_for_mla` #2177
- Fix bugs and refactor cached mask generation logic #2326
- Fix configuration check logic for the ascend scheduler #2327
- Cancel the verification between deepseek-mtp and non-ascend scheduler in disaggregated-prefill deployment #2368
- Fix failure with the ray distributed backend #2306
- Fix incorrect req block length in ascend scheduler #2394
- Fix header include issue in rope #2398
- Fix mtp config bug #2412
- Fix error info and adapt `attn_metadata` refactor #2402
- Fix torchair runtime error caused by configuration mismatches and missing `.kv_cache_bytes` file #2312
- Move `with_prefill` allreduce from cpu to npu #2230
Docs
- Add document for deepseek large EP #2339
Known Issues
- Full graph mode support is not yet available for some cases with `full_cuda_graph` enabled. #2182
Full Changelog: v0.9.1rc2...v0.9.1rc3