v0.9.1rc3

Pre-release

@Yikun Yikun released this 22 Aug 10:48
· 33 commits to v0.9.1-dev since this release
763ed69

This is the 3rd release candidate of v0.9.1 for vLLM Ascend. Please follow the official doc to get started.

Core

  • MTP supports V1 scheduler #2371
  • Add LMhead TP communication groups #1956
  • Fix a bug where Qwen3 MoE doesn't work with aclgraph #2478
  • Fix grammar_bitmask IndexError caused by outdated apply_grammar_bitmask method #2314
  • Remove chunked_prefill_for_mla #2177
  • Fix bugs and refactor cached mask generation logic #2326
  • Fix configuration check logic about ascend scheduler #2327
  • Cancel the verification between deepseek-mtp and non-ascend scheduler in disaggregated-prefill deployment #2368
  • Fix a failure with the Ray distributed backend #2306
  • Fix incorrect req block length in ascend scheduler #2394
  • Fix header include issue in rope #2398
  • Fix mtp config bug #2412
  • Fix error info and adapt attn_metadata refactor #2402
  • Fix torchair runtime error caused by configuration mismatches and a missing .kv_cache_bytes file #2312
  • Move with_prefill allreduce from cpu to npu #2230

Docs

  • Add document for deepseek large EP #2339

Known Issues

  • Full graph mode support is not yet available for some cases with full_cuda_graph enabled. #2182

Full Changelog: v0.9.1rc2...v0.9.1rc3