
[V0.9.1] Add support for flashcomm_v1 in Qwen2.5 #1745


Merged

2 commits merged into vllm-project:v0.9.1-dev from pr-flashcommv1 on Jul 17, 2025

Conversation

@rjg-lyh (Contributor) commented on Jul 11, 2025

What this PR does / why we need it?

Add support for flashcomm_v1 in Qwen2.5.
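As background (a summary of the technique's general design, not code from this PR): FlashComm v1 targets tensor-parallel inference, where the usual pattern all-reduces the hidden states after a row-parallel linear layer and then every rank redundantly repeats the residual add and normalization on the full tensor. FlashComm v1 instead reduce-scatters, runs those elementwise ops on a per-rank shard, and all-gathers where the full sequence is needed again. Below is a minimal sketch of that idea using plain torch.distributed collectives; shapes and op placement are simplified assumptions, and it presumes an initialized process group and a token count divisible by tp_size:

```python
import torch
import torch.distributed as dist


def baseline_allreduce(x: torch.Tensor) -> torch.Tensor:
    # Baseline TP: every rank ends up holding, and re-processing, the
    # full [num_tokens, hidden] tensor after the collective.
    dist.all_reduce(x)
    return x


def flashcomm_v1_style(x: torch.Tensor, tp_size: int) -> torch.Tensor:
    # Reduce-scatter leaves each rank a [num_tokens / tp_size, hidden]
    # shard, so the following residual add / RMSNorm run once per token
    # instead of being repeated on all tp_size ranks.
    shard = torch.empty(x.shape[0] // tp_size, x.shape[1],
                        dtype=x.dtype, device=x.device)
    dist.reduce_scatter_tensor(shard, x)
    # ... sharded elementwise work happens here ...
    # All-gather restores the full sequence before ops that need it.
    full = torch.empty_like(x)
    dist.all_gather_into_tensor(full, shard)
    return full
```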

Does this PR introduce any user-facing change?

No.

How was this patch tested?

①Functional Testing: CI passed with the existing tests, and a new test was added in tests/multicard/test_offline_inference_distributed.py.
②Accuracy Testing: evaluated the difference in model outputs between enabling and disabling the FlashComm v1 feature, using offline inference. As shown in the figures below:

  • disabling: [screenshot of offline-inference outputs]
  • enabling: [screenshot of offline-inference outputs]

③Performance Stress Testing: here's the TTFT comparison based on QwQ-32B-BF16, with input_len=16K~32K, output_len=8K, and max_concurrency=16:

| TTFT (ms) | disabling | enabling |
|-----------|----------:|---------:|
| Mean      |   1419.58 |  1322.36 |
| Median    |   1073.32 |  1006.09 |
| P99       |   9549.34 |  8268.28 |
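Relative to the disabled baseline, that works out to roughly a 6.8% reduction in mean TTFT, a 6.3% reduction in median TTFT, and a 13.4% reduction in P99 TTFT.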

@rjg-lyh force-pushed the pr-flashcommv1 branch 3 times, most recently from 7465c21 to 30858dc on July 14, 2025 03:19
@rjg-lyh force-pushed the pr-flashcommv1 branch 2 times, most recently from 5a85a7f to 8237ec0 on July 14, 2025 03:43

This pull request has conflicts; please resolve those before we can evaluate the pull request.

@rjg-lyh force-pushed the pr-flashcommv1 branch 4 times, most recently from e026e51 to e19ed68 on July 15, 2025 06:45
```diff
@@ -27,5 +27,10 @@ def register_model():
     # is upgraded to 2.7.0
     import vllm_ascend.patch.worker.patch_common.patch_utils  # noqa: F401
 
+    from .utils import vllm_version_is
+    # Import specific patches for different versions
+    if vllm_version_is("0.9.1"):
```
Collaborator

I think it is not necessary to check the vLLM version, because the v0.9.1-dev branch is only compatible with vLLM 0.9.1.

@rjg-lyh (Contributor Author)

You are right; this change would be better.

```diff
@@ -106,6 +106,8 @@
     "VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE":
     lambda: bool(int(os.getenv("VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE", '0'))
     ),
+    "VLLM_ENABLE_FlashComm":
```
@MengqingCao (Collaborator) commented on Jul 15, 2025
Suggested change:

```diff
-    "VLLM_ENABLE_FlashComm":
+    "VLLM_ASCEND_ENABLE_FLASHCOMM":
```

@rjg-lyh (Contributor Author)

Thanks, I have changed it.
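For illustration, here's a hypothetical sketch of toggling the renamed switch in an offline run; the model name, parallel size, and prompt are placeholders. Per the lambda-based envs.py pattern in the diff above, the variable is read at access time, so setting it before engine construction suffices:

```python
import os

# Illustrative only: enable FlashComm v1 for this process.
os.environ["VLLM_ASCEND_ENABLE_FLASHCOMM"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=2)
outputs = llm.generate(["Briefly explain tensor parallelism."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```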

@rjg-lyh force-pushed the pr-flashcommv1 branch 5 times, most recently from 4e9d3e6 to 6b7da38 on July 15, 2025 10:48
@rjg-lyh force-pushed the pr-flashcommv1 branch 4 times, most recently from 20c459a to 31ba211 on July 16, 2025 02:32
@ApsarasX (Collaborator)

Do you have any performance data to share?

@rjg-lyh force-pushed the pr-flashcommv1 branch 3 times, most recently from 4e7f241 to a2b6dbb on July 16, 2025 03:58
@rjg-lyh force-pushed the pr-flashcommv1 branch 3 times, most recently from d192b46 to c617547 on July 16, 2025 06:12
Signed-off-by: rjg-lyh <1318825571@qq.com>
@rjg-lyh force-pushed the pr-flashcommv1 branch 4 times, most recently from 32bec8b to a294834 on July 16, 2025 08:14
@rjg-lyh (Contributor Author) commented on Jul 16, 2025

> Do you have any performance data to share?
Based on QwQ-32B-BF16, here's the TTFT comparison with input_len=16K~32K, output_len=8K, and max_concurrency=16:

| TTFT (ms) | origin  | with flashcomm1 |
|-----------|--------:|----------------:|
| Mean      | 1419.58 |         1322.36 |
| Median    | 1073.32 |         1006.09 |
| P99       | 9549.34 |         8268.28 |

@wangxiyuan merged commit b3d6e0c into vllm-project:v0.9.1-dev on Jul 17, 2025
16 checks passed
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request on Jul 18, 2025