[Feature] Optimize forward metadata collection across dp ranks #1593

jianzs · 2025-07-02T11:55:40Z

What this PR does / why we need it?

This PR introduces two optimizations for cases where data parallel size > 1:

- Eliminates DP communication in `set_forward_context`
- Implements HCCL for DP metadata communication, resulting in significant performance improvements for large DP configurations
  - Achieves ~20ms latency reduction with DP size of 64

Does this PR introduce any user-facing change?

no

How was this patch tested?

CI passed.

vLLM version: v0.9.2
vLLM main: vllm-project/vllm@91b3d19

jianzs · 2025-07-02T11:58:04Z

@NeverRaR PTAL

vllm_ascend/worker/model_runner_v1.py

NeverRaR · 2025-07-02T12:18:12Z

lgtm

Copilot

Pull Request Overview

This PR optimizes how forward-pass metadata is collected and communicated across data-parallel ranks by removing the previous all-reduce and introducing an HCCL-based all-gather approach.

- Enforce that dummy batch execution only runs under data parallelism and refactor `execute_dummy_batch` to use per-rank metadata.
- Replace `dist.all_reduce` with HCCL `all_gather` in `_get_forward_metadata_across_dp` and update callers to handle the Tensor of per-rank token counts.
- Propagate `num_tokens_across_dp` through dummy runs and forward contexts, masking sentinel values before the pass.
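The swap from `all_reduce` to `all_gather` can be illustrated with a pure-Python simulation of the two collectives (no HCCL or NPU required). It assumes the earlier path followed the common pattern of writing the local count into a zero vector at the rank's own index and all-reducing with SUM; all helper names are hypothetical. Both patterns leave every rank with the same per-rank vector, but with all-gather each rank's input is a single value rather than a `dp_size`-long vector.

```python
from typing import List


def all_reduce_sum(inputs: List[List[int]]) -> List[List[int]]:
    """Simulated all-reduce (SUM): every rank receives the elementwise sum."""
    summed = [sum(column) for column in zip(*inputs)]
    return [list(summed) for _ in inputs]


def all_gather(inputs: List[int]) -> List[List[int]]:
    """Simulated all-gather: every rank receives every rank's value, in rank order."""
    return [list(inputs) for _ in inputs]


def counts_via_all_reduce(local_counts: List[int]) -> List[List[int]]:
    dp_size = len(local_counts)
    inputs = []
    for rank, n in enumerate(local_counts):
        vec = [0] * dp_size  # each rank sends a dp_size-long vector...
        vec[rank] = n        # ...that is zero everywhere except its own slot
        inputs.append(vec)
    return all_reduce_sum(inputs)


def counts_via_all_gather(local_counts: List[int]) -> List[List[int]]:
    return all_gather(local_counts)  # each rank sends only its own count


local = [5, 0, 12, 3]  # hypothetical per-rank token counts with DP size 4
assert counts_via_all_reduce(local) == counts_via_all_gather(local)
print(counts_via_all_gather(local)[0])  # what rank 0 sees: [5, 0, 12, 3]
```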

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vllm_ascend/worker/worker_v1.py | Added assertion for `dp_size > 1`, refactored dummy-run logic to use HCCL per-rank metadata. |
| vllm_ascend/worker/model_runner_v1.py | Swapped `all_reduce` for `all_gather` under `get_dp_group()`, changed method signature and updated callers to handle a Tensor of metadata. |

Comments suppressed due to low confidence (1)

vllm_ascend/worker/model_runner_v1.py:622

Add unit or integration tests covering the dp_size > 1 aggregation path to verify that all_gather produces the correct combined metadata and that the masked_fill_ logic correctly replaces sentinel values.

```python
            local_forward_metadata)
```
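A test along the lines Copilot suggests could take roughly this shape. It is written in pure Python so it runs without an NPU; `SENTINEL = -1` and the fill policy (replace sentinels with the largest real count) are assumptions for illustration, not taken from the PR. `masked_fill` here mimics `torch.Tensor.masked_fill` on a plain list.

```python
from typing import List

SENTINEL = -1  # hypothetical marker; the PR's actual sentinel is not shown here


def masked_fill(values: List[int], sentinel: int, fill_value: int) -> List[int]:
    """Pure-Python analogue of `t.masked_fill(t == sentinel, fill_value)`."""
    return [fill_value if v == sentinel else v for v in values]


def test_gathered_counts_are_masked() -> None:
    gathered = [SENTINEL, 8, SENTINEL, 2]  # ranks 0 and 2 reported the sentinel
    # Hypothetical fill policy: replace sentinels with the largest real count.
    fill = max(v for v in gathered if v != SENTINEL)
    assert masked_fill(gathered, SENTINEL, fill) == [8, 8, 8, 2]
    # Entries that are not the sentinel must pass through untouched.
    assert masked_fill([1, 2, 3], SENTINEL, 99) == [1, 2, 3]


test_gathered_counts_are_masked()
```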

vllm_ascend/worker/worker_v1.py

codecov · 2025-07-03T12:13:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.93%. Comparing base (c30ddb8) to head (d35cbf3).
Report is 141 commits behind head on main.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main    #1593       +/-   ##
===========================================
+ Coverage   27.39%   54.93%   +27.53%     
===========================================
  Files          56       80       +24     
  Lines        6191     9712     +3521     
===========================================
+ Hits         1696     5335     +3639     
+ Misses       4495     4377      -118
```
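The headline percentages follow from the line counts in the coverage diff: coverage is hits divided by coverable lines, and hits plus misses equals lines. A quick check:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent, rounded to two decimals as in the report."""
    return round(100 * hits / lines, 2)


# (lines, hits, misses) copied from the coverage diff.
reports = {"main": (6191, 1696, 4495), "#1593": (9712, 5335, 4377)}

for name, (lines, hits, misses) in reports.items():
    assert hits + misses == lines  # every coverable line is either hit or missed
    print(f"{name}: {coverage_pct(hits, lines)}%")
# main: 27.39%
# #1593: 54.93%
```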

| Flag | Coverage Δ |
| --- | --- |
| unittests | `54.93% <ø> (+27.53%)` ⬆️ |

Flags with carried forward coverage won't be shown.


jianzs · 2025-07-04T03:13:45Z

@Yikun @wangxiyuan @ApsarasX @ganyi1996ppo ready to merge.

vllm_ascend/worker/model_runner_v1.py

Angazenn · 2025-07-04T10:23:44Z

vllm_ascend/worker/worker_v1.py

```diff
                 max_num_tokens)
-        runner._dummy_run(max_num_tokens,
+        else:
+            num_tokens = 1
```


Why is it 1?

If graph mode is off, the dummy run only needs to be executed; its computational size does not matter, so a single token is enough.
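The decision discussed in this thread can be sketched as a small helper. `dummy_run_num_tokens` is hypothetical, the `RuntimeError` type is an assumption (the commit history only says the assertion became an exception), and the graph-mode branch assumes, as the diff context suggests, that graph mode keeps the size derived from `max_num_tokens`.

```python
def dummy_run_num_tokens(dp_size: int, graph_mode: bool, max_num_tokens: int) -> int:
    """Hypothetical helper: token count for an idle DP rank's dummy batch."""
    if dp_size <= 1:
        # Dummy batches only make sense under data parallelism; the exception
        # type is an assumption (the PR moved from an assertion to an exception).
        raise RuntimeError("dummy batch execution requires data parallel size > 1")
    if graph_mode:
        # Assumed: graph mode keeps the padded size derived from max_num_tokens.
        return max_num_tokens
    # Graph mode off: the run just has to happen, so one token suffices.
    return 1


print(dummy_run_num_tokens(dp_size=4, graph_mode=False, max_num_tokens=512))  # 1
print(dummy_run_num_tokens(dp_size=4, graph_mode=True, max_num_tokens=512))   # 512
```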

jianzs · 2025-07-08T06:53:21Z

@Angazenn PTAL

jianzs · 2025-07-08T14:01:25Z

@wangxiyuan @ganyi1996ppo @Yikun ready to merge.

wangxiyuan · 2025-07-09T01:51:53Z

torchair has made the code more and more complex and hard to maintain. I have a PR (#1661) that adds a torchair module, where all torchair-related code can be updated and changed; I'll make it available for review soon. Until then, I really don't want to merge any torchair changes, because they are very hard to review (I'm not sure whether a change breaks anything else). Sorry.

github-actions · 2025-07-09T06:34:35Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.


Yikun · 2025-07-17T11:18:31Z

Submitting this branch was a mistake; please feel free to open a new PR.

NeverRaR suggested changes Jul 2, 2025

vllm_ascend/worker/model_runner_v1.py (resolved)

NeverRaR approved these changes Jul 2, 2025

jianzs requested review from Yikun, wangxiyuan, ApsarasX and ganyi1996ppo July 2, 2025 12:23

jianzs added and then removed the `ready` label Jul 2, 2025

jianzs requested a review from Copilot July 2, 2025 14:51

This comment was marked as outdated.

jianzs requested a review from Copilot July 2, 2025 14:57

Copilot AI reviewed Jul 2, 2025

vllm_ascend/worker/worker_v1.py (outdated, resolved)

vllm_ascend/worker/worker_v1.py (resolved)

jianzs requested a review from NeverRaR July 3, 2025 02:37

jianzs force-pushed the feat/dp-comm-opt branch from 5d90031 to f1ddce2 July 3, 2025 11:54

jianzs added the `performance-test` and `ready-for-test` labels Jul 3, 2025

Angazenn reviewed Jul 4, 2025

vllm_ascend/worker/model_runner_v1.py (outdated, resolved)

jianzs added the `ready` label Jul 4, 2025

Angazenn reviewed Jul 4, 2025

jianzs force-pushed the feat/dp-comm-opt branch 4 times, most recently from dbaad95 to 7726146 July 8, 2025 06:51

ApsarasX approved these changes Jul 9, 2025

github-actions bot added the `merge-conflicts` label and removed the `ready` label Jul 9, 2025

jianzs added 10 commits July 15, 2025 13:10

feat: optimize forward metadata collection across dp ranks

b30878a

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

refactor: remove unused imports from model_runner_v1.py

e160939

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

fix: correct handling the num_tokens for dummy run

dc9a0de

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

chore: lint

91b49e3

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

fix: improve handling of max_num_tokens

ad1e341

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

fix: update dummy run batch size handling

83b9dca

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

fix: add assertion for num_tokens_across_dp in NPUModelRunner

41f905e

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

fix: change assertion to exception for dummy batch execution in NPUWorker

82c53b2

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

chore: lint

bc3b360

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

Update vllm_ascend/worker/model_runner_v1.py

d35cbf3

Co-authored-by: Angazenn <92204292+Angazenn@users.noreply.github.com> Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>

jianzs force-pushed the feat/dp-comm-opt branch from 7726146 to d35cbf3 July 15, 2025 05:12

github-actions bot removed the merge-conflicts label Jul 15, 2025

Yikun closed this Jul 17, 2025

Yikun deleted the feat/dp-comm-opt branch July 17, 2025 11:18

jianzs mentioned this pull request Jul 17, 2025

[Feature] Optimize forward metadata collection across dp ranks #1857 (Open)
