[Feat] communication optimization for mc2 ops on A2 #2752

realliujiaxu · 2025-09-04T09:13:31Z

What this PR does / why we need it?

On A2, setting the environment variables HCCL_INTRA_PCIE_ENABLE=1 and HCCL_INTRA_ROCE_ENABLE=0 can reduce cross-machine communication traffic and significantly improve communication performance. This PR lets the mc2 ops take advantage of that setting.

For more details, please refer to document

Does this PR introduce any user-facing change?

Nope

How was this patch tested?

rm -rf .torchair_cache

export VLLM_USE_V1=1
export VLLM_VERSION=0.10.1.1
export HCCL_INTRA_PCIE_ENABLE=1
export HCCL_INTRA_ROCE_ENABLE=0

python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
    --quantization ascend \
    --load-format=auto \
    --served-model-name auto \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    --port 8009 \
    -tp=16 \
    -dp=1 \
    --max-num-seqs 32 \
    --max-model-len 32768 \
    --max-num-batched-tokens 16384 \
    --block-size 128 \
    --enable-expert-parallel \
    --additional-config '{"torchair_graph_config":{"enabled":true,"use_cached_graph":true,"graph_batch_sizes":[32],"enable_multistream_mla": true}, "chunked_prefill_for_mla": true}' \
    --gpu-memory-utilization 0.94

vLLM version: v0.10.2
vLLM main: vllm-project/vllm@4f02b77

gemini-code-assist

Code Review

This pull request enables hierarchical communication for mc2 operations on A2 hardware by adding checks for the Ascend A2 SOC version and specific environment variables. While the changes correctly add the necessary logic, I've identified a critical bug in token_dispatcher.py where variables are unpacked incorrectly, which will likely lead to runtime errors. Additionally, there is significant code duplication across three files that should be refactored to improve maintainability. Addressing these points will improve the correctness and quality of the code.

gemini-code-assist · 2025-09-04T09:15:01Z

vllm_ascend/ops/moe_dispatcher/token_dispatcher.py

+        expand_x, dynamic_scale, assist_info_for_combine, expert_token_nums, \
+            ep_recv_counts, _, expand_scales = self.output[0:7]


The variables assist_info_for_combine and ep_recv_counts are unpacked as local variables, but they are later used as instance attributes (self.assist_info_for_combine and self.ep_recv_counts) in get_combine_mc_kwargs. This will cause them to have their initial None value from __init__, likely leading to a NoneType error or incorrect behavior. You should assign the unpacked values directly to the instance attributes.

expand_x, dynamic_scale, self.assist_info_for_combine, expert_token_nums, self.ep_recv_counts, _, expand_scales = self.output[0:7]
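The failure mode described above can be reproduced in isolation. The sketch below uses a hypothetical stand-in class (`ToyDispatcher`, its method names, and the fake output tuple are illustrative and not taken from vllm-ascend); it only shows why unpacking into local names leaves the instance attributes at `None`.

```python
class ToyDispatcher:
    """Hypothetical stand-in for the dispatcher; not the vllm-ascend class."""

    def __init__(self):
        self.output = None
        self.assist_info_for_combine = None
        self.ep_recv_counts = None

    def unpack_buggy(self, output):
        self.output = output
        # Bug: these names are plain locals, so the attributes
        # initialised in __init__ are never updated.
        (expand_x, dynamic_scale, assist_info_for_combine, expert_token_nums,
         ep_recv_counts, _, expand_scales) = self.output[0:7]
        return expand_x

    def unpack_fixed(self, output):
        self.output = output
        # Fix: unpack straight into the instance attributes.
        (expand_x, dynamic_scale, self.assist_info_for_combine,
         expert_token_nums, self.ep_recv_counts, _,
         expand_scales) = self.output[0:7]
        return expand_x

    def get_combine_kwargs(self):
        # Reads the attributes, as the review says the combine step does.
        return {
            "assist_info_for_combine": self.assist_info_for_combine,
            "ep_recv_counts": self.ep_recv_counts,
        }


# Placeholder values standing in for the seven tensors in self.output.
fake_output = ("x", "scale", "assist", "nums", "counts", None, "scales")

buggy = ToyDispatcher()
buggy.unpack_buggy(fake_output)

fixed = ToyDispatcher()
fixed.unpack_fixed(fake_output)
```

With the buggy variant, `get_combine_kwargs()` still returns `None` for both entries; with the fixed variant it returns the unpacked values.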

gemini-code-assist · 2025-09-04T09:15:02Z

vllm_ascend/ops/moe_dispatcher/token_dispatcher.py

+        self.a2_need_extra_args = (get_ascend_soc_version() == AscendSocVersion.A2
+                                  and os.getenv("HCCL_INTRA_ROCE_ENABLE", "") == "0"
+                                  and os.getenv("HCCL_INTRA_PCIE_ENABLE", "") == "1")


This logic to determine if hierarchical communication for A2 is enabled is duplicated in three files: vllm_ascend/ops/moe_dispatcher/token_dispatcher.py, vllm_ascend/torchair/ops/torchair_fused_moe.py, and vllm_ascend/torchair/quantization/torchair_w8a8_dynamic.py. This duplication harms maintainability and makes future changes error-prone.

Consider refactoring this into a helper function in a shared utility module, such as vllm_ascend/utils.py. For example:

# in vllm_ascend/utils.py
def is_a2_hierarchical_comm_enabled():
    return (get_ascend_soc_version() == AscendSocVersion.A2
            and os.getenv("HCCL_INTRA_ROCE_ENABLE", "") == "0"
            and os.getenv("HCCL_INTRA_PCIE_ENABLE", "") == "1")

You can then call this function from all three locations to remove the duplicated code.
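A self-contained version of such a helper might look like the sketch below. It is only an illustration: `AscendSocVersion` and `get_ascend_soc_version` are stubbed here so the snippet runs without vllm-ascend installed, and the optional `env` and `soc_version` parameters are assumptions added to make the check easy to unit test, not part of the PR.

```python
import os
from enum import Enum


class AscendSocVersion(Enum):
    # Stand-in enum for illustration; the real one lives in vllm_ascend
    # and may define different members.
    A2 = 0
    A3 = 1


def get_ascend_soc_version():
    # Stub for illustration only; the real function inspects the device.
    return AscendSocVersion.A2


def is_a2_hierarchical_comm_enabled(env=None, soc_version=None):
    """Return True when the SoC is A2 and both HCCL variables are set
    to the values named in the PR description."""
    env = os.environ if env is None else env
    soc = get_ascend_soc_version() if soc_version is None else soc_version
    return (soc == AscendSocVersion.A2
            and env.get("HCCL_INTRA_ROCE_ENABLE", "") == "0"
            and env.get("HCCL_INTRA_PCIE_ENABLE", "") == "1")
```

Passing the environment and SoC version as arguments keeps the three call sites identical while letting tests exercise each branch without touching `os.environ`.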

github-actions · 2025-09-04T09:49:14Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

codecov · 2025-09-05T09:59:17Z

Codecov Report

❌ Patch coverage is 94.73684% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.95%. Comparing base (1bbb20e) to head (8a1c041).

Files with missing lines	Patch %	Lines
vllm_ascend/ops/moe/token_dispatcher.py	85.71%	1 Missing ⚠️
vllm_ascend/torchair/ops/torchair_fused_moe.py	75.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2752      +/-   ##
==========================================
+ Coverage   74.76%   74.95%   +0.18%     
==========================================
  Files         150      150              
  Lines       20891    20925      +34     
==========================================
+ Hits        15620    15684      +64     
+ Misses       5271     5241      -30

Flag	Coverage Δ
unittests	`74.95% <94.73%> (+0.18%)`	⬆️
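The figures in the coverage diff can be cross-checked from the raw hit and line counts. The sketch below is just that arithmetic, assuming coverage is computed as hits divided by lines; the function name `coverage_pct` is illustrative, and the exact rounding Codecov applies to the displayed percentages is not specified here.

```python
def coverage_pct(hits, lines):
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Counts copied from the coverage diff above.
base_hits, base_lines = 15620, 20891
head_hits, head_lines = 15684, 20925

# Misses are the lines that were not hit.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

base_cov = coverage_pct(base_hits, base_lines)
head_cov = coverage_pct(head_hits, head_lines)
delta = head_cov - base_cov

print(f"base={base_cov:.3f}% head={head_cov:.3f}% delta={delta:+.3f}%")
print(f"misses: {base_misses} -> {head_misses}")
```

The line count grows by 34 while hits grow by 64, which is consistent with misses dropping by 30.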


github-actions · 2025-09-08T12:25:10Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

momo609 · 2025-09-12T08:37:50Z

vllm_ascend/torchair/quantization/torchair_w8a8_dynamic.py

+    # NOTE: When in A2, setting the environment variables HCCL_INTRA_PCIE_ENABLE=1 and
+    # HCCL_INTRA_ROCE_ENABLE=0 can reduce cross-machine communication traffic and significantly
+    # improve communication performance.
+    a2_need_extra_args = (get_ascend_soc_version() == AscendSocVersion.A2


It may be more appropriate to set these environment variables in the launch script.
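One way to follow this suggestion is to have the launcher build the child process environment itself rather than relying on the caller's shell. The sketch below is a hypothetical illustration: `HCCL_A2_ENV` and `build_launch_env` are made-up names, and only the two variable names and values come from the PR description.

```python
import os

# Values taken from the PR description; the dict name is illustrative.
HCCL_A2_ENV = {
    "HCCL_INTRA_PCIE_ENABLE": "1",
    "HCCL_INTRA_ROCE_ENABLE": "0",
}


def build_launch_env(base_env=None):
    """Return a copy of the environment with the two HCCL variables set.

    The caller's environment (or os.environ) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(HCCL_A2_ENV)
    return env
```

The resulting dict could then be passed as the `env` argument of `subprocess.run` when starting the server, so the setting travels with the script.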

github-actions · 2025-09-18T09:36:26Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

gemini-code-assist bot reviewed Sep 4, 2025

realliujiaxu changed the title ~~enable hierarchical communication for mc2 ops on A2~~ communication optimization for mc2 ops on A2 Sep 4, 2025

realliujiaxu changed the title ~~communication optimization for mc2 ops on A2~~ [Feat] communication optimization for mc2 ops on A2 Sep 4, 2025

github-actions bot added module:ops module:tests labels Sep 4, 2025

github-actions bot added the merge-conflicts label Sep 8, 2025

realliujiaxu force-pushed the add-expert-scales-a2 branch 2 times, most recently from 1b5b021 to 8a1c041 Compare September 9, 2025 01:27

github-actions bot removed the merge-conflicts label Sep 9, 2025

realliujiaxu force-pushed the add-expert-scales-a2 branch from d4ffa13 to 43b522d Compare September 10, 2025 06:10

momo609 reviewed Sep 12, 2025

realliujiaxu force-pushed the add-expert-scales-a2 branch from 43b522d to fd780e1 Compare September 18, 2025 09:36

github-actions bot added the merge-conflicts label Sep 18, 2025

realliujiaxu added 7 commits September 18, 2025 17:49

enable hierarchical communication for mc2 ops on A2

3605de0

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

fix lint

6f2628e

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

fix UT

4abd5f3

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

fix lint

9d75c49

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

add UT

bf301bd

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

fix typo

5716e3c

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

remove soc version code

5f92012

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

realliujiaxu force-pushed the add-expert-scales-a2 branch from fd780e1 to 5f92012 Compare September 18, 2025 09:53

github-actions bot removed the merge-conflicts label Sep 18, 2025
