
[Dist][EP] Remove ETP/EP maintained in vllm-ascend #1681


Merged 1 commit into vllm-project:main on Jul 21, 2025

Conversation

MengqingCao
Collaborator

@MengqingCao MengqingCao commented Jul 9, 2025

What this PR does / why we need it?

Remove the ETP/EP implementation maintained in the main branch. We drop it because there are no relevant scenarios that use ETP now; we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where experts need to be sliced.

This is part of the #1422 backport.

Fixes #1396 #1154

Does this PR introduce any user-facing change?

We will no longer maintain ETP/EP in vllm-ascend; the TP/EP implementation in vLLM is used instead.

How was this patch tested?

CI passed with newly added and existing tests.


codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 36.36364% with 7 lines in your changes missing coverage. Please review.

Project coverage is 53.41%. Comparing base (f9dfde0) to head (fa7a1f0).
Report is 13 commits behind head on main.

Files with missing lines | Patch % | Lines
vllm_ascend/ops/fused_moe.py | 12.50% | 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1681      +/-   ##
==========================================
- Coverage   54.18%   53.41%   -0.77%     
==========================================
  Files          74       72       -2     
  Lines        9235     9053     -182     
==========================================
- Hits         5004     4836     -168     
+ Misses       4231     4217      -14     
Flag | Coverage Δ
unittests | 53.41% <36.36%> (-0.77%) ⬇️


@MengqingCao MengqingCao force-pushed the etp branch 2 times, most recently from 0e03620 to 5a32cb8 on July 10, 2025 03:09

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@@ -129,6 +129,7 @@ def _pangu_torchair_test_fixture(
distributed_executor_backend="mp",
enforce_eager=False,
additional_config=additional_config,
enable_expert_parallel=True,
Collaborator Author

We must set enable_expert_parallel to True when running PanGu with the EP implementation in vLLM.
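
For reference, a minimal offline-inference sketch of what this means on the user side, assuming vLLM's standard LLM/EngineArgs keyword arguments; the model path and parallel degree below are placeholders, not taken from this PR.

from vllm import LLM

llm = LLM(
    model="/path/to/pangu-pro-moe",     # placeholder checkpoint path
    tensor_parallel_size=4,             # placeholder TP degree
    distributed_executor_backend="mp",
    enforce_eager=False,
    enable_expert_parallel=True,        # needed until disabling EP is supported for PanGu
)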

Collaborator Author

@MengqingCao MengqingCao Jul 14, 2025

We have to do this until disabling EP is supported for PanGu. Please help review, thanks! cc @Angazenn

Collaborator

So maybe the related doc should be updated as well. For example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html

Collaborator Author

So maybe the related doc should be updated as well. For example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html

Now the example is updated, thanks!


@MengqingCao
Collaborator Author

@jianzs @ttanzhiqiang could you help review this pr?

Comment on lines +146 to +147
tp_rank = get_tp_group().rank_in_group
tp_size = get_tp_group().world_size
Collaborator

ditto

Collaborator Author

I think it's fine to use the TP group here, since this is the weight loader for the linear layer.
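
To illustrate why the TP group is the natural choice here, a minimal sketch of a tensor-parallel weight loader, assuming vLLM's get_tp_group accessor; the helper name and the choice of shard dimension are illustrative and not the PR's actual linear-layer code.

import torch
from vllm.distributed import get_tp_group

def load_tp_sharded_weight(param: torch.Tensor, full_weight: torch.Tensor,
                           shard_dim: int = 0) -> None:
    # Each tensor-parallel rank keeps only its slice of the full checkpoint weight.
    tp_rank = get_tp_group().rank_in_group
    tp_size = get_tp_group().world_size
    shard_size = full_weight.shape[shard_dim] // tp_size
    shard = full_weight.narrow(shard_dim, tp_rank * shard_size, shard_size)
    param.data.copy_(shard)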

@ttanzhiqiang
Contributor

@MengqingCao

  1. If prefill and decode both use the AllGather or NaiveMulticast solution, this is ETP logic.
  2. If prefill and decode both use the All2All/MC2 solution, this is EP logic.
  3. If prefill uses the AllGatherEP solution (enabled via the VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP switch) and decode uses the MC2 solution, this is AllGatherEP logic (see the sketch after this list).
  4. In the prefill/decode (PD) separation scenario, the strategies used by P and D are separate.
    After this PR, will the get_fused_moe_state function still have ep=1?

@MengqingCao
Collaborator Author

@MengqingCao

  1. If prefill and decode both use the AllGather or NaiveMulticast solution, this is ETP logic.
  2. If prefill and decode both use the All2All/MC2 solution, this is EP logic.
  3. If prefill uses the AllGatherEP solution (enabled via the VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP switch) and decode uses the MC2 solution, this is AllGatherEP logic.

Thanks for this info, but I'm not familiar with the different fused MoE states; maybe I need to read more code to understand points 1, 2 and 3 above.

  4. In the prefill/decode (PD) separation scenario, the strategies used by P and D are separate.
    After this PR, will the get_fused_moe_state function still have ep=1?

I think we still have ep=1 when EP is disabled; you can refer to https://github.com/vllm-project/vllm/blob/235bfd5dfe0975e42b115cfb910e73eff5c670d8/vllm/model_executor/layers/fused_moe/config.py#L274-L281
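
In other words (a hedged paraphrase of the linked vLLM config logic, with illustrative names; see the pinned config.py for the real implementation): when expert parallelism is disabled, the EP size collapses to 1.

def resolve_ep_size(tp_size: int, dp_size: int, enable_expert_parallel: bool) -> int:
    # Mirrors the idea in vLLM's fused-MoE parallel config: EP is only active when
    # there is more than one rank and the user has enabled expert parallelism.
    use_ep = dp_size * tp_size > 1 and enable_expert_parallel
    if not use_ep:
        return 1                    # EP disabled: experts replicated, a single EP group
    return tp_size * dp_size        # EP enabled: experts sharded across TP x DP ranks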


This pull request has conflicts, please resolve those before we can evaluate the pull request.

@MengqingCao
Collaborator Author

@ttanzhiqiang maybe we should remove the best example on A2 in #1101, WDYT?

@ApsarasX
Collaborator

@ttanzhiqiang maybe we should remove the best example on A2 in #1101, WDYT?

I think you can remove these scripts, as we only need to maintain them internally on our side.

@MengqingCao
Collaborator Author

@ttanzhiqiang maybe we should remove the best example on A2 in #1101, WDYT?

I think you can remove these scripts, as we only need to maintain them internally on our side.

OK. I want to remove it mainly because it is a best-practice example that uses ETP, which is removed here. I'll remove it then.

@MengqingCao MengqingCao force-pushed the etp branch 2 times, most recently from ae9f97d to cf0c584 on July 17, 2025 06:29
Signed-off-by: MengqingCao <cmq0113@163.com>
@MengqingCao
Collaborator Author

Hi @jianzs @ttanzhiqiang @ApsarasX, your suggestions are addressed now. Could you take a look again? Thanks!

@wangxiyuan wangxiyuan merged commit 8cfd257 into vllm-project:main Jul 21, 2025
24 checks passed
@wangxiyuan
Collaborator

Let's merge this first. Before the next release, we should do deep testing of TP and EP.

@MengqingCao MengqingCao deleted the etp branch July 21, 2025 01:50
@MingXiangL

Let's merge this first. Before the next release, we should do deep testing of TP and EP.

Do you have a timeline for the next release? Also, are there any temporary solutions for the bug described in #1396?


Successfully merging this pull request may close these issues.

[Bug]: assert self.cpu_group is not None
6 participants