[Dist][EP] Remove ETP/EP maintained in vllm-ascend #1681
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1681      +/-   ##
==========================================
- Coverage   54.18%   53.41%    -0.77%
==========================================
  Files          74       72        -2
  Lines        9235     9053      -182
==========================================
- Hits         5004     4836      -168
+ Misses       4231     4217       -14
0e03620 to 5a32cb8
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@@ -129,6 +129,7 @@ def _pangu_torchair_test_fixture(
        distributed_executor_backend="mp",
        enforce_eager=False,
        additional_config=additional_config,
+       enable_expert_parallel=True,
We must set enable_expert_parallel to True when using pangu with the EP in vLLM.
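For readers outside the test fixture, a minimal sketch of what this means on the user side (the model path and tensor_parallel_size below are illustrative placeholders, not values from this PR):

```python
from vllm import LLM

# Illustrative only: a MoE model such as pangu served through vLLM with
# expert parallelism enabled. The model path and parallel size are
# placeholders, not taken from this PR.
llm = LLM(
    model="path/to/pangu-moe-model",   # placeholder
    tensor_parallel_size=4,            # illustrative value
    enable_expert_parallel=True,       # required for pangu until disabling EP is supported
)
```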
We should do this until disabling EP is supported in pangu. Please help review this, thanks! cc @Angazenn
So maybe the related doc should be updated as well, for example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html
> So maybe the related doc should be updated as well, for example https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_moge.html
Now the example is updated, thanks!
@jianzs @ttanzhiqiang could you help review this PR?
tp_rank = get_tp_group().rank_in_group
tp_size = get_tp_group().world_size
ditto
I think it's fine to use the TP group here, since this is the weight loader for the linear layer.
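For context, the pattern under discussion looks roughly like this (a sketch; the function name and the dim-0 sharding are illustrative assumptions about how a linear-layer weight loader uses the TP group):

```python
import torch
from vllm.distributed import get_tp_group

def load_tp_shard(full_weight: torch.Tensor) -> torch.Tensor:
    # Sketch: slice a linear-layer weight along its output dim by TP rank.
    # Assumes the distributed environment is already initialized.
    tp_rank = get_tp_group().rank_in_group
    tp_size = get_tp_group().world_size
    shard_size = full_weight.shape[0] // tp_size
    return full_weight[tp_rank * shard_size:(tp_rank + 1) * shard_size]
```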
Thanks for this info, but I'm not familiar with the different fused MoE states; maybe I need to read more code to understand the above 1, 2 and 3.
I think we still have ep=1 when EP is disabled; you can refer to https://github.com/vllm-project/vllm/blob/235bfd5dfe0975e42b115cfb910e73eff5c670d8/vllm/model_executor/layers/fused_moe/config.py#L274-L281
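Paraphrasing the linked FusedMoEParallelConfig logic, the idea is roughly the following (a sketch of the behavior, not the exact vLLM code; the function name is illustrative):

```python
# Sketch of the idea behind the linked config lines (not verbatim vLLM code):
# when expert parallelism is disabled, the EP group still exists but has size 1,
# so "EP disabled" really means ep_size == 1 rather than no EP state at all.
def resolve_parallel_sizes(tp_size: int, dp_size: int, use_ep: bool) -> dict:
    if not use_ep:
        return {"tp_size": tp_size, "ep_size": 1}        # EP disabled -> ep_size = 1
    return {"tp_size": 1, "ep_size": tp_size * dp_size}  # experts sharded across tp*dp ranks
```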
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@ttanzhiqiang maybe we should remove the best example on A2 in #1101, WDYT?
I think you can remove these scripts, as we only need to maintain them internally on our side.
OK, I want to remove it mainly because it is a best example for ETP, which is removed here. I'll remove it then.
ae9f97d to cf0c584
Signed-off-by: MengqingCao <cmq0113@163.com>
Hi @jianzs @ttanzhiqiang @ApsarasX, your suggestions are addressed now, could you take a look again? Thanks!
Let's merge this first. Before the next release, we should do a deep test of TP and EP.
Do you have a timeline for the next release? Also, are there any temporary solutions for the bug described in #1396?
What this PR does / why we need it?
Remove ETP/EP maintained in branch main. We drop this as there are no relevant scenarios for using ETP now, and we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the experts need to be sliced.
This is a part of #1422 backport.
Fixes #1396 #1154
Does this PR introduce any user-facing change?
We will no longer maintain ETP/EP in vllm-ascend; use the TP/EP implementation in vLLM instead.
How was this patch tested?
CI passed with newly added and existing tests.