-
-
Notifications
You must be signed in to change notification settings - Fork 10.7k
[Misc] Rev DeepEP #27122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Misc] Rev DeepEP #27122
Conversation
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request updates the DeepEP dependency to a newer commit, enabling support for a hidden size of 3072. The changes are straightforward, involving an update to the commit hash in the installation script and adding the new size to the supported sizes list. My review includes a performance-related suggestion to use a frozenset
for the list of supported hidden sizes, which is more efficient for membership checks in a low-latency context.
# DeepEP low-latency kernels are compiled only for certain | ||
# specific hidden sizes. | ||
SUPPORTED_HIDDEN_SIZES = [2048, 2560, 4096, 5120, 6144, 7168] | ||
SUPPORTED_HIDDEN_SIZES = [2048, 2560, 3072, 4096, 5120, 6144, 7168] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For improved performance, consider using a frozenset
for SUPPORTED_HIDDEN_SIZES
. Membership testing (in
) is O(1) on average for sets, while it is O(n) for lists. Given that this check is in a performance-sensitive path for low-latency kernels, this optimization is beneficial. A frozenset
is suitable here as it's an immutable constant.
SUPPORTED_HIDDEN_SIZES = [2048, 2560, 3072, 4096, 5120, 6144, 7168] | |
SUPPORTED_HIDDEN_SIZES = frozenset([2048, 2560, 3072, 4096, 5120, 6144, 7168]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :) Please fix the conflict and we can ready
f034ced
to
1c11c4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the work!
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Head branch was pushed to by a user without write access
1c11c4b
to
0f15d4f
Compare
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Purpose
Update DeepEP commit to DeepEP main (73b6ea4) to pick up additional hidden-size support that is useful for gpt-oss low-latency DP/EP
Test Plan
Run unit test
test_deepep_moe.py
locallye2e:
Server :
VLLM_ALL2ALL_BACKEND="deepep_high_throughput" VLLM_USE_DEEP_GEMM=1 canhazgpu run -g2 -- vllm serve Qwen/Qwen3-30B-A3B-FP8 --trust-remote-code --enable-expert-parallel --data-parallel-size 2 -no-enable-prefix-caching
Server:
VLLM_ALL2ALL_BACKEND="deepep_low_latency" VLLM_USE_DEEP_GEMM=1 canhazgpu run -g2 -- vllm serve Qwen/Qwen3-30B-A3B-FP8 --trust-remote-code --enable-expert-parallel --data-parallel-size 2 --port 9010 --no-enable-prefix-caching
lm_eval:
lm_eval --model local-completions --tasks gsm8k --model_args model=Qwen/Qwen3-30B-A3B-FP8,base_url=http://127.0.0.1:9010/v1/completions,num_concurrent=30,max_retries=3 --limit 100
Test Result