Fixes for dp + ep + tp combinations #78
base: modular-fused-experts
Conversation
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
@@ -50,6 +50,112 @@
MOE_DP_CHUNK_SIZE = 256


@dataclass
class FusedMoEParallelConfig:
Move the tp / dp / ep computation here, out of `FusedMoE`.
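For context, here is a minimal sketch of what such a config dataclass might hold; the field and property names below are illustrative assumptions based on the tp / dp / ep discussion, not necessarily the exact ones in the diff:

```python
# Illustrative sketch only: field and property names are assumptions
# about what the config carries, not the actual diff contents.
from dataclasses import dataclass


@dataclass
class FusedMoEParallelConfig:
    tp_size: int  # tensor-parallel world size for the MoE layer
    dp_size: int  # data-parallel world size
    ep_size: int  # expert-parallel world size
    tp_rank: int
    dp_rank: int
    ep_rank: int
    use_ep: bool  # whether expert parallelism is enabled

    @property
    def use_all2all_kernels(self) -> bool:
        # pplx-style dispatch/combine only applies with DP plus EP.
        return self.dp_size > 1 and self.use_ep
```

Centralizing these sizes and ranks in one dataclass keeps the parallelism bookkeeping out of the layer itself, which matches the review suggestion above.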
# With EP and the pplx kernels this is no longer viable,
# as all GPU ranks in DP produce the complete set of hidden_states.
# Therefore reduce the shared experts early.
reduce_results=self.experts.must_reduce_shared_outputs(),
`reduce_results` must be True when using pplx dispatch combine.
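A hedged sketch of the predicate this refers to; `must_reduce_shared_outputs` is the method named in the diff, but the body below is an assumption about the logic it implements:

```python
# Hedged sketch: must_reduce_shared_outputs is named in the diff, but
# this body is an assumption about the logic behind it.
class FusedMoE:
    def __init__(self, dp_size: int, use_pplx_kernels: bool):
        self.dp_size = dp_size
        self.use_pplx_kernels = use_pplx_kernels

    def must_reduce_shared_outputs(self) -> bool:
        # With pplx dispatch/combine, every DP rank produces the complete
        # hidden_states, so the shared-expert output must be reduced early,
        # before it is added to the routed-expert output.
        return self.dp_size > 1 and self.use_pplx_kernels
```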
@@ -325,8 +325,9 @@ def pplx_dispatch_combine(pgi, dp_size, a, topk_weight, topk_ids, num_experts):
    ata,
    max_num_tokens,
    world_size,
    dp_size,
    rank,
Cosmetic: rearrange args.
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Force-pushed from 5d960df to b04e5d3
Force-pushed from f5bcc22 to 5ba84d2
vLLM server command:

vllm serve ${model} \
    --enforce-eager \
    --trust-remote-code \
    --tensor-parallel-size ${tp_size} \
    --data-parallel-size ${dp_size} \
    ${EP_ARGS} \
    --no-enable-prefix-caching \
    --port ${server_port}
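Note that ${EP_ARGS} is not expanded in the PR; a plausible definition, assuming vLLM's --enable-expert-parallel flag is what toggles EP in these runs:

```bash
# Assumption: EP_ARGS supplies the expert-parallel flag when EP is under
# test; the variable's actual contents are not shown in the PR.
if [ "${enable_ep}" = "1" ]; then
    EP_ARGS="--enable-expert-parallel"
else
    EP_ARGS=""
fi
```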
lm_eval command:

lm_eval --model local-completions \
    --tasks gsm8k \
    --model_args model=deepseek-ai/DeepSeek-V2-Lite,base_url=http://127.0.0.1:${SERVER_PORT}/v1/completions,num_concurrent=5,max_retries=3,tokenized_requests=False \
    --limit 100
Verified correctness using lm-eval for the following combinations:
## Works only with VLLM_MLA_DISABLE=1
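A hedged end-to-end example of such a run; the model comes from the lm_eval command above, while the parallel sizes and port are illustrative values, not ones reported in the PR:

```bash
# Assumption: MLA is disabled via the environment for these combinations;
# the tp/dp sizes and port below are illustrative values only.
VLLM_MLA_DISABLE=1 vllm serve deepseek-ai/DeepSeek-V2-Lite \
    --enforce-eager \
    --trust-remote-code \
    --tensor-parallel-size 2 \
    --data-parallel-size 2 \
    --enable-expert-parallel \
    --no-enable-prefix-caching \
    --port 8000
```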