Fixes for dp + ep + tp combinations #78
varun-sundar-rabindranath wants to merge 171 commits into modular-fused-experts from
Conversation
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Signed-off-by: Bill Nell <bnell@redhat.com>
@dataclass
class FusedMoEParallelConfig:
Move the tp / dp / ep computation here, out of FusedMoE.
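A minimal sketch of what centralizing that computation could look like. Only the class name `FusedMoEParallelConfig` and the `@dataclass` decorator come from the diff above; the field names, the `use_ep` flag, and the `ep_size` rule (experts sharded across all tp * dp ranks when expert parallelism is enabled) are assumptions for illustration, not the PR's actual API.

```python
from dataclasses import dataclass


@dataclass
class FusedMoEParallelConfig:
    # Field names below are hypothetical, not taken from the PR.
    tp_size: int
    dp_size: int
    use_ep: bool

    @property
    def ep_size(self) -> int:
        # Assumed rule: with expert parallelism, experts are sharded
        # across all tp * dp ranks; otherwise there is one EP group.
        return self.tp_size * self.dp_size if self.use_ep else 1


cfg = FusedMoEParallelConfig(tp_size=2, dp_size=4, use_ep=True)
print(cfg.ep_size)  # -> 8
```

Keeping the derivation in one dataclass means callers such as the fused-experts layer read a single config instead of recomputing sizes from global parallel state.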
# With EP and the pplx kernels this is no longer viable,
# as all GPU ranks in DP produce the complete set of hidden_states.
# Therefore, reduce the shared experts early.
reduce_results=self.experts.must_reduce_shared_outputs(),
reduce_results must be True when using pplx dispatch/combine.
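The predicate referenced above can be sketched as a tiny illustration. The method name `must_reduce_shared_outputs` comes from the diff; the class shape and the flags `use_ep` / `use_pplx` are assumptions standing in for however the real layer tracks its parallel mode.

```python
class FusedExperts:
    """Hypothetical stand-in for the fused-experts layer in the diff."""

    def __init__(self, use_ep: bool, use_pplx: bool):
        self.use_ep = use_ep
        self.use_pplx = use_pplx

    def must_reduce_shared_outputs(self) -> bool:
        # With EP + pplx dispatch/combine, every DP rank produces the
        # complete hidden_states, so the shared-expert output must be
        # reduced early instead of being deferred to a later all-reduce.
        return self.use_ep and self.use_pplx
```

Wiring the shared-experts `reduce_results` to this predicate (rather than a constant) keeps the non-EP path unchanged while forcing the early reduction only where it is required.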
max_num_tokens,
world_size,
dp_size,
rank,
cosmetics - rearrange args
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!
vllm server command:
vllm serve ${model} \
    --enforce-eager \
    --trust-remote-code \
    --tensor-parallel-size ${tp_size} \
    --data-parallel-size ${dp_size} \
    ${EP_ARGS} \
    --no-enable-prefix-caching \
    --port ${server_port}
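For reference, the variables above can be filled in as follows. The values are illustrative, not taken from the PR; `--enable-expert-parallel` is vLLM's flag for enabling expert parallelism, which is presumably what `${EP_ARGS}` expands to in the EP runs (and expands to nothing otherwise). The command is assembled and echoed here rather than executed.

```shell
# Illustrative values, not from the PR.
model="deepseek-ai/DeepSeek-V2-Lite"
tp_size=2
dp_size=2
EP_ARGS="--enable-expert-parallel"   # empty string for non-EP runs
server_port=8000

cmd="vllm serve ${model} --enforce-eager --trust-remote-code \
--tensor-parallel-size ${tp_size} --data-parallel-size ${dp_size} \
${EP_ARGS} --no-enable-prefix-caching --port ${server_port}"
echo "$cmd"
```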
lm_eval command:
lm_eval --model local-completions \
    --tasks gsm8k \
    --model_args model=deepseek-ai/DeepSeek-V2-Lite,base_url=http://127.0.0.1:${SERVER_PORT}/v1/completions,num_concurrent=5,max_retries=3,tokenized_requests=False \
    --limit 100
Verified correctness using lm-eval for the following combinations:
## Works only with VLLM_MLA_DISABLE=1