Fixes for dp + ep + tp combinations #78
base: modular-fused-experts
Conversation
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
@@ -50,6 +50,112 @@
MOE_DP_CHUNK_SIZE = 256


@dataclass
class FusedMoEParallelConfig:
Move the tp / dp / ep computation here, out of `FusedMoE`.
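For context, here is a minimal sketch of what such a config dataclass might hold; the field and property names below are illustrative assumptions based on the tp / dp / ep discussion, not necessarily the exact ones in the diff:

```python
# Illustrative sketch only: field and property names are assumptions
# about what the config carries, not the actual diff contents.
from dataclasses import dataclass


@dataclass
class FusedMoEParallelConfig:
    tp_size: int  # tensor-parallel world size for the MoE layer
    dp_size: int  # data-parallel world size
    ep_size: int  # expert-parallel world size
    tp_rank: int
    dp_rank: int
    ep_rank: int
    use_ep: bool  # whether expert parallelism is enabled

    @property
    def use_all2all_kernels(self) -> bool:
        # pplx-style dispatch/combine only applies with DP plus EP.
        return self.dp_size > 1 and self.use_ep
```

Centralizing these sizes and ranks in one dataclass keeps the parallelism bookkeeping out of the layer itself, which matches the review suggestion above.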
# With EP and the pplx kernels this is no longer viable,
# as all GPU ranks in DP produce the complete set of hidden_states.
# Therefore reduce the shared experts early.
reduce_results=self.experts.must_reduce_shared_outputs(),
`reduce_results` must be True when using pplx dispatch combine.
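A hedged sketch of the predicate this refers to; `must_reduce_shared_outputs` is the method named in the diff, but the body below is an assumption about the logic it implements:

```python
# Hedged sketch: must_reduce_shared_outputs is named in the diff, but
# this body is an assumption about the logic behind it.
class FusedMoE:
    def __init__(self, dp_size: int, use_pplx_kernels: bool):
        self.dp_size = dp_size
        self.use_pplx_kernels = use_pplx_kernels

    def must_reduce_shared_outputs(self) -> bool:
        # With pplx dispatch/combine, every DP rank produces the complete
        # hidden_states, so the shared-expert output must be reduced early,
        # before it is added to the routed-expert output.
        return self.dp_size > 1 and self.use_pplx_kernels
```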
@@ -325,8 +325,9 @@ def pplx_dispatch_combine(pgi, dp_size, a, topk_weight, topk_ids, num_experts):
    ata,
    max_num_tokens,
    world_size,
    dp_size,
    rank,
Cosmetic: rearrange args.
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Force-pushed from 5d960df to b04e5d3
Force-pushed from f5bcc22 to 5ba84d2
vLLM server command:

vllm serve ${model} \
    --enforce-eager \
    --trust-remote-code \
    --tensor-parallel-size ${tp_size} \
    --data-parallel-size ${dp_size} \
    ${EP_ARGS} \
    --no-enable-prefix-caching \
    --port ${server_port}
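Note that ${EP_ARGS} is not expanded in the PR; a plausible definition, assuming vLLM's --enable-expert-parallel flag is what toggles EP in these runs:

```bash
# Assumption: EP_ARGS supplies the expert-parallel flag when EP is under
# test; the variable's actual contents are not shown in the PR.
if [ "${enable_ep}" = "1" ]; then
    EP_ARGS="--enable-expert-parallel"
else
    EP_ARGS=""
fi
```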
lm_eval command:

lm_eval --model local-completions \
    --tasks gsm8k \
    --model_args model=deepseek-ai/DeepSeek-V2-Lite,base_url=http://127.0.0.1:${SERVER_PORT}/v1/completions,num_concurrent=5,max_retries=3,tokenized_requests=False \
    --limit 100
Verified correctness using lm-eval for the following combinations:
## Works only with VLLM_MLA_DISABLE=1
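A hedged end-to-end example of such a run; the model comes from the lm_eval command above, while the parallel sizes and port are illustrative values, not ones reported in the PR:

```bash
# Assumption: MLA is disabled via the environment for these combinations;
# the tp/dp sizes and port below are illustrative values only.
VLLM_MLA_DISABLE=1 vllm serve deepseek-ai/DeepSeek-V2-Lite \
    --enforce-eager \
    --trust-remote-code \
    --tensor-parallel-size 2 \
    --data-parallel-size 2 \
    --enable-expert-parallel \
    --no-enable-prefix-caching \
    --port 8000
```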