[WIP][BugFix] Fix accuracy issues caused by wrong etp_size passed into FusedMoEParallelConfig when using vLLM 0.9.0 #961
What this PR does / why we need it?
This PR fixes accuracy issues caused by the code that adapts to `FusedMoEParallelConfig` in vLLM 0.9.0: the `tp_size` used to split the weights is passed incorrectly. The root cause is that the vLLM community and vLLM-Ascend use different methods to decide whether to use Expert Parallel (EP).

vLLM:
vLLM uses a flag, `enable_expert_parallel`, to indicate whether to use EP, and decides `ep_size` with the logic sketched below.
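A minimal sketch of that decision, paraphrased from vLLM 0.9.0's `FusedMoEParallelConfig.make` (simplified, not the verbatim upstream code):

```python
# Sketch of how vLLM 0.9.0 decides ep_size from enable_expert_parallel
# inside FusedMoEParallelConfig.make (simplified).
def decide_vllm_moe_parallel(tp_size: int, tp_rank: int,
                             dp_size: int, dp_rank: int,
                             enable_expert_parallel: bool) -> dict:
    use_ep = dp_size * tp_size > 1 and enable_expert_parallel
    if not use_ep:
        # No EP: the MoE layer keeps its TP split and EP stays trivial.
        return {"tp_size": tp_size, "tp_rank": tp_rank,
                "ep_size": 1, "ep_rank": 0, "use_ep": False}
    # EP: the TP and DP ranks are flattened into one EP group, and the
    # MoE layer itself stops applying TP (its tp_size collapses to 1).
    return {"tp_size": 1, "tp_rank": 0,
            "ep_size": tp_size * dp_size,
            "ep_rank": tp_rank + tp_size * dp_rank,
            "use_ep": True}
```

For example, with `tp_size=4`, `dp_size=2`, and `enable_expert_parallel=True`, every MoE layer ends up with `ep_size=8` and `tp_size=1`.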
vLLM-Ascend:
vLLM-Ascend uses `etp` to specify the Tensor Parallel size inside MoE layers (see the sketch below). So there will be conflicts if we simply combine these two pieces of code together.
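A hypothetical sketch of the vLLM-Ascend convention; the variable names here are illustrative, not the exact identifiers in the repo:

```python
# Hypothetical sketch: vLLM-Ascend derives ep_size from an explicit
# etp_size (expert tensor parallel size) rather than from a boolean flag.
def decide_ascend_moe_parallel(world_size: int, etp_size: int) -> dict:
    assert world_size % etp_size == 0, "etp_size must divide world_size"
    return {
        "etp_size": etp_size,               # TP degree used to split MoE weights
        "ep_size": world_size // etp_size,  # remaining ranks form the EP group
    }
```

If the Ascend-side `etp_size` is then passed into `FusedMoEParallelConfig` as if it were the upstream `tp_size` (as this PR's title describes), the upstream `use_ep` branch above makes a different decision than vLLM-Ascend intended, and the MoE weights are split with the wrong size, causing the accuracy drop.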
Does this PR introduce any user-facing change?
How was this patch tested?