[WIP][Prefill Performance] Parallel Strategy Optimizations (VRAM-for-Speed Tradeoff) #1687

Status: Open · wants to merge 23 commits into base branch v0.9.1-dev

Conversation

SlightwindSec

  1. Optimized MoE Expert All2All: rewritten for higher throughput at the cost of additional memory.
  2. Shared Expert Sharding Strategy Update: switched from TP-aligned sharding to pure DP for shared experts, enabling more efficient execution.
  3. O_Proj AllReduce → ReduceScatter: reduced communication overhead by replacing AllReduce with ReduceScatter, made possible by the pure-DP sharding.
  4. AllGather Postponed: delayed until after the QKV down projection to reduce synchronization cost during prefill.

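The VRAM-for-speed tradeoff in point 1 typically comes from dispatching tokens into fixed-capacity, pre-allocated per-expert buffers so the All2All can run with static shapes and no extra count-exchange synchronization, at the cost of padding memory. The PR does not spell out its exact scheme; the NumPy sketch below illustrates the general capacity-padding idea with hypothetical helper names, not this PR's implementation:

```python
import numpy as np

def dispatch_exact(tokens, expert_ids, num_experts):
    """Baseline: size each expert's buffer exactly (requires learning the
    per-expert counts first, i.e. an extra sync before the All2All)."""
    return [tokens[expert_ids == e] for e in range(num_experts)]

def dispatch_padded(tokens, expert_ids, num_experts, capacity):
    """Trade VRAM for speed: fixed-capacity buffers allocated up front, so
    the All2All can use static shapes with no count exchange. Unused rows
    are zero padding (the extra memory cost)."""
    buffers = np.zeros((num_experts, capacity, tokens.shape[1]), dtype=tokens.dtype)
    counts = np.zeros(num_experts, dtype=np.int64)
    for tok, e in zip(tokens, expert_ids):
        if counts[e] < capacity:  # overflow tokens would be dropped here
            buffers[e, counts[e]] = tok
            counts[e] += 1
    return buffers, counts

rng = np.random.default_rng(2)
num_experts, hidden = 4, 8
tokens = rng.standard_normal((32, hidden))
expert_ids = rng.integers(0, num_experts, size=32)
capacity = len(tokens)  # worst case: every token routed to one expert

buffers, counts = dispatch_padded(tokens, expert_ids, num_experts, capacity)
exact = dispatch_exact(tokens, expert_ids, num_experts)

# The valid prefix of each padded buffer matches the exactly-sized dispatch.
for e in range(num_experts):
    assert np.array_equal(buffers[e, :counts[e]], exact[e])
```

Sizing `capacity` below the worst case reduces the memory overhead but reintroduces the possibility of dropped or overflowed tokens, which is the knob behind the tradeoff in the PR title.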
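The AllReduce → ReduceScatter change (point 3) can be illustrated numerically. This is a minimal NumPy sketch of the two collectives' semantics, not the actual HCCL or torch.distributed calls: ReduceScatter computes the same sums as AllReduce but leaves each rank holding only its own slice, which is exactly what pure-DP sharded layers need, at roughly half the wire traffic of a ring AllReduce.

```python
import numpy as np

def all_reduce(partials):
    """Sum the partial results from every rank; every rank keeps the full tensor."""
    return [sum(partials) for _ in partials]

def reduce_scatter(partials):
    """Sum the partial results, but each rank keeps only its 1/world_size slice."""
    world_size = len(partials)
    total = sum(partials)
    return np.split(total, world_size)

rng = np.random.default_rng(0)
world_size, tokens, hidden = 4, 8, 16
partials = [rng.standard_normal((tokens, hidden)) for _ in range(world_size)]

full = all_reduce(partials)
sharded = reduce_scatter(partials)

# reduce_scatter result on rank r == all_reduce result restricted to rank r's slice
for r in range(world_size):
    assert np.allclose(sharded[r], np.split(full[r], world_size)[r])

# Ring algorithms move ~2*(N-1)/N of the tensor per rank for AllReduce but
# only ~(N-1)/N for ReduceScatter: about half the bytes on the wire.
```

With TP-aligned shared experts the O_Proj output had to be fully replicated (AllReduce); under pure DP each rank only needs its own tokens afterwards, so the scatter half of the collective can simply be skipped.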
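Point 4 (postponing AllGather until after the down projection) works because the down projection is applied independently per token: projecting local tokens and then gathering yields the same result as gathering first, but the gathered tensor is far smaller. A hedged NumPy sketch, where the `hidden` and `kv_lora_rank` sizes are illustrative DeepSeek-V3-like values and not taken from this PR:

```python
import numpy as np

rng = np.random.default_rng(1)
world_size, tokens_per_rank = 4, 8
hidden, kv_lora_rank = 7168, 512  # illustrative sizes, assumption

W_down = (rng.standard_normal((hidden, kv_lora_rank)) * 0.01).astype(np.float32)
x_local = [rng.standard_normal((tokens_per_rank, hidden)).astype(np.float32)
           for _ in range(world_size)]

# Early gather: collect full hidden states from all ranks, then project.
early = np.concatenate(x_local) @ W_down

# Postponed gather: each rank projects its own tokens first, then gathers.
late = np.concatenate([x @ W_down for x in x_local])

# Same result either way, since the projection is per-token.
assert np.allclose(early, late, rtol=1e-3, atol=1e-3)

# But the postponed gather moves far fewer elements:
early_vol = world_size * tokens_per_rank * hidden
late_vol = world_size * tokens_per_rank * kv_lora_rank  # kv_lora_rank/hidden of the traffic
```

During prefill, where token counts are large, shrinking the gathered dimension directly reduces the synchronization cost the PR description refers to.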
SlightwindSec and others added 3 commits July 9, 2025 10:51
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
github-actions bot commented Jul 9, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

SlightwindSec and others added 4 commits July 10, 2025 22:10
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
angazenn and others added 2 commits July 14, 2025 20:54
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
SlightwindSec and others added 3 commits July 15, 2025 14:47
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
@Angazenn Angazenn force-pushed the upstream_v0.9.1-dev_dev0 branch from e4f0f20 to 92994fb Compare July 17, 2025 01:50
angazenn and others added 3 commits July 17, 2025 10:05
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
angazenn added 3 commits July 17, 2025 18:30
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
@jianzs (Collaborator) commented Jul 23, 2025

This pull request can't be merged. It's better to submit these features as separate pull requests. @Yikun @wangxiyuan @ganyi1996ppo

Merge into upstream_v0.9.1-dev_dev0 (conflicts resolved in vllm_ascend/ops/fused_moe.py)
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>