[WIP][Prefill Performance] Parallel Strategy Optimizations (VRAM-for-Speed Tradeoff) #1687
base: v0.9.1-dev
SlightwindSec commented Jul 9, 2025
- Optimized MoE Expert All2All: Rewritten for higher throughput, at the cost of increased memory use.
- Shared Expert Sharding Strategy Update: Switched shared experts from TP-aligned sharding to pure DP, enabling more efficient execution.
- O_Proj AllReduce → ReduceScatter: Replaced AllReduce with ReduceScatter to reduce communication overhead; this is made possible by the pure DP sharding.
- AllGather Postponed: Deferred AllGather until after the QKV down projection to reduce synchronization impact during prefill.
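
For reviewers, the O_Proj item can be read against the standard identity that an AllReduce is equivalent to a ReduceScatter followed by an AllGather. The following is a minimal pure-Python sketch of that identity, not code from this PR: the helper names (`all_reduce`, `reduce_scatter`, `all_gather`, `ring_elements_sent`) are hypothetical, ranks are simulated as plain lists, and the cost function is the textbook ring-algorithm model rather than a measurement from this change.

```python
# Simulated collectives over a list of per-rank vectors (no real communication).
# All names here are illustrative assumptions, not part of the PR.

def all_reduce(per_rank):
    """Every rank ends up with the full element-wise sum."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]

def reduce_scatter(per_rank):
    """Rank r ends up with only the r-th contiguous shard of the sum."""
    world = len(per_rank)
    total = [sum(vals) for vals in zip(*per_rank)]
    shard = len(total) // world  # assumes length is divisible by world size
    return [total[r * shard:(r + 1) * shard] for r in range(world)]

def all_gather(shards):
    """Every rank ends up with the concatenation of all shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]

def ring_elements_sent(n_elems, world, op):
    """Elements each rank sends under a ring algorithm (textbook cost model)."""
    step = n_elems // world
    if op in ("reduce_scatter", "all_gather"):
        return (world - 1) * step
    if op == "all_reduce":
        return 2 * (world - 1) * step
    raise ValueError(f"unknown op: {op}")

world = 4
# Rank r holds a partial output vector filled with r + 1.
partials = [[float(r + 1)] * 8 for r in range(world)]

full = all_reduce(partials)
shards = reduce_scatter(partials)

# ReduceScatter followed by AllGather reproduces AllReduce.
assert all_gather(shards) == full
```

Under this model, a ring ReduceScatter sends half as many elements per rank as a ring AllReduce, which is consistent with the reduced communication overhead described above. The saving only holds if downstream layers can consume the sharded output directly, or if the gather is deferred to a cheaper point.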
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
e4f0f20 to 92994fb
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
This pull request can't be merged. It's better to submit these features as separate pull requests. @Yikun @wangxiyuan @ganyi1996ppo
Signed-off-by: Wang Kunpeng <1289706727@qq.com>