@pkuzyc (Contributor) commented Aug 25, 2025

PR types

New features

PR changes

Models

Description

Support sequence parallelism in the DeepSeek-V3 model.

deepllz and others added 3 commits August 19, 2025 17:22
* update expert parallel init logic

* fix flash_mask && MoEFlexTokenLayer experts && add some config

* offload optimizer

---------

Co-authored-by: blacksheep-Aristotle <zhangweilong01@baidu.com>
Co-authored-by: deepllz <you@example.com>
@paddle-bot (bot) commented Aug 25, 2025

Thanks for your contribution!

2 participants