[Feature] Support moe multi-stream for aclgraph. #2946
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following point will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request introduces multi-streaming for Mixture-of-Experts (MoE) models on Ascend NPUs so that the computation of shared experts can overlap with that of routing experts, which is a good performance optimization. The implementation logic for stream management appears correct. My review focuses on improving the robustness of the newly added utility functions `npu_stream_switch` and `npu_wait_stream`. With checks for `None` streams, these functions become safer and more reliable for future use across the codebase, preventing potential `AttributeError` exceptions and unexpected behavior.
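The `None` checks suggested in the review could look roughly like the sketch below. It is not the code from this PR: the signatures, the `enabled` flag, and the `stream_ctx` parameter are assumptions made so the example runs without an NPU. `stream_ctx` stands for whatever context manager the backend provides for switching streams (the counterpart of `torch.cuda.stream`), and `StubStream` / `stub_stream_ctx` are hypothetical stand-ins used only for the demo.

```python
from contextlib import contextmanager, nullcontext


def npu_stream_switch(target_stream, stream_ctx, enabled=True):
    """Return a context manager that runs its body on target_stream.

    Assumed signature. Falls back to a no-op context when switching is
    disabled or the stream is None, so a missing stream is never used.
    """
    if not enabled or target_stream is None:
        return nullcontext()
    return stream_ctx(target_stream)


def npu_wait_stream(current_stream, target_stream, enabled=True):
    """Make current_stream wait for the work queued on target_stream.

    Assumed signature. Does nothing if either stream is None.
    """
    if not enabled or current_stream is None or target_stream is None:
        return
    current_stream.wait_stream(target_stream)


class StubStream:
    """Hypothetical stand-in for a device stream, used only for this demo."""

    def __init__(self):
        self.waited_on = []

    def wait_stream(self, other):
        self.waited_on.append(other)


@contextmanager
def stub_stream_ctx(stream):
    """Hypothetical stand-in for the backend's stream context manager."""
    yield stream


main, side = StubStream(), StubStream()
npu_wait_stream(None, side)      # no-op instead of an AttributeError
npu_wait_stream(main, side)      # records that main depends on side
with npu_stream_switch(None, stub_stream_ctx):
    pass                         # body simply runs on the current stream
```

Returning `nullcontext()` rather than raising keeps call sites identical whether or not a secondary stream was created, which is one way to read the reviewer's robustness concern.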
This pull request has conflicts. Please resolve them before we can evaluate the pull request.
This will be done in PR #2681
Signed-off-by: whx-sjtu <2952154980@qq.com>
This PR moves the computation of shared experts onto a separate stream so that it overlaps with the computation of routing experts.
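The overlap pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `RecordingStream` and `use_stream` are hypothetical stubs that only log calls, so the example runs on a CPU and shows the ordering of synchronization points rather than real concurrency. The order of waits (secondary waits for main, then main waits for secondary) is an assumed but common fork/join arrangement.

```python
from contextlib import contextmanager


class RecordingStream:
    """Hypothetical stand-in for a device stream; it only logs calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def wait_stream(self, other):
        self.log.append(f"{self.name} waits for {other.name}")


@contextmanager
def use_stream(stream):
    """Hypothetical stand-in for the backend's stream context manager."""
    stream.log.append(f"enter {stream.name}")
    try:
        yield stream
    finally:
        stream.log.append(f"exit {stream.name}")


def moe_forward(x, shared_experts, routing_experts, main, secondary):
    # Fork: the secondary stream must see inputs produced on the main stream.
    secondary.wait_stream(main)
    with use_stream(secondary):
        shared_out = shared_experts(x)   # would be enqueued on the secondary stream
    routing_out = routing_experts(x)     # would be enqueued on the main stream
    # Join: the main stream must not consume shared_out before it is ready.
    main.wait_stream(secondary)
    return shared_out, routing_out


log = []
main = RecordingStream("main", log)
secondary = RecordingStream("secondary", log)
result = moe_forward(3.0, lambda v: v * 2, lambda v: v + 1, main, secondary)
```

On a real device the two expert calls launch asynchronous kernels, so the work issued inside the `with` block can run while the routing experts execute on the main stream; the final wait is what makes it safe to combine the two outputs afterwards.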