Support pipeline parallel in V1 Engine #1700
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##             main    #1700        +/-   ##
===========================================
+ Coverage   27.39%   54.42%   +27.02%
===========================================
  Files          56       80       +24
  Lines        6191    10007     +3816
===========================================
+ Hits         1696     5446     +3750
- Misses       4495     4561       +66
Flags with carried forward coverage won't be shown.
Overall LGTM. Please add an e2e test for PP, thanks!
Yes, an e2e test should be added at least. A simple one with qwen3-0.6b + PP=2 on multicard is welcome.
@pytest.mark.parametrize("pp_size", PIPELINE_PARALLELS)
def test_models(model: str, tp_size: int, pp_size: int) -> None:
    # Create an LLM.
    llm = LLM(
I recommend using tests.conftest.VllmRunner, so that we don't need to clear resources by hand.
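To illustrate the point about manual cleanup, here is a minimal sketch of the context-manager pattern. `FakeRunner`, `run_case`, and the model name `"dummy-model"` are hypothetical stand-ins invented for this example; this is not the real tests.conftest.VllmRunner and says nothing about its actual signature. It only shows how a `with` block releases resources on exit, even if the test body raises.

```python
from typing import List, Tuple


class FakeRunner:
    """Hypothetical stand-in for a test runner that owns an engine."""

    def __init__(self, model: str, pipeline_parallel_size: int = 1) -> None:
        self.model = model
        self.pipeline_parallel_size = pipeline_parallel_size
        self.released = False  # flips to True once cleanup has run

    def generate(self, prompts: List[str]) -> List[str]:
        # A real runner would call the engine; this just echoes the prompt.
        return [f"{self.model}:{p}" for p in prompts]

    def __enter__(self) -> "FakeRunner":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Cleanup runs here automatically, whether or not the body raised.
        self.released = True
        return False  # do not swallow exceptions from the test body


def run_case(pp_size: int) -> Tuple[FakeRunner, List[str]]:
    # No explicit cleanup call is needed after the with block.
    with FakeRunner("dummy-model", pipeline_parallel_size=pp_size) as runner:
        outputs = runner.generate(["Hello"])
    return runner, outputs
```

With this shape, a parametrized test body only needs the `with` block, and the cleanup step cannot be forgotten when a new case is added.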
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
What this PR does / why we need it?
This patch adds pipeline parallel (PP) support to the V1 Engine.
Does this PR introduce any user-facing change?
Yes, users can now run PP with the V1 Engine.
How was this patch tested?
Manually tested.