-
Notifications
You must be signed in to change notification settings - Fork 456
[BugFix] Async scheduling and PP compatibility with DP #2796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: jesse <szxfml@gmail.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to fix issues related to asynchronous scheduling and pipeline parallelism (PP) compatibility with data parallelism (DP). The changes in vllm_ascend/worker/worker_v1.py
introduce a check to prevent deadlocks when no tokens are scheduled and refactor the model execution logic for intermediate pipeline ranks. While the refactoring improves clarity, it introduces a critical type error where a function returns a value that violates its type hint, potentially causing runtime failures. My review focuses on correcting this type violation to ensure the fix is robust.
Signed-off-by: jesse <szxfml@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2796 +/- ##
==========================================
+ Coverage 74.76% 75.25% +0.48%
==========================================
Files 150 154 +4
Lines 20891 21413 +522
==========================================
+ Hits 15620 16115 +495
- Misses 5271 5298 +27
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
3392700
to
fcf6a9a
Compare
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
c14a251
to
629641c
Compare
629641c
to
b1b3fc3
Compare
Signed-off-by: jesse <szxfml@gmail.com>
b1b3fc3
to
4c92b58
Compare
Signed-off-by: jesse <szxfml@gmail.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: jesse <szxfml@gmail.com>
What this PR does / why we need it?
based on the vllm-project/vllm#23770,
fix Async scheduling and PP compatibility with DP, also fixes issue with finished requests not being processed in async scheduling and PP cases, and possible worker race conditions.
Does this PR introduce any user-facing change?
How was this patch tested?