[Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig #2434
Conversation
Code Review
This pull request introduces a validation check in AscendSchedulerConfig to prevent configurations where max_num_batched_tokens is less than max_model_len when chunked prefill is disabled. This is a valuable correctness fix that avoids unexpected rejection of long sequences. The implementation is sound, and the associated test updates and additions are thorough, covering both valid and invalid scenarios. The changes improve the robustness and user-friendliness of the scheduler configuration.
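For reference, a minimal sketch of the kind of check being described (the class and field names follow the PR title and this review; the exact attributes, defaults, and error message in vllm-ascend's AscendSchedulerConfig are assumptions, not the merged code):

```python
from dataclasses import dataclass


@dataclass
class AscendSchedulerConfig:
    max_num_batched_tokens: int
    max_model_len: int
    enable_chunked_prefill: bool = False

    def __post_init__(self) -> None:
        # Without chunked prefill, a prompt must be prefilled in a single
        # scheduler step, so the per-step token budget has to cover the
        # longest sequence the model is allowed to accept.
        if (not self.enable_chunked_prefill
                and self.max_num_batched_tokens < self.max_model_len):
            raise ValueError(
                f"max_num_batched_tokens ({self.max_num_batched_tokens}) "
                f"must be greater than or equal to max_model_len "
                f"({self.max_model_len}) when chunked prefill is disabled.")
```

Failing fast at config time like this turns a confusing runtime symptom (long requests silently rejected) into an immediate, actionable error.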
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Force-pushed from 3bc18ce to f697b20

…cendSchedulerConfig
Signed-off-by: linfeng-yuan <1102311262@qq.com>

Force-pushed from f697b20 to 3ed37b2
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2434      +/-   ##
==========================================
+ Coverage   77.37%   77.39%   +0.01%
==========================================
  Files         128      128
  Lines       16455    16468      +13
==========================================
+ Hits        12732    12745      +13
  Misses       3723     3723
==========================================

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Please update the commit message. Thanks!
What this PR does / why we need it?
Add configuration check logic for the Ascend scheduler: if chunked prefill is disabled, max_num_batched_tokens cannot be less than max_model_len, following vLLM's behavior.
Does this PR introduce any user-facing change?
Users can no longer set max_num_batched_tokens smaller than max_model_len with the Ascend scheduler.
How was this patch tested?
CI and vLLM serving passed.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@f77a080
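For illustration, a configuration that this check now rejects at startup might look like the sketch below. The additional_config key for enabling the Ascend scheduler follows vllm-ascend's documented usage, and the model name and token values are placeholders; treat the exact parameter spelling as an assumption rather than a verified API reference.

```python
from vllm import LLM

# With the Ascend scheduler enabled and chunked prefill disabled, a token
# budget below max_model_len is now rejected when the engine is built,
# instead of surfacing later as rejected long prompts at runtime.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    max_model_len=8192,
    max_num_batched_tokens=4096,  # < max_model_len -> raises ValueError
    enable_chunked_prefill=False,
    additional_config={"ascend_scheduler_config": {"enabled": True}},
)
```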