Bugfix_091dev_OOM #2910
base: v0.9.1-dev
Conversation
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Code Review
This pull request aims to fix an out-of-memory error on decoder nodes by reducing the chunked_prefill_workspace_size. The change correctly identifies the consumer node and applies a smaller limit. However, the implementation for the consumer node is unnecessarily complex. I've provided a suggestion to simplify the logic, which makes the code more readable and maintainable while achieving the same result.
if vllm_config.kv_transfer_config.is_kv_consumer:
    max_chunked_size = scheduler_config.max_num_seqs * self.block_size
else:
    max_chunked_size = 128 * 1024
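The bot's concrete suggestion is not included in this excerpt. As a rough, hypothetical sketch only, the decision above could be factored into a small standalone helper (the function name and signature are assumptions, not part of the PR):

# Hypothetical helper illustrating the same decision as the hunk above;
# the actual change lives inside the attention metadata builder.
def choose_max_chunked_size(is_kv_consumer: bool, max_num_seqs: int,
                            block_size: int) -> int:
    # On a KV consumer (decode) node, bound the chunked prefill workspace
    # by the decode batch size instead of the generic 128 * 1024 token cap.
    return max_num_seqs * block_size if is_kv_consumer else 128 * 1024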
Constant values like this should be defined at the top of the file in uppercase, and don't forget to add a comment explaining them.
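A sketch of what that request could look like (the constant name and comment are illustrative assumptions, not taken from the PR):

# Hypothetical module-level constant, defined near the top of the file.
# Default upper bound (in tokens) for the chunked prefill workspace.
MAX_CHUNKED_PREFILL_WORKSPACE_SIZE = 128 * 1024

The else branch above would then read max_chunked_size = MAX_CHUNKED_PREFILL_WORKSPACE_SIZE instead of repeating the magic number.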
self.chunked_prefill_enabled = scheduler_config.chunked_prefill_enabled
if self.chunked_prefill_enabled:
    if vllm_config.kv_transfer_config.is_kv_consumer:
        max_chunked_size = scheduler_config.max_num_seqs * self.block_size
Is the OOM caused by the batch size being greater than 1024?
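For context on where the 1024 figure could come from, a hypothetical bit of arithmetic (block_size = 128 is an assumption; the actual block size is not stated in this excerpt): max_num_seqs * block_size only stays below the previous 128 * 1024 token cap while max_num_seqs is under 1024.

# Illustrative arithmetic only; block_size = 128 is an assumed value.
block_size = 128
previous_cap = 128 * 1024  # workspace size in tokens before this change
for max_num_seqs in (256, 1024, 2048):
    consumer_size = max_num_seqs * block_size
    print(max_num_seqs, consumer_size, consumer_size < previous_cap)
# 256  -> 32768   True  (smaller workspace than before)
# 1024 -> 131072  False (equal to the old cap)
# 2048 -> 262144  False (would exceed the old cap)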
What this PR does / why we need it?
Resolves the issue of the chunked prefill workspace size being too large, which leads to out-of-memory (OOM) errors on decode (D) nodes.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested by CI.