
Conversation

wangxiaoteng888

What this PR does / why we need it?

This PR fixes an issue where the chunked prefill workspace size was too large, causing out-of-memory (OOM) errors on decode (D) nodes.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tested by CI.

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an out-of-memory error on decoder nodes by reducing the chunked_prefill_workspace_size. The change correctly identifies the consumer node and applies a smaller limit. However, the implementation for the consumer node is unnecessarily complex. I've provided a suggestion to simplify the logic, which makes the code more readable and maintainable while achieving the same result.

```python
if vllm_config.kv_transfer_config.is_kv_consumer:
    max_chunked_size = scheduler_config.max_num_seqs * self.block_size
else:
    max_chunked_size = 128 * 1024
```
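The logic above can be exercised in isolation with a small helper. This is a sketch only: `choose_max_chunked_size` is a hypothetical name, and the arguments stand in for the `vllm_config` / `scheduler_config` attribute paths in the snippet.

```python
def choose_max_chunked_size(is_kv_consumer: bool,
                            max_num_seqs: int,
                            block_size: int) -> int:
    """Pick the chunked-prefill workspace cap, in tokens.

    KV-consumer (decode) nodes never prefill more than
    max_num_seqs * block_size tokens per step, so they get a
    tighter cap; other nodes keep the 128 * 1024 token cap.
    """
    if is_kv_consumer:
        return max_num_seqs * block_size
    return 128 * 1024

# With typical values the consumer cap is far below 128K tokens:
print(choose_max_chunked_size(True, max_num_seqs=256, block_size=128))   # 32768
print(choose_max_chunked_size(False, max_num_seqs=256, block_size=128))  # 131072
```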
Collaborator


Constant values should be defined at the top of the file in upper case, and please don't forget to add a comment for them as well.

```python
self.chunked_prefill_enabled = scheduler_config.chunked_prefill_enabled
if self.chunked_prefill_enabled:
    if vllm_config.kv_transfer_config.is_kv_consumer:
        max_chunked_size = scheduler_config.max_num_seqs * self.block_size
```
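One way to apply the style comment above would be to hoist the magic number to a module-level constant. This is a sketch of that suggestion; the constant name `MAX_CHUNKED_PREFILL_WORKSPACE_TOKENS` is hypothetical, not from the PR.

```python
# Default cap on the chunked-prefill workspace, in tokens (128K).
# Hoisted to module level and upper-cased per the review comment;
# the name is a hypothetical suggestion, not from the PR.
MAX_CHUNKED_PREFILL_WORKSPACE_TOKENS = 128 * 1024

# ... later, at the use site (attribute paths as in the snippet above):
# if vllm_config.kv_transfer_config.is_kv_consumer:
#     max_chunked_size = scheduler_config.max_num_seqs * self.block_size
# else:
#     max_chunked_size = MAX_CHUNKED_PREFILL_WORKSPACE_TOKENS
```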
Collaborator


Is the OOM caused by the batch size being greater than 1024?
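For intuition on this question (illustrative numbers only, not from the PR): assuming a block size of 128, the consumer-side cap `max_num_seqs * block_size` only reaches the old `128 * 1024` token cap once `max_num_seqs` hits 1024.

```python
BLOCK_SIZE = 128              # assumed block size, illustrative only
OLD_CAP_TOKENS = 128 * 1024   # previous workspace cap, in tokens

# max_num_seqs at which the consumer cap equals the old cap
crossover = OLD_CAP_TOKENS // BLOCK_SIZE
print(crossover)  # 1024
```

Below that crossover, the new consumer-side cap strictly shrinks the workspace, which is what relieves the decode-node OOM.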
