[0.9.1][PD] Added support for delay-free blocks in prefill nodes #1691

Conversation

underfituu (Contributor) commented on Jul 9, 2025

What this PR does / why we need it?

PD Logic Analysis:
In the current implementation, the prefill (P) node releases its memory blocks immediately after finishing inference. Under high concurrency, if the P-node's inference significantly outpaces the decode (D) node's block-pulling operations, blocks can be reused while a transfer is still in flight, causing memory block contention and corruption.

Current Solution:
After finishing data reception, the D-node sends an acknowledgment message to the worker connector in the P-node's driver worker. The P-node maintains a per-request counter of these acknowledgments and releases the memory blocks only after confirmations have arrived from all D-node worker connectors involved in the KV-cache transfer.
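
As a rough sketch of this counting scheme (the class and method names below, PrefillBlockReleaser, on_decode_ack, and _free_blocks, are hypothetical and not the PR's actual code):

from collections import defaultdict


class PrefillBlockReleaser:
    """Track decode-side acknowledgments and free a request's KV-cache
    blocks only after every decode worker connector has confirmed receipt."""

    def __init__(self):
        # request_id -> set of decode tp_ranks that have acknowledged
        self.done_receiving_counts = defaultdict(set)

    def on_decode_ack(self, request_id: str, tp_rank: int,
                      decode_tp_size: int) -> bool:
        """Record one acknowledgment; free the blocks once all ranks report."""
        self.done_receiving_counts[request_id].add(tp_rank)
        if len(self.done_receiving_counts[request_id]) == decode_tp_size:
            del self.done_receiving_counts[request_id]
            self._free_blocks(request_id)  # placeholder for the real release
            return True
        return False

    def _free_blocks(self, request_id: str) -> None:
        # Placeholder: the P-node's actual block-release call goes here.
        pass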

Does this PR introduce any user-facing change?

How was this patch tested?

underfituu force-pushed the add_kv_transfer_finished_status branch from 77c41bd to a8a3954 on Jul 11, 2025 01:42
underfituu force-pushed the add_kv_transfer_finished_status branch from 259ff3c to 2d2fc36 on Jul 11, 2025 03:25
underfituu changed the title from "[0.9.1][PD] Support delay free blocks" to "[0.9.1][PD] Added support for delay-free blocks in prefill nodes" on Jul 11, 2025
else:
    raise RuntimeError(
        f"LLMDataDistCMgrConnectorWorker: Receiving unexpected request event {event_msg} from remote !"
    )

def _increment_task_count(self, request_id: str, tp_rank: int,
                          decode_tp_size: int):
    if tp_rank in self.done_receiving_counts[request_id]:
Collaborator:
Better to check whether the request_id is already inside self.done_receiving_counts first, for safety.

Collaborator:
Where did you add the request_id? I can't seem to find it in this diff.

underfituu (Contributor, author):
Thanks! Added explicit request_id existence check and early initialization for safety.
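
For reference, a minimal sketch of what that check and early initialization might look like (the signature mirrors the diff above; the body is an illustration, not the merged code):

def _increment_task_count(self, request_id: str, tp_rank: int,
                          decode_tp_size: int):
    # Early initialization: track the request before indexing into the dict.
    if request_id not in self.done_receiving_counts:
        self.done_receiving_counts[request_id] = set()
    if tp_rank in self.done_receiving_counts[request_id]:
        # Duplicate acknowledgment from the same decode rank; ignore it.
        return
    self.done_receiving_counts[request_id].add(tp_rank)
    if len(self.done_receiving_counts[request_id]) == decode_tp_size:
        # All decode worker connectors have acknowledged; blocks can be freed.
        self.done_receiving_counts.pop(request_id)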

ganyi1996ppo merged commit 6b188ed into vllm-project:v0.9.1-dev on Jul 14, 2025
16 checks passed
Yikun added the no-test label on Jul 16, 2025