[0.9.1][PD] Added support for delay-free blocks in prefill nodes #1691
Conversation
Signed-off-by: underfituu <hzhucong@163.com>
Force-pushed from 77c41bd to a8a3954
Force-pushed from 259ff3c to 2d2fc36
else:
    raise RuntimeError(
        f"LLMDataDistCMgrConnectorWorker: Receiving unexpected request event {event_msg} from remote!"
    )

def _increment_task_count(self, request_id: str, tp_rank: int,
                          decode_tp_size: int):
    if tp_rank in self.done_receiving_counts[request_id]:
Better check if the request_id is already inside self.done_receiving_counts, for safety.
Where did you add your request_id? Seems I can't find it in this diff.
Thanks! Added explicit request_id existence check and early initialization for safety.
What this PR does / why we need it?
PD Logic Analysis:
In the current implementation, the P-node releases memory blocks immediately after completing inference. Under high-concurrency scenarios, if the P-node's inference speed significantly outpaces the D-node's block-pulling operations, blocks can be reclaimed before the D-node has finished pulling their KV cache, leading to memory block contention/corruption.
Current Solution:
The D-node sends acknowledgment messages to the worker connector in the P-node's driver worker after completing data reception. The P-node maintains a counter to track these acknowledgments: memory blocks are released only after confirmations have arrived from all D-node worker connectors involved in the KV cache transfer.
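The acknowledgment-counting scheme above can be sketched as follows. Names like `PrefillAckTracker` and `on_ack` are hypothetical; only the idea of releasing blocks after all decode workers have acknowledged comes from the PR description.

```python
from collections import defaultdict

class PrefillAckTracker:
    """Sketch of the P-node side: KV blocks for a request are freed only
    after every decode worker involved in the transfer has acknowledged."""

    def __init__(self):
        # request_id -> set of decode tp_ranks that confirmed reception
        self._acks = defaultdict(set)

    def on_ack(self, request_id: str, tp_rank: int,
               decode_tp_size: int) -> bool:
        """Record one D-node ack; return True when the request's memory
        blocks are safe to release."""
        self._acks[request_id].add(tp_rank)
        if len(self._acks[request_id]) == decode_tp_size:
            del self._acks[request_id]   # all workers have pulled their KV
            return True                  # caller may now free the blocks
        return False
```

The caller would invoke block release only when `on_ack` returns True, which closes the race between prefill completing and decode pulling the cache.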
Does this PR introduce any user-facing change?
How was this patch tested?