migrate P2pNcclConnector to P2pHcclConnector #2686
Conversation
Signed-off-by: luolun <luolun1995@cmbchina.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request migrates the P2pNcclConnector to a P2pHcclConnector for Ascend NPUs, enabling disaggregated prefill. The changes are extensive, introducing a new KVConnector, its engine, and HCCL wrappers. While the overall structure is sound, I've identified several critical issues, many of which appear to be copy-paste errors from the original NCCL implementation. These include incorrect API usage for HCCL streams, wrong exception types for error handling, and logging errors that could lead to runtime crashes. Addressing these issues is crucial for the stability and correctness of the feature on Ascend hardware.
[Outdated review comments on vllm_ascend/distributed/kv_transfer/kv_connector/v1/p2p/p2p_hccl_engine.py and vllm_ascend/distributed/kv_transfer/kv_connector/v1/p2p/p2p_hccl_connector.py have been resolved.]
```python
def get_finished(
    self, finished_req_ids: set[str], no_compile_layers
) -> tuple[Optional[set[str]], Optional[set[str]]]:
    """
    Notifies worker-side connector ids of requests that have
    finished generating tokens.
    Returns:
        ids of requests that have finished asynchronous transfer,
        tuple of (sending/saving ids, recving/loading ids).
        The finished saves/sends req ids must belong to a set provided in a
        call to this method (this call or a prior one).
    """
    # Clear the buffer upon request completion.
    for request_id in finished_req_ids:
        for layer_name in no_compile_layers:
            tensor_id = request_id + "#" + layer_name
            if tensor_id in self.recv_store:
                with self.recv_store_cv:
                    tensor = self.recv_store.pop(tensor_id, None)
                    self.send_request_id_to_tensor_ids.pop(
                        request_id, None)
                    self.recv_request_id_to_tensor_ids.pop(
                        request_id, None)
                if isinstance(tensor, tuple):
                    addr, _, _ = tensor
                    self.pool.free(addr)

    # TODO: Retrieve requests that have already sent the KV cache.
    finished_sending: set[str] = set()

    # TODO: Retrieve requests that have already received the KV cache.
    finished_recving: set[str] = set()

    return finished_sending or None, finished_recving or None
```
The `get_finished` method has an incomplete implementation, with `TODO` comments indicating that the logic for retrieving finished sending and receiving requests is missing. It currently returns `(None, None)`, which might prevent the scheduler from correctly freeing resources for completed requests. This could lead to resource leaks or deadlocks. Please complete the implementation to correctly track and return the sets of finished request IDs.
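For illustration, one way the two TODO blocks at the end of the method could be filled in, as a minimal sketch. It assumes the engine's send/recv worker threads record completed request ids in `done_sending_ids` / `done_receiving_ids` sets and that a `send_store_cv` condition variable guards the send side; these names are hypothetical, not from this PR.

```python
# Hypothetical sketch replacing the TODO section of get_finished. Assumes the
# send/recv worker threads add a request id to self.done_sending_ids /
# self.done_receiving_ids once the last tensor for that request is transferred.
finished_sending: set[str] = set()
finished_recving: set[str] = set()

with self.send_store_cv:              # assumed lock guarding the send-side set
    finished_sending |= self.done_sending_ids
    self.done_sending_ids.clear()

with self.recv_store_cv:              # condition variable used in the code above
    finished_recving |= self.done_receiving_ids
    self.done_receiving_ids.clear()

return finished_sending or None, finished_recving or None
```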
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Force-pushed from 7e0fadd to e1fc641.
Signed-off-by: luolun <luolun1995@cmbchina.com>
Force-pushed from 2f0c150 to 0d5daa3.
Good job!
vllm_ascend/__init__.py
```python
register_model()


def register_kvconnector():
```
No need to add a new plugin; I think just registering the connector in https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/distributed/__init__.py is enough.
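For reference, a minimal sketch of what that direct registration could look like, assuming vLLM's `KVConnectorFactory` API; the module path below simply mirrors this PR's layout and is an assumption.

```python
# vllm_ascend/distributed/__init__.py (sketch)
from vllm.distributed.kv_transfer.kv_connector.factory import KVConnectorFactory

# Lazy registration by module path, so the connector can later be selected with
# --kv-transfer-config '{"kv_connector": "P2pHcclConnector", ...}'.
KVConnectorFactory.register_connector(
    "P2pHcclConnector",
    "vllm_ascend.distributed.kv_connector.p2p_hccl_connector",  # assumed path
    "P2pHcclConnector")
```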
Agreed, fixed
I think the module path is too deeply nested; `vllm_ascend/distributed/kv_connector` is OK IMO.
Fixed
```python
                                  AttentionLayer, AttentionType)
from vllm.attention.backends.utils import CommonAttentionState
from vllm.config import VllmConfig
from vllm.distributed.kv_transfer import (get_kv_transfer_group,
```
The main branch now supports this change; please rebase to main.
```python
            scheduler_output (SchedulerOutput): the scheduler output object.
        """

        meta = P2pNcclConnectorMetadata()
```
So here we reuse the NCCL metadata entirely?
Yes, it can be reused at the moment.
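Illustratively, the reuse at this stage could be as thin as an empty subclass, assuming the upstream class stays importable from vLLM (the import path reflects upstream vLLM, and the subclass name mirrors the `P2pHcclConnectorMetadata` added later in this PR):

```python
# Sketch: reuse the upstream NCCL metadata until HCCL-specific fields are needed.
from vllm.distributed.kv_transfer.kv_connector.v1.p2p.p2p_nccl_connector import (
    P2pNcclConnectorMetadata)


class P2pHcclConnectorMetadata(P2pNcclConnectorMetadata):
    """Identical to the NCCL metadata for now; exists only to give the
    Ascend connector its own public name."""
```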
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Directly register connector instead of using plugin. Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Force-pushed from b913b5d to 82a3644.
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Directly register connector instead of using plugin. Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
2. add P2pHcclConnectorMetadata 3. fix connector register Signed-off-by: luolun <luolun1995@cmbchina.com>
```python
                ReqMeta.make_meta(request_id, token_ids, block_ids, block_size))


class P2pHcclConnector(KVConnectorBase_V1):
```
Can we inherit from the upstream class?
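As a rough sketch of that suggestion (heavily assumption-laden: upstream `P2pNcclConnector` does not obviously expose an engine-construction hook today, so `_create_engine` below is hypothetical, as is the vllm-ascend import path):

```python
# Hypothetical sketch of inheriting from the upstream connector and swapping in
# the HCCL engine; _create_engine is an assumed refactoring hook, not an
# existing upstream API.
from vllm.distributed.kv_transfer.kv_connector.v1.p2p.p2p_nccl_connector import (
    P2pNcclConnector)

from vllm_ascend.distributed.kv_connector.p2p_hccl_engine import P2pHcclEngine  # assumed path


class P2pHcclConnector(P2pNcclConnector):

    def _create_engine(self, *args, **kwargs):
        # Return the Ascend HCCL engine wherever the base class would have
        # built its NCCL engine; everything else is inherited unchanged.
        return P2pHcclEngine(*args, **kwargs)
```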
What this PR does / why we need it?
Related issue: #2594
This PR migrates the P2pNcclConnector to the vllm-ascend project. It supports running disaggregated prefill in a similar way to P2pNcclConnector in vLLM.
The only dependencies for P2pHcclConnector are the HCCL, context, and stream APIs in CANN, which is likely already installed when using Ascend NPUs. Therefore, the connector can be used with fewer prerequisites. This is helpful when:
Does this PR introduce any user-facing change?
Yes, this PR introduces a new type of KVConnector (P2pHcclConnector) that can be enabled via configuration.
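For example, a hypothetical way to enable it from Python, assuming the connector accepts the same configuration surface as the upstream P2pNcclConnector (the model name, port numbers, and extra-config keys below are illustrative only, not values from this PR):

```python
# Illustrative only: enable the new connector on a prefill (producer) instance.
from vllm import LLM
from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="P2pHcclConnector",
    kv_role="kv_producer",          # use "kv_consumer" on the decode instance
    kv_buffer_size=1e9,             # assumed transfer-buffer size in bytes
    kv_port=21001,                  # assumed base port for the P2P channel
    kv_connector_extra_config={     # keys assumed to match P2pNcclConnector
        "proxy_ip": "127.0.0.1",
        "proxy_port": 30001,
        "http_port": 20001,
    },
)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", kv_transfer_config=ktc)
```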
How was this patch tested?
CANN version: 8.2.RC1
Test case: