
Conversation

LCAIZJ
Contributor

@LCAIZJ commented Sep 13, 2025

What this PR does / why we need it?

Adds a new connector for Mooncake store integration to enable KV cache reuse in scenarios with system prompts or multi-turn dialogues.

Does this PR introduce any user-facing change?

How was this patch tested?

Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Co-authored-by: lizy124 <1950471827@qq.com>
Co-authored-by: zouyida2052 <zouyida2002@gmail.com>
Signed-off-by: LCAIZJ <leichao139636@163.com>

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new KV cache connector, MooncakeConnectorV1, which seems to integrate with a distributed KV store called "kvpool" or "mooncake". The changes are extensive, adding several new modules for configuration, data transfer, and the core engine logic for this connector. The review identifies several critical runtime errors (NameError, AttributeError) due to undefined variables or incorrect assumptions about data structures, which would prevent the feature from working correctly. There are also issues with incorrect logic in key parsing and a hardcoded timeout that could impact production stability. These critical issues need to be addressed to ensure the functionality and robustness of the new connector.

assert attn_metadata.num_decodes is not None and \
    attn_metadata.num_prefills is not None and \
    attn_metadata.num_decode_tokens is not None
self.wait_for_kv_layer_from_connector(layer.layer_name)
Contributor

critical

The variable layer is not defined in this scope, which will cause a NameError at runtime. The layer name is required here. A possible solution is to store kv_sharing_target_layer_name (which is passed to __init__) as an instance attribute, e.g., self.layer_name, and use it here. This same issue exists on line 1054.
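A minimal sketch of that fix, assuming the connector hooks sit on a per-layer wrapper that receives kv_sharing_target_layer_name in __init__; the class and method names below are illustrative, not the exact API in this PR:

```python
class LayerConnectorHooksSketch:
    """Illustrative only: keep the layer name on the instance instead of
    relying on an undefined local variable `layer`."""

    def __init__(self, kv_sharing_target_layer_name: str) -> None:
        # Store the target layer name for later use in the forward hooks.
        self.layer_name = kv_sharing_target_layer_name

    def wait_for_kv_layer_from_connector(self, layer_name: str) -> None:
        # Stand-in for the real blocking wait on the KV connector.
        print(f"waiting for KV cache of layer {layer_name}")

    def maybe_save_kv_layer_to_connector(self, layer_name: str, kv_cache_layer) -> None:
        # Stand-in for the real save path.
        print(f"saving KV cache of layer {layer_name}")

    def before_attention(self) -> None:
        # Fix for the NameError: use the stored attribute, not `layer.layer_name`.
        self.wait_for_kv_layer_from_connector(self.layer_name)

    def after_attention(self, kv_cache) -> None:
        # The same fix applies to the save path flagged below.
        self.maybe_save_kv_layer_to_connector(layer_name=self.layer_name,
                                              kv_cache_layer=kv_cache)
```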

Contributor

code removed

    is_force_scatter=self.enable_shared_expert_dp)[0]
current_ms_metadata.after_comm_event.record()
del o_proj_input
self.maybe_save_kv_layer_to_connector(layer_name=layer.layer_name, kv_cache_layer=kv_cache)
Contributor

critical

The variable layer is not defined in this scope, which will cause a NameError at runtime. As mentioned for line 983, the layer name should be retrieved from an instance attribute.

Contributor

code removed

Comment on lines 75 to 76
if len(parts) != 5:
    raise ValueError(f"Invalid key string: {s}")
Contributor

critical

The length check for parts is incorrect. The to_string method for MooncakeEngineKey produces a string with 4 parts separated by '@'. Therefore, the check should be len(parts) != 4. With the current check, this function will always raise a ValueError for valid keys.

Suggested change
-if len(parts) != 5:
-    raise ValueError(f"Invalid key string: {s}")
+if len(parts) != 4:
+    raise ValueError(f"Invalid key string: {s}")
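For context, a self-contained sketch of the expected round trip, assuming a key made of four '@'-joined fields; the field names below are placeholders, not the actual MooncakeEngineKey layout:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineKeySketch:
    # Hypothetical fields chosen only to show the 4-part '@' format.
    model_name: str
    worker_id: int
    chunk_hash: str
    layer_id: int

    def to_string(self) -> str:
        return f"{self.model_name}@{self.worker_id}@{self.chunk_hash}@{self.layer_id}"

    @classmethod
    def from_string(cls, s: str) -> "EngineKeySketch":
        parts = s.split("@")
        if len(parts) != 4:  # must match the number of fields joined above
            raise ValueError(f"Invalid key string: {s}")
        return cls(parts[0], int(parts[1]), parts[2], int(parts[3]))


key = EngineKeySketch("qwen2.5-7b", 0, "ab12cd", 3)
assert EngineKeySketch.from_string(key.to_string()) == key
```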

Contributor

code removed

for start, end, key in self.token_database.process_tokens(tokens, mask):
    addr, size, _ = self.prepare_value(start, end, block_ids)
    self.m_store.put(key, addr, size)
if is_last_chunk:
Contributor

critical

The variable is_last_chunk is not defined in this method's scope, which will cause a NameError. It should be accessed from the req_meta dictionary, like req_meta["is_last_chunk"].

Suggested change
-if is_last_chunk:
+if req_meta["is_last_chunk"]:

Contributor

modified

key_str = key.to_string()
try:
    res = self.store.batch_get_into_ascend(key_str, addr, size)
    if res[0] != expect_res:
Contributor

critical

The variable expect_res is not defined, which will cause a NameError. Based on the put method, it seems the expected result for a successful operation is 0. Please define expect_res or replace it with the correct value.

Suggested change
-if res[0] != expect_res:
+if res[0] != 0:

Contributor

modified


request_ids = [req.req_id for req in scheduler_output.scheduled_new_reqs]
for request_id, (request, block_ids) in self._unfinished_requests.items():
    if not request_id in request_ids and not request_id in cached_reqs.req_ids:
Contributor

critical

Accessing cached_reqs.req_ids will raise an AttributeError when cached_reqs is a list, which happens when use_v1_spec=False in vLLM's configuration. Handle both cases: cached_reqs as a list of request objects, and cached_reqs as a dataclass with a req_ids attribute. A robust approach is to build a set of request IDs from cached_reqs before this loop.

For example:

cached_req_ids_set = set()
if isinstance(cached_reqs, list):
    cached_req_ids_set = {req.req_id for req in cached_reqs}
else:
    cached_req_ids_set = set(cached_reqs.req_ids)

# ... later in the loop ...
if request_id not in request_ids and request_id not in cached_req_ids_set:
    # ...

Contributor

ignored

keys = [list(row) for row in zip(*keys, strict=False)]  # [num_layer, block_num]
for layer_id, keys_multi_chunk in enumerate(keys):
    if not first_flag:
        is_finish = self.get_event.wait(timeout=3)  # try---cache
Contributor

high

Using a hardcoded timeout of 3 seconds might not be robust for production environments, where operations could take longer under high load. This could lead to premature SystemError exceptions. It's recommended to make this timeout configurable.
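A minimal sketch of one way to do that, assuming the connector can see vLLM's kv_connector_extra_config dict; the get_event_timeout key is hypothetical, not an existing option:

```python
DEFAULT_GET_EVENT_TIMEOUT_S = 3.0


def resolve_get_event_timeout(extra_config: dict) -> float:
    # Fall back to today's hardcoded 3 seconds when nothing is configured.
    return float(extra_config.get("get_event_timeout", DEFAULT_GET_EVENT_TIMEOUT_S))


# Sketch of usage inside the transfer loop:
#   timeout = resolve_get_event_timeout(kv_connector_extra_config)
#   is_finish = self.get_event.wait(timeout=timeout)
```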

Contributor

modified
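As an aside on the zip(*keys, ...) transpose quoted above, a small self-contained sketch of the reshaping it performs (the PR additionally passes strict=False, which requires Python 3.10+):

```python
# keys arrives as [block_num][num_layer]: one key per (block, layer) pair.
keys = [
    ["b0_l0", "b0_l1", "b0_l2"],
    ["b1_l0", "b1_l1", "b1_l2"],
]

# Transpose to [num_layer][block_num] so each row holds one layer's keys.
keys_by_layer = [list(row) for row in zip(*keys)]
assert keys_by_layer == [["b0_l0", "b1_l0"], ["b0_l1", "b1_l1"], ["b0_l2", "b1_l2"]]
```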

@LCAIZJ changed the title from "[Feat] A Connector that supports kvpool" to "[Feat] A Connector that supports Mooncake store" Sep 13, 2025
@LCAIZJ changed the title from "[Feat] A Connector that supports Mooncake store" to "[Feat] A connector that supports Mooncake store" Sep 13, 2025
@LCAIZJ changed the title from "[Feat] A connector that supports Mooncake store" to "[Feat] A Connector that supports Mooncake store" Sep 13, 2025
Signed-off-by: fems14 <1804143737@qq.com>
@github-actions bot added the documentation label Sep 17, 2025
Signed-off-by: fems14 <1804143737@qq.com>
@@ -0,0 +1,264 @@
# MultiConnector + Mooncake Basic Scenario Verification & Pooling and Mixed Deployment Scenario Verification
Collaborator

Mooncake store deployment guide

Contributor

modified

* Python >= 3.9, < 3.12
* CANN >= 8.2.rc1
* PyTorch >= 2.7.1, torch-npu >= 2.7.1.dev20250724
* vLLM:Mainline branch
Collaborator

main branch

Contributor

modified

* PyTorch >= 2.7.1, torch-npu >= 2.7.1.dev20250724
* vLLM:Mainline branch
* vLLM-Ascend:Mainline branch
* Mooncake:[AscendTransport/Mooncake at pooling-async-memcpy](https://github.com/AscendTransport/Mooncake/tree/pooling-async-memcpy)
Collaborator

double check the origin link is correct.


Contributor

modified

* vLLM:Mainline branch
* vLLM-Ascend:Mainline branch
* Mooncake:[AscendTransport/Mooncake at pooling-async-memcpy](https://github.com/AscendTransport/Mooncake/tree/pooling-async-memcpy)
* mooncake-transfer-engine reference documentation: https://github.com/kvcache-ai/Mooncake/blob/main/doc/zh/ascend_transport.md
Collaborator

remove this one

Contributor

modified

mooncake_master --port 50088
```

## multiConnector + mooncake basic scenario
Collaborator

Prefill decode disaggregate scenario

Contributor

modified
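For readers following along, a MultiConnector setup like the one this guide describes is usually wired up through vLLM's kv_transfer_config; the sketch below is an assumption based on vLLM's MultiConnector interface, and the Mooncake store connector name is a placeholder rather than the name this PR registers:

```python
# Hypothetical payload for vLLM's --kv-transfer-config; connector names and
# fields are placeholders and may not match this PR's actual registration.
kv_transfer_config = {
    "kv_connector": "MultiConnector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
        "connectors": [
            {"kv_connector": "MooncakeConnectorStoreV1", "kv_role": "kv_both"},
        ],
    },
}
```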

* CANN >= 8.2.rc1
* PyTorch >= 2.7.1, torch-npu >= 2.7.1.dev20250724
* vLLM:Mainline branch
* vLLM-Ascend:Mainline branch
Collaborator

main branch

Contributor

modified

Signed-off-by: fems14 <1804143737@qq.com>
@wangxiyuan merged commit cef43b5 into vllm-project:main Sep 18, 2025
47 checks passed
Labels
documentation: Improvements or additions to documentation
ready: read for review
ready-for-test: start test by label for PR