-
Notifications
You must be signed in to change notification settings - Fork 459
[P/D][BugFix]Mooncake timeout release bug fix #2899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a race condition in the Mooncake connector's timeout release mechanism during PD separation, which could lead to a request ID being released twice. The fix introduces an intermediate state, record_finished_requests
, to correctly synchronize the two paths that can trigger a release. The logic appears to correctly solve the described race condition. My review includes one comment highlighting a potential performance issue in the implementation that should be addressed.
with self.done_task_lock: | ||
self.finished_requests.add(request_id) | ||
self._remove_delayed_requests(request_id) | ||
if any(item[0] == request_id for item in self.delayed_free_requests): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check any(item[0] == request_id for item in self.delayed_free_requests)
performs a linear scan over the delayed_free_requests
deque. This is an O(N) operation, where N is the number of delayed requests. Since this code is executed under a lock (self.done_task_lock
) and can be on a hot path, this linear scan could become a performance bottleneck if the number of delayed requests grows large.
To improve performance, I recommend maintaining a companion set
of request IDs that are present in self.delayed_free_requests
. This would change the check to an O(1) operation.
For example, you could introduce self.delayed_free_request_ids: set[str] = set()
in KVCacheTaskTracker
and update it whenever self.delayed_free_requests
is modified. Then, this line could be changed to if request_id in self.delayed_free_request_ids:
. This would require changes in other methods that modify delayed_free_requests
, such as add_delayed_request
, _retrieve_expired_requests
, and _remove_delayed_requests
.
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
4fd45f9
to
842f231
Compare
What this PR does / why we need it?
In the P node timeout release mechanism during PD separation, the req_id that requires timeout release is transmitted from the scheduler to the worker. If the KV cache between PDs is transferred too quickly, the P node's req_id may be released twice. The first release is when the D node notifies the P node that the KV cache has been pulled, and the second release is when the scheduler transmits the timeout release to the worker.
To address this bug, an intermediate component is introduced to manage the release of req_ids.
Pull kv and forward2 may occur one after the other in timing. The previous timeout defaulted to forward2 being before pull_kv.

Does this PR introduce any user-facing change?
How was this patch tested?