Conversation
Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
This was there originally but inadvertently dropped Signed-off-by: Nick Hill <nhill@redhat.com>
…#79) * Benchmark one concurrent req Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Updates Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * restore Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Improve random requests, switch up initial test Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> --------- Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com>
Requires version of LMCache with the corresponding changes Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Signed-off-by: Nick Hill <nhill@redhat.com>
[BugFix] Fix ordering of KVConnector finished send/rcv sets
The MultiKVConnector impl keeps track of cases where multiple connectors are async saving the same request, but this state needs to be shared from the scheduler side to the worker side. Signed-off-by: Nick Hill <nhill@redhat.com>
* [BugFix] Fix handling of num_computed_tokens with connector vllm-project#18001 changed the behaviour subtly and broke some multi-connector cases. This change ensures we don't call the connector get_num_new_matched_tokens method a second time for a given request after an async load has completed. Signed-off-by: Nick Hill <nhill@redhat.com> * fix linting Signed-off-by: Nick Hill <nhill@redhat.com> * handle full cache hit on P/D decode worker case Signed-off-by: Nick Hill <nhill@redhat.com> * fix comment wording Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> --------- Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
* updated Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> * cleanup issue Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> --------- Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you! |
|
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you! |
SUMMARY: