
When starting LMCache without the use_layerwise parameter, inference on 310P results in an error. #10

@heijian123

Description


Problem

When LMCache is started without the use_layerwise parameter, inference on Ascend 310P fails with the following error:

(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]              ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]                                                                     ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     self.maybe_wait_for_kv_save()
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     get_kv_transfer_group().wait_for_save()
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py", line 88, in wait_for_save
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     self._lmcache_engine.wait_for_save()
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/codes/LMCache-ascend/lmcache/integration/vllm/vllm_v1_adapter.py", line 768, in wait_for_save
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     self.lmcache_engine.store(
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/codes/LMCache-ascend/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/codes/LMCache-ascend/lmcache/v1/gpu_connector.py", line 298, in batched_from_gpu
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]   File "/codes/LMCache-ascend/lmcache/v1/gpu_connector.py", line 257, in from_gpu
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522]     lmc_ops.multi_layer_kv_transfer(
(VllmWorker rank=0 pid=19500) ERROR 09-05 06:31:34 [multiproc_executor.py:522] RuntimeError: Unable to retrieve device ptr, is this a host registered pointer ?
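The failure comes from the non-layerwise store path (lmc_ops.multi_layer_kv_transfer), which apparently cannot resolve a device pointer on 310P. A possible workaround, until that path is fixed, is to enable LMCache's layerwise KV transfer before launching vLLM. The sketch below is only an assumption-heavy example: the LMCACHE_USE_LAYERWISE environment variable name, the KVTransferConfig arguments, and the model name are illustrative and should be checked against your LMCache-ascend / vllm-ascend versions.

# Hedged workaround sketch: turn on LMCache's layerwise KV transfer so the
# failing lmc_ops.multi_layer_kv_transfer path is not used.
# Assumptions (verify against your installed versions):
#   - LMCache reads the use_layerwise config field from LMCACHE_USE_LAYERWISE
#   - vLLM accepts a KVTransferConfig with kv_connector="LMCacheConnectorV1"
#   - the model name is only a placeholder
import os

os.environ["LMCACHE_USE_LAYERWISE"] = "True"  # enable layerwise path (assumed env var)

from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_both",
    ),
)

# Quick smoke test: a single generation exercises the KV save path.
print(llm.generate("Hello, world")[0].outputs[0].text)

If this works on your setup, it would confirm that only the multi-layer (non-layerwise) transfer kernel is broken on 310P.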
