Error when the vLLM service receives a request. Model: official vllm-ascend deepseek-v3-W8A8. #12

@StevenBrown008

Description

Hardware: 2 nodes, 8 × 910B per node, 64 GB memory per card.
CANN Toolkit: 8.2rc1
Ascend Driver: 25.2.0
PyTorch: 2.5.1, Torch-npu: 2.5.1.post1.dev20250619
vLLM: v0.9.2 & vLLM-Ascend: v0.9.2rc1

The image was built with this repo's Dockerfile.

Detailed error:

ntegration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=1034) [2025-09-08 21:48:38,604] LMCache INFO: Storing KV cache for 6 out of 6 tokens (skip_leading_tokens=0) for request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 (vllm_v1_adapter.py:709:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=0 pid=895) [2025-09-08 21:48:38,604] LMCache INFO: Storing KV cache for 6 out of 6 tokens (skip_leading_tokens=0) for request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 (vllm_v1_adapter.py:709:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=3 pid=1807) [2025-09-08 21:48:38,604] LMCache INFO: Storing KV cache for 6 out of 6 tokens (skip_leading_tokens=0) for request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 (vllm_v1_adapter.py:709:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.9.2) with config: model='/data/models/DeepSeek-V3-W8A8', speculative_config=None, tokenizer='/data/models/DeepSeek-V3-W8A8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1024, served_model_name=deepseek_v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null},
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=cmpl-f870c93bda2647ab91bf1cbed5859bab-0,prompt_token_ids_len=6,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=50, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1],),num_computed_tokens=0,lora_request=None)], scheduled_cached_reqs=CachedRequestData(req_ids=[], resumed_from_preemption=[], new_token_ids=[], new_block_ids=[], num_computed_tokens=[]), num_scheduled_tokens={cmpl-f870c93bda2647ab91bf1cbed5859bab-0: 6}, total_num_scheduled_tokens=6, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[1], finished_req_ids=[], free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=LMCacheConnectorMetadata(requests=[ReqMeta(req_id='cmpl-f870c93bda2647ab91bf1cbed5859bab-0', token_ids=Tensor(shape=torch.Size([6]), device=cpu,dtype=torch.int64), slot_mapping=Tensor(shape=torch.Size([6]), device=cpu,dtype=torch.int64), is_last_prefill=true, save_spec=SaveSpec(skip_leading_tokens=0, can_save=true), load_spec=null, disagg_spec=null)], lookup_requests_in_step=['cmpl-f870c93bda2647ab91bf1cbed5859bab-0']))
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, kv_cache_usage=0.006134969325153339, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, num_corrupted_reqs=0)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] EngineCore encountered a fatal error.
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] Traceback (most recent call last):
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 579, in run_engine_core
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] engine_core.run_busy_loop()
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 938, in run_busy_loop
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] executed = self._process_engine_step()
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 631, in _process_engine_step
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] outputs, model_executed = self.step_fn()
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 235, in step
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] model_output = self.execute_model(scheduler_output)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 221, in execute_model
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] raise err
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 212, in execute_model
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] return self.model_executor.execute_model(scheduler_output)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] (output, ) = self.collective_rpc(
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] result = get_response(w, dequeue_timeout)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] raise RuntimeError(
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] RuntimeError: Worker failed with error ''tuple' object has no attribute 'data_ptr'', please check the stack trace above for the root cause
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/engine/core.py:766: ResourceWarning: Destroying context with unclosed socket <zmq.Socket(zmq.PUSH) at 0xfffc9f9be200>
(EngineCore_0 pid=666) with ExitStack() as stack, zmq.Context() as ctx:
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/engine/core.py:766: ResourceWarning: Destroying context with unclosed socket <zmq.Socket(zmq.PUSH) at 0xfffc9f9be120>
(EngineCore_0 pid=666) with ExitStack() as stack, zmq.Context() as ctx:
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed socket <zmq.Socket(zmq.SUB) at 0xfffdead13cb0>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed context <zmq.Context() at 0xfffdead27ad0>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed socket <zmq.Socket(zmq.SUB) at 0xfffde92a6c10>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed context <zmq.Context() at 0xfffdead279b0>
(EngineCore_0 pid=666) w.worker_response_mq = None
ERROR 09-08 21:48:38 [async_llm.py:419] AsyncLLM output_handler failed.
ERROR 09-08 21:48:38 [async_llm.py:419] Traceback (most recent call last):
ERROR 09-08 21:48:38 [async_llm.py:419] File "/workspace/vllm/vllm/v1/engine/async_llm.py", line 378, in output_handler
ERROR 09-08 21:48:38 [async_llm.py:419] outputs = await engine_core.get_output_async()
ERROR 09-08 21:48:38 [async_llm.py:419] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-08 21:48:38 [async_llm.py:419] File "/workspace/vllm/vllm/v1/engine/core_client.py", line 740, in get_output_async
ERROR 09-08 21:48:38 [async_llm.py:419] raise self._format_exception(outputs) from None
ERROR 09-08 21:48:38 [async_llm.py:419] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 09-08 21:48:38 [async_llm.py:345] Request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 failed (engine dead).
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed socket <zmq.Socket(zmq.SUB) at 0xfffde91ff070>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed context <zmq.Context() at 0xfffdead27830>
(EngineCore_0 pid=666) w.worker_response_mq = None
INFO: 127.0.0.1:22498 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
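
Reading the traceback above: every worker fails at the same statement in `npu_connector.py`, `[t.data_ptr() for t in kv_caches]`, with `AttributeError: 'tuple' object has no attribute 'data_ptr'`. That suggests at least one element of `kv_caches` is a tuple rather than a single tensor. Below is a minimal sketch that reproduces the same class of error without any NPU or torch dependency. `FakeTensor`, `collect_pointers_naive`, and `collect_pointers_flattened` are hypothetical names used only for illustration; they are not LMCache or vLLM-Ascend APIs, and the assumption that each layer's KV cache is a tuple of tensors is inferred from the error message, not confirmed.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only models data_ptr()."""

    def __init__(self, address: int) -> None:
        self._address = address

    def data_ptr(self) -> int:
        return self._address


def collect_pointers_naive(kv_caches):
    # Same shape as the failing statement in the traceback:
    # assumes every element of kv_caches is a tensor.
    return [t.data_ptr() for t in kv_caches]


def collect_pointers_flattened(kv_caches):
    # Illustrative variant: accept either a tensor or a tuple/list of tensors
    # per layer and collect one pointer per underlying tensor.
    pointers = []
    for entry in kv_caches:
        if isinstance(entry, (tuple, list)):
            pointers.extend(t.data_ptr() for t in entry)
        else:
            pointers.append(entry.data_ptr())
    return pointers


# Assumption: each layer holds a tuple of two tensors instead of one tensor.
per_layer_tuples = [
    (FakeTensor(0x1000), FakeTensor(0x2000)),
    (FakeTensor(0x3000), FakeTensor(0x4000)),
]

try:
    collect_pointers_naive(per_layer_tuples)
    naive_error = None
except AttributeError as exc:
    naive_error = str(exc)

flattened = collect_pointers_flattened(per_layer_tuples)
```

This only illustrates why the comprehension raises. Whether flattening is an appropriate fix depends on the layout the downstream transfer kernel expects, which the log does not show.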
