Description
Hardware: 2 nodes, 8 x 910B per node, 64 GB memory per card.
CANN Toolkit: 8.2rc1
Ascend Driver: 25.2.0
PyTorch: 2.5.1, Torch-npu: 2.5.1.post1.dev20250619
vLLM: v0.9.2 & vLLM-Ascend: v0.9.2rc1
Image: built with this repo's Dockerfile.
Detailed error:
ntegration.vllm.vllm_v1_adapter)
(VllmWorker rank=1 pid=1034) [2025-09-08 21:48:38,604] LMCache INFO: Storing KV cache for 6 out of 6 tokens (skip_leading_tokens=0) for request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 (vllm_v1_adapter.py:709:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=0 pid=895) [2025-09-08 21:48:38,604] LMCache INFO: Storing KV cache for 6 out of 6 tokens (skip_leading_tokens=0) for request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 (vllm_v1_adapter.py:709:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=3 pid=1807) [2025-09-08 21:48:38,604] LMCache INFO: Storing KV cache for 6 out of 6 tokens (skip_leading_tokens=0) for request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 (vllm_v1_adapter.py:709:lmcache.integration.vllm.vllm_v1_adapter)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1421, in execute_model
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] num_scheduled_tokens_np, finished_sending, finished_recving) = (self._process_reqs(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1147, in _process_reqs
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.maybe_wait_for_kv_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1633, in maybe_wait_for_kv_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] get_kv_transfer_group().wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_connector_v1.py", line 99, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self._lmcache_engine.wait_for_save()
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/integration/vllm/vllm_v1_adapter.py", line 731, in wait_for_save
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.lmcache_engine.store(
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/cache_engine.py", line 237, in store
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=2 pid=1205) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(VllmWorker rank=3 pid=1807) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.gpu_connector.batched_from_gpu(memory_objs, starts, ends, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 293, in batched_from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.from_gpu(memory_obj, start, end, **kwargs)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache/lmcache/v1/gpu_connector.py", line 248, in from_gpu
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] kv_cache_pointers = self._initialize_pointers(self.kvcaches)
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in _initialize_pointers
(VllmWorker rank=1 pid=1034) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] File "/workspace/LMCache-Ascend/lmcache_ascend/v1/npu_connector.py", line 22, in
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] self.kv_cache_pointers.numpy()[:] = [t.data_ptr() for t in kv_caches]
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] ^^^^^^^^^^
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522] AttributeError: 'tuple' object has no attribute 'data_ptr'
(VllmWorker rank=0 pid=895) ERROR 09-08 21:48:38 [multiproc_executor.py:522]
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.9.2) with config: model='/data/models/DeepSeek-V3-W8A8', speculative_config=None, tokenizer='/data/models/DeepSeek-V3-W8A8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=1024, served_model_name=deepseek_v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null},
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=cmpl-f870c93bda2647ab91bf1cbed5859bab-0,prompt_token_ids_len=6,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=50, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1],),num_computed_tokens=0,lora_request=None)], scheduled_cached_reqs=CachedRequestData(req_ids=[], resumed_from_preemption=[], new_token_ids=[], new_block_ids=[], num_computed_tokens=[]), num_scheduled_tokens={cmpl-f870c93bda2647ab91bf1cbed5859bab-0: 6}, total_num_scheduled_tokens=6, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[1], finished_req_ids=[], free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=LMCacheConnectorMetadata(requests=[ReqMeta(req_id='cmpl-f870c93bda2647ab91bf1cbed5859bab-0', token_ids=Tensor(shape=torch.Size([6]), device=cpu,dtype=torch.int64), slot_mapping=Tensor(shape=torch.Size([6]), device=cpu,dtype=torch.int64), is_last_prefill=true, save_spec=SaveSpec(skip_leading_tokens=0, can_save=true), load_spec=null, disagg_spec=null)], lookup_requests_in_step=['cmpl-f870c93bda2647ab91bf1cbed5859bab-0']))
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, kv_cache_usage=0.006134969325153339, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, num_corrupted_reqs=0)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] EngineCore encountered a fatal error.
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] Traceback (most recent call last):
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 579, in run_engine_core
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] engine_core.run_busy_loop()
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 938, in run_busy_loop
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] executed = self._process_engine_step()
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 631, in _process_engine_step
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] outputs, model_executed = self.step_fn()
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 235, in step
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] model_output = self.execute_model(scheduler_output)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 221, in execute_model
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] raise err
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/engine/core.py", line 212, in execute_model
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] return self.model_executor.execute_model(scheduler_output)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] (output, ) = self.collective_rpc(
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] result = get_response(w, dequeue_timeout)
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] raise RuntimeError(
(EngineCore_0 pid=666) ERROR 09-08 21:48:38 [core.py:588] RuntimeError: Worker failed with error ''tuple' object has no attribute 'data_ptr'', please check the stack trace above for the root cause
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/engine/core.py:766: ResourceWarning: Destroying context with unclosed socket <zmq.Socket(zmq.PUSH) at 0xfffc9f9be200>
(EngineCore_0 pid=666) with ExitStack() as stack, zmq.Context() as ctx:
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/engine/core.py:766: ResourceWarning: Destroying context with unclosed socket <zmq.Socket(zmq.PUSH) at 0xfffc9f9be120>
(EngineCore_0 pid=666) with ExitStack() as stack, zmq.Context() as ctx:
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed socket <zmq.Socket(zmq.SUB) at 0xfffdead13cb0>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed context <zmq.Context() at 0xfffdead27ad0>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed socket <zmq.Socket(zmq.SUB) at 0xfffde92a6c10>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed context <zmq.Context() at 0xfffdead279b0>
(EngineCore_0 pid=666) w.worker_response_mq = None
ERROR 09-08 21:48:38 [async_llm.py:419] AsyncLLM output_handler failed.
ERROR 09-08 21:48:38 [async_llm.py:419] Traceback (most recent call last):
ERROR 09-08 21:48:38 [async_llm.py:419] File "/workspace/vllm/vllm/v1/engine/async_llm.py", line 378, in output_handler
ERROR 09-08 21:48:38 [async_llm.py:419] outputs = await engine_core.get_output_async()
ERROR 09-08 21:48:38 [async_llm.py:419] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-08 21:48:38 [async_llm.py:419] File "/workspace/vllm/vllm/v1/engine/core_client.py", line 740, in get_output_async
ERROR 09-08 21:48:38 [async_llm.py:419] raise self._format_exception(outputs) from None
ERROR 09-08 21:48:38 [async_llm.py:419] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 09-08 21:48:38 [async_llm.py:345] Request cmpl-f870c93bda2647ab91bf1cbed5859bab-0 failed (engine dead).
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed socket <zmq.Socket(zmq.SUB) at 0xfffde91ff070>
(EngineCore_0 pid=666) w.worker_response_mq = None
(EngineCore_0 pid=666) /workspace/vllm/vllm/v1/executor/multiproc_executor.py:263: ResourceWarning: Unclosed context <zmq.Context() at 0xfffdead27830>
(EngineCore_0 pid=666) w.worker_response_mq = None
INFO: 127.0.0.1:22498 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
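
The traceback above fails on every rank at the same expression in `npu_connector.py`, `[t.data_ptr() for t in kv_caches]`, which means the entries of `kv_caches` are tuples rather than tensors. Below is a minimal sketch of that failure mode, assuming each layer's KV cache arrives as a tuple of two buffers (this layout is an assumption, not confirmed by the log). `FakeTensor` and `collect_data_ptrs` are hypothetical names used only for illustration; this is not the LMCache-Ascend implementation or a confirmed fix.

```python
from typing import Any, Iterable, List


class FakeTensor:
    """Hypothetical stand-in for a tensor; only data_ptr() matters here."""

    def __init__(self, ptr: int) -> None:
        self._ptr = ptr

    def data_ptr(self) -> int:
        return self._ptr


def collect_data_ptrs(kv_caches: Iterable[Any]) -> List[int]:
    """Collect data_ptr() values, descending into tuples or lists of tensors."""
    ptrs: List[int] = []
    for entry in kv_caches:
        if isinstance(entry, (tuple, list)):
            # A per-layer entry that bundles several buffers: flatten it.
            ptrs.extend(collect_data_ptrs(entry))
        else:
            ptrs.append(entry.data_ptr())
    return ptrs


# Assumed layout: one tuple of two buffers per layer.
kv_caches = [
    (FakeTensor(0x1000), FakeTensor(0x2000)),
    (FakeTensor(0x3000), FakeTensor(0x4000)),
]

try:
    # Same expression as in the traceback; it raises because each t is a tuple.
    [t.data_ptr() for t in kv_caches]
except AttributeError as exc:
    error_message = str(exc)

# Flattening yields one pointer per underlying buffer instead of per layer.
flat_ptrs = collect_data_ptrs(kv_caches)
```

Flattening changes the number of pointers from one per layer to one per buffer, so the pointer array and whatever consumes it would have to expect that layout. Whether the connector supports it is not something the log can tell us.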