[Bug]: Qwen3-235B cannot be run successfully with vllm v1 engine on version 0.8.5rc1 #781

Open
BestKuan opened this issue May 7, 2025 · 2 comments
Labels
bug Something isn't working

Comments


BestKuan commented May 7, 2025

Your current environment

The output of `python collect_env.py`
INFO 05-07 08:36:01 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 05-07 08:36:01 [importing.py:28] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 05-07 08:36:01 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 05-07 08:36:02 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 05-07 08:36:02 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 05-07 08:36:02 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 05-07 08:36:02 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 05-07 08:36:02 [__init__.py:44] plugin ascend loaded.
INFO 05-07 08:36:02 [__init__.py:230] Platform plugin ascend is activated
WARNING 05-07 08:36:04 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.35

Python version: 3.10.17 (main, Apr 30 2025, 16:00:31) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-136.12.0.88.4.ctl3.aarch64-aarch64-with-glibc2.35

CPU:
Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          192
On-line CPU(s) list:             0-191
Vendor ID:                       HiSilicon
BIOS Vendor ID:                  HiSilicon
Model name:                      Kunpeng-920
BIOS Model name:                 HUAWEI Kunpeng 920 5250
Model:                           0
Thread(s) per core:              1
Core(s) per socket:              48
Socket(s):                       4
Stepping:                        0x1
BogoMIPS:                        200.00
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache:                       12 MiB (192 instances)
L1i cache:                       12 MiB (192 instances)
L2 cache:                        96 MiB (192 instances)
L3 cache:                        192 MiB (8 instances)
NUMA node(s):                    4
NUMA node0 CPU(s):               0-47
NUMA node1 CPU(s):               48-95
NUMA node2 CPU(s):               96-143
NUMA node3 CPU(s):               144-191
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.4.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.51.3
[conda] Could not collect
vLLM Version: 0.8.5.post1
vLLM Ascend Version: 0.8.5rc1

ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
VLLM_USE_V1=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1


NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2.1               Version: 24.1.rc2.1                                           |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 90.9        32                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3368 / 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 89.2        29                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 90.7        30                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 95.4        30                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 90.8        37                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 88.4        34                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 95.8        35                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          3365 / 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 92.0        36                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3368 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux

🐛 Describe the bug

When I run Qwen3-235B-A22B on 2 nodes with 16x910B3, the model weights load successfully. But when I send a request, the vLLM server crashes. The relevant log is below.
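For context, the deployment is along these lines (the exact flags and model path are illustrative assumptions, not my literal command):

```shell
# Illustrative sketch of a 2-node Ray-based launch (assumed flags, not verbatim).
# On node 0 (head): start the Ray cluster.
ray start --head --port=6379
# On node 1 (worker): join the cluster (replace <head-ip> with the head node's IP).
# ray start --address=<head-ip>:6379

# On the head node: serve with tensor parallelism spanning all 16 NPUs.
export VLLM_USE_V1=1
vllm serve Qwen/Qwen3-235B-A22B \
  --tensor-parallel-size 16 \
  --distributed-executor-backend ray
```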

ERROR 05-07 08:23:58 [core.py:398] EngineCore encountered a fatal error.
ERROR 05-07 08:23:58 [core.py:398] Traceback (most recent call last):
ERROR 05-07 08:23:58 [core.py:398]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 389, in run_engine_core
ERROR 05-07 08:23:58 [core.py:398]     engine_core.run_busy_loop()
ERROR 05-07 08:23:58 [core.py:398]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 413, in run_busy_loop
ERROR 05-07 08:23:58 [core.py:398]     self._process_engine_step()
ERROR 05-07 08:23:58 [core.py:398]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 438, in _process_engine_step
ERROR 05-07 08:23:58 [core.py:398]     outputs = self.step_fn()
ERROR 05-07 08:23:58 [core.py:398]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 203, in step
ERROR 05-07 08:23:58 [core.py:398]     output = self.model_executor.execute_model(scheduler_output)
ERROR 05-07 08:23:58 [core.py:398]   File "/vllm-workspace/vllm/vllm/v1/executor/ray_distributed_executor.py", line 57, in execute_model
ERROR 05-07 08:23:58 [core.py:398]     return refs[0].get()
ERROR 05-07 08:23:58 [core.py:398]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 150, in get
ERROR 05-07 08:23:58 [core.py:398]     return _process_return_vals(return_vals, True)
ERROR 05-07 08:23:58 [core.py:398]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 27, in _process_return_vals
ERROR 05-07 08:23:58 [core.py:398]     raise val.as_instanceof_cause()
ERROR 05-07 08:23:58 [core.py:398] ray.exceptions.RayTaskError(ValueError): ray::RayWorkerWrapper.__ray_call__() (pid=18591, ip=172.19.0.28)
ERROR 05-07 08:23:58 [core.py:398]   File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 130, in execute_model_ray
ERROR 05-07 08:23:58 [core.py:398]     self.setup_device_if_necessary()
ERROR 05-07 08:23:58 [core.py:398]   File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 117, in setup_device_if_necessary
ERROR 05-07 08:23:58 [core.py:398]     torch.cuda.set_device(self.worker.device)
ERROR 05-07 08:23:58 [core.py:398]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/__init__.py", line 476, in set_device
ERROR 05-07 08:23:58 [core.py:398]     device = _get_device_index(device)
ERROR 05-07 08:23:58 [core.py:398]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
ERROR 05-07 08:23:58 [core.py:398]     raise ValueError(f"Expected a cuda device, but got: {device}")
ERROR 05-07 08:23:58 [core.py:398] ValueError: Expected a cuda device, but got: npu:0
INFO 05-07 08:23:58 [ray_distributed_executor.py:127] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.
2025-05-07 08:23:58,217 INFO compiled_dag_node.py:2173 -- Tearing down compiled DAG
ERROR 05-07 08:23:58 [async_llm.py:399] AsyncLLM output_handler failed.
ERROR 05-07 08:23:58 [async_llm.py:399] Traceback (most recent call last):
ERROR 05-07 08:23:58 [async_llm.py:399]   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 05-07 08:23:58 [async_llm.py:399]     outputs = await engine_core.get_output_async()
ERROR 05-07 08:23:58 [async_llm.py:399]   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 05-07 08:23:58 [async_llm.py:399]     raise self._format_exception(outputs) from None
ERROR 05-07 08:23:58 [async_llm.py:399] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 05-07 08:23:58 [async_llm.py:324] Request cmpl-5a2affcafa984ca3aaf71d064ee59067-0 failed (engine dead).
INFO:     127.0.0.1:57468 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 0d858f0749acd09331c7018001000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, a3570d045e221611145cd3f901000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, baae8c108247476f262c33bb01000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, ed5458ab9f87cc254621707201000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, cf4bf04aacbdeb2f50f5e61701000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, ba1c9a30620c9d316ae3941a01000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, fcf4d090a77e9838b4f08d2901000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 7db16050cdec7392f87ed58701000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 4b6fc5501e61fdf10b1cd55401000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 2e296490115d90d8ab17234601000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 0993a140aa7bd9cebdf5f81101000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, a4d6a013dd726f3ac1047ae101000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 89cad0caf7a239f3293ea30501000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, c19224d49013fc24e0de0ee201000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 2585f6f9494e325736ae915301000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 3e4e31644b452faf13558dc801000000)
INFO:     Shutting down
2025-05-07 08:23:58,267 INFO compiled_dag_node.py:2200 -- Waiting for worker tasks to exit
2025-05-07 08:23:58,269 INFO compiled_dag_node.py:2203 -- Teardown complete
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 400, in run_engine_core
    raise e
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 389, in run_engine_core
    engine_core.run_busy_loop()
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 413, in run_busy_loop
    self._process_engine_step()
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 438, in _process_engine_step
    outputs = self.step_fn()
  File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 203, in step
    output = self.model_executor.execute_model(scheduler_output)
  File "/vllm-workspace/vllm/vllm/v1/executor/ray_distributed_executor.py", line 57, in execute_model
    return refs[0].get()
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 150, in get
    return _process_return_vals(return_vals, True)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 27, in _process_return_vals
    raise val.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::RayWorkerWrapper.__ray_call__() (pid=18591, ip=172.19.0.28)
  File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 130, in execute_model_ray
    self.setup_device_if_necessary()
  File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 117, in setup_device_if_necessary
    torch.cuda.set_device(self.worker.device)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/__init__.py", line 476, in set_device
    device = _get_device_index(device)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
    raise ValueError(f"Expected a cuda device, but got: {device}")
ValueError: Expected a cuda device, but got: npu:0
(raylet) [2025-05-07 08:23:58,299 C 18100 18100] (raylet) experimental_mutable_object_provider.cc:156:  Check failed: object_manager_->WriteAcquire(info.local_object_id, total_data_size, nullptr, total_metadata_size, info.num_readers, object_backing_store) Status not OK: ChannelError: Channel closed.
(raylet) *** StackTrace Information ***
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xd18eb8) [0xaaaab3428eb8] ray::operator<<()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xd1b7e8) [0xaaaab342b7e8] ray::RayLog::~RayLog()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x456bb0) [0xaaaab2b66bb0] ray::core::experimental::MutableObjectProvider::HandlePushMutableObject()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x2421d0) [0xaaaab29521d0] ray::raylet::NodeManager::HandlePushMutableObject()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x2a4c60) [0xaaaab29b4c60] ray::rpc::ServerCallImpl<>::HandleRequestImpl()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x6d875c) [0xaaaab2de875c] EventTracker::RecordExecution()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x6d3e90) [0xaaaab2de3e90] std::_Function_handler<>::_M_invoke()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x6d4320) [0xaaaab2de4320] boost::asio::detail::completion_handler<>::do_complete()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xcf5630) [0xaaaab3405630] boost::asio::detail::scheduler::do_run_one()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xcf78c4) [0xaaaab34078c4] boost::asio::detail::scheduler::run()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xcf7ec8) [0xaaaab3407ec8] boost::asio::io_context::run()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x1ace44) [0xaaaab28bce44] main
(raylet) /lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xffff9e4d73fc]
(raylet) /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0xffff9e4d74cc] __libc_start_main
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x1ffbdc) [0xaaaab290fbdc]
(raylet)
(RayWorkerWrapper pid=4635, ip=172.19.0.29) [rank9]:[W507 08:22:16.546296931 compiler_depend.ts:28] Warning: The oprator of MoeInitRouting will be removed from Pytorch and switch to AscendSpeed after 630. (function operator()) [repeated 15x across cluster]
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [18336]
*** SIGTERM received at time=1746606238 on cpu 141 ***
PC: @     0xffffb456ea9c  (unknown)  select
    @     0xfffde7359698        464  absl::lts_20230802::AbslFailureSignalHandler()
    @     0xffffb4a438ec  606883984  (unknown)
    @     0xffffb489d194        128  time_sleep
    @     0xffffb4749d1c        112  cfunction_vectorcall_O
    @     0xffffb46ae64c         48  _PyEval_EvalFrameDefault
    @     0xffffb47edf34        448  _PyEval_Vector
    @     0xffffb46a9f58         48  _PyEval_EvalFrameDefault
    @     0xffffb47edf34        448  _PyEval_Vector
    @     0xffffb489870c         48  atexit_callfuncs
    @     0xffffb482dc2c         64  Py_FinalizeEx
    @     0xffffb482ea54         80  Py_Exit
    @     0xffffb4833418         32  _PyErr_PrintEx
    @     0xffffb483409c        144  PyRun_SimpleStringFlags
    @     0xffffb485333c         32  Py_RunMain
    @     0xffffb4853d4c        224  Py_BytesMain
    @     0xffffb44b73fc        192  (unknown)
    @     0xffffb44b74cc        272  __libc_start_main
[2025-05-07 08:23:58,367 E 18484 18484] logging.cc:496: *** SIGTERM received at time=1746606238 on cpu 141 ***
[2025-05-07 08:23:58,367 E 18484 18484] logging.cc:496: PC: @     0xffffb456ea9c  (unknown)  select
[2025-05-07 08:23:58,372 E 18484 18484] logging.cc:496:     @     0xfffde73596c0        464  absl::lts_20230802::AbslFailureSignalHandler()
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb4a438ec  606883984  (unknown)
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb489d194        128  time_sleep
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb4749d1c        112  cfunction_vectorcall_O
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb46ae64c         48  _PyEval_EvalFrameDefault
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb47edf34        448  _PyEval_Vector
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb46a9f58         48  _PyEval_EvalFrameDefault
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb47edf34        448  _PyEval_Vector
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb489870c         48  atexit_callfuncs
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb482dc2c         64  Py_FinalizeEx
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb482ea54         80  Py_Exit
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb4833418         32  _PyErr_PrintEx
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb483409c        144  PyRun_SimpleStringFlags
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb485333c         32  Py_RunMain
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496:     @     0xffffb4853d4c        224  Py_BytesMain
[2025-05-07 08:23:58,377 E 18484 18484] logging.cc:496:     @     0xffffb44b73fc        192  (unknown)
[2025-05-07 08:23:58,377 E 18484 18484] logging.cc:496:     @     0xffffb44b74cc        272  __libc_start_main
Exception ignored in atexit callback: <function shutdown at 0xfffde57845e0>
Traceback (most recent call last):
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 1957, in shutdown
    time.sleep(0.5)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 1539, in sigterm_handler
    sys.exit(signum)
SystemExit: 15

bug_details.txt

@BestKuan BestKuan added the bug Something isn't working label May 7, 2025
@BestKuan BestKuan changed the title [Bug]: Qwen3-235B cannot be run successfully [Bug]: Qwen3-235B cannot be run successfully with vllm v1 engine on version 0.8.5rc1 May 7, 2025
@Yikun
Collaborator

Yikun commented May 14, 2025

@noemotiovon any idea?

@noemotiovon
Contributor

From the error message, it looks like you're using Ray's Compiled Graph (DAG) feature. Currently, this feature only supports CUDA. You can try disabling the DAG feature and rerunning your tests.

Meanwhile, we're actively working on enabling DAG support on NPUs in Ray. Thank you for your patience!
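One way to try this, assuming these toggles exist in your vLLM build (check `vllm/envs.py` for the variables your version actually reads):

```shell
# Assumption: VLLM_USE_RAY_COMPILED_DAG is respected by this vLLM build.
# Disable the Ray Compiled Graph execution path, then restart the server.
export VLLM_USE_RAY_COMPILED_DAG=0

# Alternative workaround: fall back to the V0 engine, which does not use
# the compiled-DAG executor path that calls torch.cuda.set_device here.
export VLLM_USE_V1=0
```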
