Description
🐛 Bug
[2025-11-17 10:49:40] INFO auto_device.py:36: Using device: rocm:0
[2025-11-17 10:49:40] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC
[2025-11-17 10:49:40] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-11-17 10:49:40] INFO download_cache.py:166: Weights already downloaded: /home/rig/.cache/mlc_llm/model_weights/hf/mlc-ai/gemma-3-27b-i
[2025-11-17 10:49:40] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-11-17 10:49:40] INFO jit.py:158: Using cached model lib: /home/rig/.cache/mlc_llm/model_lib/b7e96d134f84cd2d4cf435be3748adc1.so
[2025-11-17 10:49:40] INFO engine_base.py:186: The selected engine mode is interactive. We fix max batch size to 1 for interactive single se
[2025-11-17 10:49:40] INFO engine_base.py:200: If you have low concurrent requests and want to use less GPU memory, please select mode "loca
[2025-11-17 10:49:40] INFO engine_base.py:210: If you have high concurrent requests and want to maximize the GPU memory utilization, please
!!!!!!! Segfault encountered !!!!!!!
File "./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c", line 0, in 0x000075fd80a4532f
File "", line 0, in std::filesystem::__cxx11::path::~path()
File "", line 0, in std::filesystem::__cxx11::path::~path()
File "", line 0, in mlc::llm::Tokenizer::DetectTokenizerInfo(tvm::ffi::String const&)
File "", line 0, in mlc::llm::Tokenizer::FromPath(tvm::ffi::String const&, std::optional<mlc::llm::TokenizerInfo>)
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_call.h", line 168, in _PyObject_VectorcallTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 327, in PyObject_Vectorcall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 813, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_ceval.h", line 119, in _PyEval_EvalFrame
File "/usr/local/src/conda/python-3.13.9/Python/ceval.c", line 1820, in _PyEval_Vector
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 413, in _PyFunction_Vectorcall
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 135, in _PyObject_VectorcallDictTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 504, in _PyObject_Call_Prepend
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 9816, in slot_tp_init
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 1997, in type_call
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 242, in _PyObject_MakeTpCall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 813, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 146, in _PyObject_VectorcallDictTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 504, in _PyObject_Call_Prepend
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 9816, in slot_tp_init
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 1997, in type_call
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 242, in _PyObject_MakeTpCall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 1509, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_ceval.h", line 119, in _PyEval_EvalFrame
File "/usr/local/src/conda/python-3.13.9/Python/ceval.c", line 1820, in _PyEval_Vector
File "/usr/local/src/conda/python-3.13.9/Python/ceval.c", line 604, in PyEval_EvalCode
File "/usr/local/src/conda/python-3.13.9/Python/bltinmodule.c", line 1143, in builtin_exec_impl
File "/usr/local/src/conda/python-3.13.9/Python/clinic/bltinmodule.c.h", line 556, in builtin_exec
File "/usr/local/src/conda/python-3.13.9/Objects/methodobject.c", line 440, in cfunction_vectorcall_FASTCALL_KEYWORDS
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_call.h", line 168, in _PyObject_VectorcallTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 327, in PyObject_Vectorcall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 813, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 349, in pymain_run_module
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 690, in pymain_run_python
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 775, in Py_RunMain
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 829, in Py_BytesMain
File "", line 0, in _start
File "", line 0, in 0xffffffffffffffff
Segmentation fault (core dumped)
To Reproduce
Steps to reproduce the behavior:
python -m mlc_llm serve HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC --port 8081 --overrides "gpu_memory_utilization=0.88;tensor_parallel_shards=1" --host 0.0.0.0 --mode=interactive
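To confirm that the process is killed by a signal rather than exiting with an ordinary error, the command above can be wrapped in a small script. This is only a sketch: the helper names `classify_exit` and `run_and_classify` are hypothetical, the argument list is copied from the command above, and the timeout is an arbitrary choice. It relies on the POSIX behavior that `subprocess` reports a negative return code when the child is terminated by a signal.

```python
import signal
import subprocess
import sys


def classify_exit(returncode: int) -> str:
    """Describe a subprocess return code.

    On POSIX, a negative value -N means the child was terminated by signal N.
    """
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known to this platform's signal module.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_classify(cmd, timeout=None):
    """Run cmd and report how it ended (hypothetical helper)."""
    try:
        proc = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; a server that is still
        # alive at this point did not crash during startup.
        return "still running after timeout (no crash observed)"
    return classify_exit(proc.returncode)


# Same arguments as the reproduction command in this report.
REPRO_CMD = [
    sys.executable, "-m", "mlc_llm", "serve",
    "HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC",
    "--port", "8081",
    "--overrides", "gpu_memory_utilization=0.88;tensor_parallel_shards=1",
    "--host", "0.0.0.0",
    "--mode=interactive",
]
```

Calling `print(run_and_classify(REPRO_CMD, timeout=120))` in the affected environment should print `killed by SIGSEGV` if the crash reproduces. The 120-second timeout is an assumption and may need to be raised if startup takes longer.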
Expected behavior
The server should start without a segfault. Everything worked fine with builds from before September.
Environment
- Instinct MI50 16GB
- Operating system: Ubuntu 22.04
- Conda Environment
- How you installed MLC-LLM (conda, source): pip, following the official install instructions
- How you installed TVM (pip, source): pip
- Python version (e.g. 3.10): 3.13
- GPU driver version (if applicable): ROCm 6.3.4
- CUDA/cuDNN version (if applicable):
- TVM Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
- Any other relevant information: