
[Bug] Totally broken on Python 3.13 under the ROCm 6.2 build #3380

@delphiRo

🐛 Bug

[2025-11-17 10:49:40] INFO auto_device.py:36: Using device: rocm:0
[2025-11-17 10:49:40] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC
[2025-11-17 10:49:40] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-11-17 10:49:40] INFO download_cache.py:166: Weights already downloaded: /home/rig/.cache/mlc_llm/model_weights/hf/mlc-ai/gemma-3-27b-i
[2025-11-17 10:49:40] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-11-17 10:49:40] INFO jit.py:158: Using cached model lib: /home/rig/.cache/mlc_llm/model_lib/b7e96d134f84cd2d4cf435be3748adc1.so
[2025-11-17 10:49:40] INFO engine_base.py:186: The selected engine mode is interactive. We fix max batch size to 1 for interactive single se
[2025-11-17 10:49:40] INFO engine_base.py:200: If you have low concurrent requests and want to use less GPU memory, please select mode "loca
[2025-11-17 10:49:40] INFO engine_base.py:210: If you have high concurrent requests and want to maximize the GPU memory utilization, please
!!!!!!! Segfault encountered !!!!!!!
File "./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c", line 0, in 0x000075fd80a4532f
File "", line 0, in std::filesystem::__cxx11::path::~path()
File "", line 0, in std::filesystem::__cxx11::path::~path()
File "", line 0, in mlc::llm::Tokenizer::DetectTokenizerInfo(tvm::ffi::String const&)
File "", line 0, in mlc::llm::Tokenizer::FromPath(tvm::ffi::String const&, std::optionalmlc::llm::TokenizerInfo)
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_call.h", line 168, in _PyObject_VectorcallTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 327, in PyObject_Vectorcall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 813, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_ceval.h", line 119, in _PyEval_EvalFrame
File "/usr/local/src/conda/python-3.13.9/Python/ceval.c", line 1820, in _PyEval_Vector
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 413, in _PyFunction_Vectorcall
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 135, in _PyObject_VectorcallDictTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 504, in _PyObject_Call_Prepend
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 9816, in slot_tp_init
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 1997, in type_call
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 242, in _PyObject_MakeTpCall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 813, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 146, in _PyObject_VectorcallDictTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 504, in _PyObject_Call_Prepend
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 9816, in slot_tp_init
File "/usr/local/src/conda/python-3.13.9/Objects/typeobject.c", line 1997, in type_call
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 242, in _PyObject_MakeTpCall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 1509, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_ceval.h", line 119, in _PyEval_EvalFrame
File "/usr/local/src/conda/python-3.13.9/Python/ceval.c", line 1820, in _PyEval_Vector
File "/usr/local/src/conda/python-3.13.9/Python/ceval.c", line 604, in PyEval_EvalCode
File "/usr/local/src/conda/python-3.13.9/Python/bltinmodule.c", line 1143, in builtin_exec_impl
File "/usr/local/src/conda/python-3.13.9/Python/clinic/bltinmodule.c.h", line 556, in builtin_exec
File "/usr/local/src/conda/python-3.13.9/Objects/methodobject.c", line 440, in cfunction_vectorcall_FASTCALL_KEYWORDS
File "/usr/local/src/conda/python-3.13.9/Include/internal/pycore_call.h", line 168, in _PyObject_VectorcallTstate
File "/usr/local/src/conda/python-3.13.9/Objects/call.c", line 327, in PyObject_Vectorcall
File "/usr/local/src/conda/python-3.13.9/Python/generated_cases.c.h", line 813, in _PyEval_EvalFrameDefault
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 349, in pymain_run_module
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 690, in pymain_run_python
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 775, in Py_RunMain
File "/usr/local/src/conda/python-3.13.9/Modules/main.c", line 829, in Py_BytesMain
File "", line 0, in _start
File "", line 0, in 0xffffffffffffffff

Segmentation fault (core dumped)

To Reproduce

Steps to reproduce the behavior:

python -m mlc_llm serve HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC --port 8081 --overrides "gpu_memory_utilization=0.88;tensor_parallel_shards=1" --host 0.0.0.0 --mode=interactive
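For debugging, the same code path (Tokenizer::FromPath called during engine construction, per the backtrace above) should also be reachable from the Python API, which may make it easier to run under gdb. A minimal sketch, assuming the published mlc_llm Python package and the same model id; untested on this ROCm setup:

from mlc_llm import MLCEngine

model = "HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC"

# Engine construction loads the cached weights/model lib and builds the tokenizer,
# which is where the backtrace above points (Tokenizer::FromPath -> DetectTokenizerInfo).
engine = MLCEngine(model, mode="interactive")

# If construction survives, a single streamed request confirms the engine is usable.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "hello"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)

engine.terminate()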

Expected behavior

Everything worked fine with builds prior to September; the server should start and serve requests instead of segfaulting.

Environment

  • GPU: AMD Instinct MI50 16 GB
  • Operating system: Ubuntu 22.04
  • Conda environment
  • How you installed MLC-LLM (conda, source): pip (official install instructions)
  • How you installed TVM (pip, source): pip
  • Python version (e.g. 3.10): 3.13
  • GPU driver version (if applicable): ROCm 6.3.4
  • CUDA/cuDNN version (if applicable):
  • TVM Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information:

Additional context
