Describe the bug
I am trying to serve the unsloth/Qwen3.5-9B-GGUF model on a baremetal Windows host using OVMS v2026.0. The server fails to start, throwing a gguf_tensor_to_f16 failed error during LLM node initialization. I suspect the GGUF parser does not yet support the tensor structure of Qwen3.5.
Since there were similar issues with other new architectures like Qwen3-VL, I would like to ask if there is a plan or timeline to support the Qwen3.5 GGUF model structure.
To Reproduce
Steps to reproduce the behavior:
- Download Qwen3.5-9B-Q4_K_M.gguf from Hugging Face (unsloth/Qwen3.5-9B-GGUF).
- Place the file in the local directory:
C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
- Run the following OVMS launch command on a Windows baremetal host:
.\ovms.exe --source_model "unsloth/Qwen3.5-9B-GGUF" --model_repository_path \models --model_name unsloth/Qwen3.5-9B-GGUF --task text_generation --gguf_filename Qwen3.5-9B-Q4_K_M.gguf --target_device GPU --port 8000 --rest_port 9000
- See error during startup.
Expected behavior
The model should load successfully, and the OVMS server should start listening on the specified gRPC and REST ports without crashing.
Logs
[2026-03-08 14:18:36.179][22220][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: C:\ovms\\models\unsloth\Qwen3.5-9B-GGUF\./Qwen3.5-9B-Q4_K_M.gguf exception: Check 'data != nullptr' failed at src\cpp\src\gguf_utils\gguf.cpp:96:
[load_gguf] gguf_tensor_to_f16 failed
[2026-03-08 14:18:36.179][22220][modelmanager][error][servable_initializer.cpp:425] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2026-03-08 14:18:36.179][22220][serving][error][mediapipegraphdefinition.cpp:474] Failed to process LLM node graph unsloth/Qwen3.5-9B-GGUF
[2026-03-08 14:18:36.180][22220][modelmanager][error][modelmanager.cpp:184] Couldn't start model manager
Configuration
- OVMS version:
v2026.0 (OpenVINO Model Server 2026.0.0.4d3933c5, OpenVINO backend 2026.0.0)
- OVMS config.json file: N/A (Using command-line parameters)
- CPU, accelerator's versions: Target device is GPU, Arc B390 with Core X7 Ultra 358H. Baremetal Windows host.
- Model repository directory structure:
C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
└── Qwen3.5-9B-Q4_K_M.gguf
- Model:
unsloth/Qwen3.5-9B-GGUF from Hugging Face.
Additional context
I am running this directly on Windows (baremetal), not in a Docker container. I noticed in other issues that support for newer model structures is sometimes added in later patches. Let me know if there are any workarounds for GGUF loading in the meantime.
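For reference, here is a minimal stdlib-only sketch (my own diagnostic, not part of OVMS) that reads the fixed GGUF header and the leading metadata key, which by convention is general.architecture. If the architecture string reported for this file is one the OVMS GGUF loader does not recognize, that would be consistent with the gguf_tensor_to_f16 failure above.

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(data: bytes):
    """Parse the fixed GGUF header plus the first metadata key/value.

    Header layout (per the GGUF spec): 4-byte magic, uint32 version,
    uint64 tensor_count, uint64 metadata_kv_count, then key/value pairs.
    Returns (version, tensor_count, kv_count, first_key, first_value);
    the value is decoded only when it is string-typed (type id 8).
    """
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, = struct.unpack_from("<I", data, 4)
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    off = 24
    # First KV pair: key is a uint64-length-prefixed UTF-8 string.
    key_len, = struct.unpack_from("<Q", data, off); off += 8
    key = data[off:off + key_len].decode("utf-8"); off += key_len
    vtype, = struct.unpack_from("<I", data, off); off += 4
    value = None
    if vtype == 8:  # GGUF string value
        val_len, = struct.unpack_from("<Q", data, off); off += 8
        value = data[off:off + val_len].decode("utf-8")
    return version, tensor_count, kv_count, key, value
```

Running it against the first few kilobytes of Qwen3.5-9B-Q4_K_M.gguf (e.g. `read_gguf_header(open(path, "rb").read(4096))`) should print the architecture string the parser is being asked to handle.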