Describe the bug
I am trying to serve the unsloth/Qwen3.5-9B-GGUF model on a baremetal Windows host using OVMS v2026.0. The server fails to start, throwing a gguf_tensor_to_f16 failed error during LLM node initialization. I suspect the GGUF parser does not yet support the tensor structure of Qwen3.5.
Since there were similar issues with other new architectures like Qwen3-VL, I would like to ask if there is a plan or timeline to support the Qwen3.5 GGUF model structure.
To Reproduce
Steps to reproduce the behavior:
- Download Qwen3.5-9B-Q4_K_M.gguf from Hugging Face (unsloth/Qwen3.5-9B-GGUF).
- Place the file in the local directory:
C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
- Run the following OVMS launch command on a Windows baremetal host:
.\ovms.exe --source_model "unsloth/Qwen3.5-9B-GGUF" --model_repository_path \models --model_name unsloth/Qwen3.5-9B-GGUF --task text_generation --gguf_filename Qwen3.5-9B-Q4_K_M.gguf --target_device GPU --port 8000 --rest_port 9000
- See error during startup.
Expected behavior
The model should load successfully, and the OVMS server should start listening on the specified gRPC and REST ports without crashing.
Logs
[2026-03-08 14:18:36.179][22220][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: C:\ovms\\models\unsloth\Qwen3.5-9B-GGUF\./Qwen3.5-9B-Q4_K_M.gguf exception: Check 'data != nullptr' failed at src\cpp\src\gguf_utils\gguf.cpp:96:
[load_gguf] gguf_tensor_to_f16 failed
[2026-03-08 14:18:36.179][22220][modelmanager][error][servable_initializer.cpp:425] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2026-03-08 14:18:36.179][22220][serving][error][mediapipegraphdefinition.cpp:474] Failed to process LLM node graph unsloth/Qwen3.5-9B-GGUF
[2026-03-08 14:18:36.180][22220][modelmanager][error][modelmanager.cpp:184] Couldn't start model manager
Configuration
- OVMS version:
v2026.0 (OpenVINO Model Server 2026.0.0.4d3933c5, OpenVINO backend 2026.0.0)
- OVMS config.json file: N/A (Using command-line parameters)
- CPU, accelerator's versions: Target device is GPU, Arc B390 with Core X7 Ultra 358H. Baremetal Windows host.
- Model repository directory structure:
C:\ovms\models\unsloth\Qwen3.5-9B-GGUF\
└── Qwen3.5-9B-Q4_K_M.gguf
- Model:
unsloth/Qwen3.5-9B-GGUF from Hugging Face.
Additional context
I am running this directly on Windows (baremetal), not in a Docker container. I noticed in other issues that support for newer model structures is sometimes added in later patches. Let me know if there are any workarounds for GGUF loading in the meantime.
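For reference, here is a minimal stdlib-only sketch (my own diagnostic, not part of OVMS) that reads the fixed GGUF header and the leading metadata key, which by convention is general.architecture. If the architecture string reported for this file is one the OVMS GGUF loader does not recognize, that would be consistent with the gguf_tensor_to_f16 failure above.

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(data: bytes):
    """Parse the fixed GGUF header plus the first metadata key/value.

    Header layout (per the GGUF spec): 4-byte magic, uint32 version,
    uint64 tensor_count, uint64 metadata_kv_count, then key/value pairs.
    Returns (version, tensor_count, kv_count, first_key, first_value);
    the value is decoded only when it is string-typed (type id 8).
    """
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, = struct.unpack_from("<I", data, 4)
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    off = 24
    # First KV pair: key is a uint64-length-prefixed UTF-8 string.
    key_len, = struct.unpack_from("<Q", data, off); off += 8
    key = data[off:off + key_len].decode("utf-8"); off += key_len
    vtype, = struct.unpack_from("<I", data, off); off += 4
    value = None
    if vtype == 8:  # GGUF string value
        val_len, = struct.unpack_from("<Q", data, off); off += 8
        value = data[off:off + val_len].decode("utf-8")
    return version, tensor_count, kv_count, key, value
```

Running it against the first few kilobytes of Qwen3.5-9B-Q4_K_M.gguf (e.g. `read_gguf_header(open(path, "rb").read(4096))`) should print the architecture string the parser is being asked to handle.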