[Speed] Qwen3-1.7B prefill speed is very slow on Android

# 🏎️  Speed Report



- The model code: Qwen3-1.7B-q4f16_1 

- The model configuration (e.g. quantization mode, running data type, etc.): q4f16_1
- Device (e.g. MacBook Pro M2, PC+RTX 3080): Xiaomi 14 Pro (Snapdragon 8 Gen 3)
- OS (if applicable): Android
- Encode speed (Token/s): 1 token/s (110 tokens/s for Llama-3.2-3B-Instruct-q4f16_0)
- Decode speed (Token/s): 12 tokens/s (12 tokens/s for Llama-3.2-3B-Instruct-q4f16_0)
- Memory usage (if applicable):
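
To make the gap concrete, here is a minimal sketch of how the throughput figures above compare. The helper names `tokens_per_second` and `prefill_slowdown` are hypothetical and only for illustration, not part of MLC LLM; the inputs are the numbers reported in this issue.

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Throughput for one phase (prefill or decode): tokens processed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_tokens / seconds


def prefill_slowdown(baseline_tps: float, observed_tps: float) -> float:
    """How many times slower the observed prefill is than the baseline."""
    if observed_tps <= 0:
        raise ValueError("observed_tps must be positive")
    return baseline_tps / observed_tps


# Figures reported above (tokens/s).
llama_prefill_tps = 110.0  # Llama-3.2-3B-Instruct-q4f16_0
qwen3_prefill_tps = 1.0    # Qwen3-1.7B-q4f16_1

print(prefill_slowdown(llama_prefill_tps, qwen3_prefill_tps))
```

Decode speed is the same for both models (12 tokens/s), so the slowdown appears to be limited to the prefill phase.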

<img width="270" height="600" alt="Image" src="https://github.com/user-attachments/assets/a34b00f0-3b7d-4572-874b-daf6e90ccdcc" />

<img width="270" height="600" alt="Image" src="https://github.com/user-attachments/assets/f5f4a671-07e7-45a7-a6c1-b0323b6e8e8b" />



**Llama-3.2-3B-Instruct-q4f16_0**

<img width="270" height="600" alt="Image" src="https://github.com/user-attachments/assets/898a7237-a1b9-4e6f-b687-bd804c998a91" />



I'm unsure whether something went wrong during the compilation of the Qwen3-1.7B-q4f16_1 model. Could anyone help me identify if something is wrong?

[compile_models_log.txt](https://github.com/user-attachments/files/23568323/compile_models_log.txt)
