[Speed] Qwen3-1.7B prefill speed is very slow on Android #3379

@Msyu1020

🏎️ Speed Report

  • The model code: Qwen3-1.7B-q4f16_1

  • The model configuration (e.g. quantization mode, running data type, etc.): q4f16_1

  • Device (e.g. MacBook Pro M2, PC+RTX 3080): Xiaomi 14 Pro (Snapdragon 8 Gen 3)

  • OS (if applicable): Android

  • Encode (prefill) speed (Token/s): 1 token/s (110 tokens/s for Llama-3.2-3B-Instruct-q4f16_0)

  • Decode speed (Token/s): 12 tokens/s (12 tokens/s for Llama-3.2-3B-Instruct-q4f16_0)

  • Memory usage (if applicable):
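
To put the prefill gap in perspective, here is a rough back-of-the-envelope sketch of how long a prompt would take to process at the two reported rates. The prompt length of 256 tokens is a hypothetical value chosen for illustration, not something measured in this report, and the calculation assumes a constant prefill rate.

```python
# Prefill throughput reported above, in tokens per second.
PREFILL_TOK_S = {
    "Qwen3-1.7B-q4f16_1": 1.0,
    "Llama-3.2-3B-Instruct-q4f16_0": 110.0,
}


def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to process a whole prompt, assuming a constant prefill rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Hypothetical prompt length (an assumption, not a measured value).
PROMPT_TOKENS = 256

for model, rate in PREFILL_TOK_S.items():
    print(f"{model}: {prefill_seconds(PROMPT_TOKENS, rate):.1f} s")
```

Under these assumptions the smaller Qwen3 model would spend minutes on a prompt that the larger Llama model processes in a few seconds, while both decode at the same 12 tokens/s.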

Image Image

Llama-3.2-3B-Instruct-q4f16_0

Image

I'm not sure whether something went wrong during the compilation of the Qwen3-1.7B-q4f16_1 model. Could anyone help me identify if something is wrong?

compile_models_log.txt
