🏎️ Speed Report
- The model code: Qwen3-1.7B-q4f16_1
- The model configuration (e.g. quantization mode, running data type, etc.): q4f16_1
- Device (e.g. MacBook Pro M2, PC+RTX 3080): Xiaomi 14 Pro (Snapdragon 8 Gen 3)
- OS (if applicable): Android
- Encode speed (Token/s): 1 token/s (110 tokens/s for Llama-3.2-3B-Instruct-q4f16_0)
- Decode speed (Token/s): 12 tokens/s (12 tokens/s for Llama-3.2-3B-Instruct-q4f16_0)
- Memory usage (if applicable): see the screenshots below
(memory usage screenshots; the second is labeled Llama-3.2-3B-Instruct-q4f16_0)
I'm unsure whether something went wrong during the compilation of the Qwen3-1.7B-q4f16_1 model, since its encode speed is far lower than I'd expect (1 token/s versus 110 tokens/s for Llama-3.2-3B-Instruct-q4f16_0). Could anyone help me identify if something is wrong?
compile_models_log.txt
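For reference, below is a rough sketch of the compile steps I believe are involved, written as a small Python driver around the `mlc_llm` CLI. The local paths, output names, and the `--conv-template` value are placeholders I picked for illustration, not the exact values I used; the attached compile_models_log.txt is the authoritative record of the actual commands and output.

```python
# Minimal sketch of the MLC LLM compile pipeline for an Android target.
# All paths, the output names, and the conv template below are assumptions;
# compare against compile_models_log.txt for the real invocation.
import subprocess

MODEL_DIR = "./dist/models/Qwen3-1.7B"      # hypothetical local HF checkout
OUT_DIR = "./dist/Qwen3-1.7B-q4f16_1-MLC"   # hypothetical output directory
QUANT = "q4f16_1"

# 1. Quantize and convert the weights.
subprocess.run(
    ["mlc_llm", "convert_weight", MODEL_DIR,
     "--quantization", QUANT, "-o", OUT_DIR],
    check=True,
)

# 2. Generate mlc-chat-config.json (the conv template name is an assumption).
subprocess.run(
    ["mlc_llm", "gen_config", MODEL_DIR,
     "--quantization", QUANT, "--conv-template", "qwen2", "-o", OUT_DIR],
    check=True,
)

# 3. Compile the model library for Android.
subprocess.run(
    ["mlc_llm", "compile", f"{OUT_DIR}/mlc-chat-config.json",
     "--device", "android",
     "-o", f"{OUT_DIR}/Qwen3-1.7B-q4f16_1-android.tar"],
    check=True,
)
```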