
Slow Benchmark Result from Trained Model Using SparseML. #2361

Open
WayneSkywalker opened this issue Mar 3, 2025 · 1 comment

@WayneSkywalker

Hi, I have an issue with sparse transfer learning using SparseML, following the instructions in https://github.com/neuralmagic/sparseml/blob/main/integrations/ultralytics-yolov8/tutorials/sparse-transfer-learning.md.

More specifically, I trained with:

sparseml.ultralytics.train \
  --model "zoo:cv/detection/yolov8-m/pytorch/ultralytics/coco/pruned80-none" \
  --recipe "zoo:cv/detection/yolov8-m/pytorch/ultralytics/voc/pruned80_quant-none" \
  --data "coco128.yaml" \
  --batch 2

and then exported the trained model:

sparseml.ultralytics.export_onnx \
  --model ./runs/detect/train/weights/last.pt \
  --save_dir yolov8-m

and then ran a benchmark using DeepSparse:

>> deepsparse.benchmark /home/ubuntu/code/models/trained_model.onnx
2025-03-03 03:23:56 deepsparse.benchmark.helpers INFO     Thread pinning to cores enabled
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.8.0 COMMUNITY | (e3778e93) (release) (optimized) (system=avx512_vnni, binary=avx512)
2025-03-03 03:23:56 deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
        onnx_file_path: /home/ubuntu/code/models/trained_model.onnx
        batch_size: 1
        num_cores: 4
        num_streams: 1
        scheduler: Scheduler.default
        fraction_of_supported_ops: 0.0
        cpu_avx_type: avx512
        cpu_vnni: True
2025-03-03 03:23:56 deepsparse.utils.onnx INFO     Generating input 'images', type = uint8, shape = [1, 3, 640, 640]
2025-03-03 03:23:56 deepsparse.benchmark.benchmark_model INFO     Starting 'singlestream' performance measurements for 10 seconds
Original Model Path: /home/ubuntu/code/models/trained_model.onnx
Batch Size: 1
Scenario: sync
Throughput (items/sec): 4.1084
Latency Mean (ms/batch): 243.3896
Latency Median (ms/batch): 240.5514
Latency Std (ms/batch): 10.9256
Iterations: 42

Here are the related dependencies and the training environment.
Libraries:

  • torch==2.5.1
  • sparseml==1.8.0
  • deepsparse==1.8.0
  • ultralytics==8.0.124
  • onnx==1.14.1
  • onnxruntime==1.17.0

Training Environment:

  • NVIDIA GeForce RTX 4070 Ti (12 GB VRAM)
  • Ubuntu 22.04

It is quite slow. I suspect the benchmark result is related to fraction_of_supported_ops: 0.0, because when I run the benchmark on the pretrained weights used in the training command above (downloaded from https://sparsezoo.neuralmagic.com/models/yolov8-m-coco-pruned80_quantized?hardware=deepsparse-c6i.12xlarge&comparison=yolov8-m-coco-base), the result is much faster:

>> deepsparse.benchmark /home/ubuntu/code/models/pretrained_model.onnx
2025-03-03 03:52:06 deepsparse.benchmark.helpers INFO     Thread pinning to cores enabled
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.8.0 COMMUNITY | (e3778e93) (release) (optimized) (system=avx512_vnni, binary=avx512)
2025-03-03 03:52:07 deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
        onnx_file_path: /home/ubuntu/code/models/pretrained_model.onnx
        batch_size: 1
        num_cores: 4
        num_streams: 1
        scheduler: Scheduler.default
        fraction_of_supported_ops: 1.0
        cpu_avx_type: avx512
        cpu_vnni: True
2025-03-03 03:52:08 deepsparse.utils.onnx INFO     Generating input 'images', type = uint8, shape = [1, 3, 640, 640]
2025-03-03 03:52:08 deepsparse.benchmark.benchmark_model INFO     Starting 'singlestream' performance measurements for 10 seconds
Original Model Path: /home/ubuntu/code/models/pretrained_model.onnx
Batch Size: 1
Scenario: sync
Throughput (items/sec): 25.9231
Latency Mean (ms/batch): 38.5548
Latency Median (ms/batch): 38.2803
Latency Std (ms/batch): 1.4339
Iterations: 260

Here, fraction_of_supported_ops is 1.0.

I searched for more information and found that it relates to the optimized runtime, as described in https://github.com/neuralmagic/deepsparse/blob/36b92eeb730a74a787cea467c9132eaa1b78167f/src/deepsparse/engine.py#L417, but that is all I could find.
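
For reference, here is a minimal sketch of how the value could be checked outside of deepsparse.benchmark; it assumes the Engine object exposes a fraction_of_supported_ops property (as the engine.py link above suggests) and uses the uint8 [1, 3, 640, 640] input shape from the benchmark log:

from deepsparse import Engine
import numpy as np

# Compile the exported model the same way deepsparse.benchmark does (batch size 1).
engine = Engine(model="/home/ubuntu/code/models/trained_model.onnx", batch_size=1)

# Assumed property: fraction of operations running in the optimized runtime
# (0.0 means everything fell back to the slower path).
print("fraction_of_supported_ops:", engine.fraction_of_supported_ops)

# Sanity-check inference with a dummy input matching the benchmark log.
dummy = np.zeros((1, 3, 640, 640), dtype=np.uint8)
outputs = engine([dummy])
print([o.shape for o in outputs])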

I have some questions:

  1. What exactly is fraction_of_supported_ops?
  2. What can I do about fraction_of_supported_ops?
  3. How does fraction_of_supported_ops affect the benchmark result?

sriram-dsl commented Apr 8, 2025

I am facing the same issue.
Inconsistent Speed and High CPU Usage with DeepSparse/YOLOv5

Problem:

  1. Speed Changes Randomly:

    • Same model sometimes gives high speed (~240 FPS), sometimes low (~45 FPS) on the same machine.
    • Docs promise 240+ FPS, but I only get 45 FPS most of the time.
  2. Uses Too Many CPU Cores:

    • DeepSparse always uses 8 CPU cores, even when testing just one model.
    • A normal (non-sparse) model gives the same speed (~45 FPS) with just 1 core.

What I Tried:

  1. Two Training Methods:

    • Trained YOLOv5 normally → exported with SparseML.
    • Trained YOLOv5 with SparseML (pruned/quantized) → exported with SparseML:

    !sparseml.yolov5.train \
     --weights zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn \
     --recipe zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn \
     --recipe-args '{"num_epochs":10}' \
     --data "/kaggle/input/asdfghjklpouyy/coco.yaml" \
     --patience 0 \
     --cfg yolov5s.yaml \
     --hyp hyps/hyp.finetune.yaml \
     --imgsz 320 \
     --batch-size 8 \
     --device 0

    !sparseml.yolov5.export_onnx \
     --weights /kaggle/working/yolov5/yolov5_runs/train/exp/weights/best_pruned.pt \
     --batch-size 1 \
     --imgsz 320 320 \
     --int8 \
     --dynamic

    • Both give the same speed (~45 FPS).
  2. Testing Commands:

    deepsparse.benchmark model.onnx --batch_size 8 --num_cores 8

    • Still slow, and forcing 1 core (--num_cores 1) makes it even slower (see the single-core sketch after this list).
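
For the core count, a minimal sketch of pinning the engine to a single core through the Python API; the model path, the 320x320 float32 input, and the timing loop are placeholders for my setup, assuming the Engine constructor accepts a num_cores argument:

import time
import numpy as np
from deepsparse import Engine

MODEL_PATH = "model.onnx"  # placeholder path to the exported YOLOv5 model

# Ask DeepSparse for a single core instead of letting it take all 8.
engine = Engine(model=MODEL_PATH, batch_size=1, num_cores=1)

# 320x320 float32 dummy input; switch to uint8 if the export kept quantized inputs.
dummy = np.random.rand(1, 3, 320, 320).astype(np.float32)

# Crude timing loop to compare against the 8-core numbers.
n = 100
start = time.time()
for _ in range(n):
    engine([dummy])
print("FPS:", n / (time.time() - start))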

Expected vs. Reality:

What          Expected (Docs)   What Happens
Speed (FPS)   240+              45-80
CPU Cores     1-2               Always 8
Delay (ms)    <10               ~20

What’s Wrong?

  • Maybe DeepSparse isn’t using the CPU properly (VNNI/AVX-512).
  • Maybe the model isn’t really using sparsity/quantization (see the inspection sketch after this list).
  • Cloud CPU might be slowing down during testing.
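
A minimal sketch of how one could check both of these from the exported ONNX file, using only the onnx and numpy packages (the path is a placeholder; the op-type names and the rough 75-80% sparsity expectation come from the pruned/quantized recipes above):

import onnx
import numpy as np
from onnx import numpy_helper

MODEL_PATH = "model.onnx"  # placeholder path to the exported model
model = onnx.load(MODEL_PATH)

# Count quantization-related nodes; a quantized export should contain ops such as
# QLinearConv / QuantizeLinear / DequantizeLinear.
quant_ops = {"QLinearConv", "QLinearMatMul", "QuantizeLinear",
             "DequantizeLinear", "ConvInteger", "MatMulInteger"}
counts = {}
for node in model.graph.node:
    if node.op_type in quant_ops:
        counts[node.op_type] = counts.get(node.op_type, 0) + 1
print("quantized ops:", counts or "none found")

# Estimate overall weight sparsity: a pruned75/pruned80 model should show a large
# fraction of exactly-zero weight values.
total = zeros = 0
for init in model.graph.initializer:
    arr = numpy_helper.to_array(init)
    if arr.ndim >= 2:  # skip small 1-D tensors such as biases and scales
        total += arr.size
        zeros += int(np.count_nonzero(arr == 0))
print(f"weight sparsity: {zeros / max(total, 1):.2%}")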

Questions:

  1. How do I make DeepSparse use fewer cores without losing speed?
  2. What’s the best sparsity setting for YOLOv5 to hit 200+ FPS?
  3. How can I check if quantization is really working?
  4. Please provide the exact pipeline to follow to get high throughput.

Next Steps:

  • I’ll share my model and test logs if needed.
  • Let me know how to fix this!

