[HW Accel Support]: onnx detector performance drop #19741
Replies: 3 comments 20 replies
-
You're seeing a performance drop because ONNX Runtime and TensorRT are two different inference engines, and their efficiency varies by model and hardware. In Frigate 0.16, the ONNX detector is used for YOLO-NAS, YOLOv9, and YOLOX models on Nvidia GPUs by default, but it does not use native TensorRT acceleration in the same way as the dedicated TensorRT detector from previous versions(1). This can result in higher inference times, especially for more complex models(1).

You can still use TensorRT for YOLO models by specifying the correct detector type and ensuring your model is preprocessed for TensorRT, but ONNX is the expected runtime for YOLO-NAS and YOLOv9 in the latest Frigate configuration(1)(2). There is no default ONNX model provided; supported formats include YOLO-NAS, YOLOv3/v4/v7/v9, YOLOX, RF-DETR, and D-FINE(2).

If you want to maximize performance and use TensorRT directly, follow the configuration guidance for the TensorRT detector and ensure you use the correct Frigate image (with the -tensorrt tag) and preprocessed TRT engine files(1). For ONNX, you can try different models or adjust the model input size, but ONNX Runtime on Nvidia may not match TensorRT speeds for all models(1)(2).
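For reference, a minimal sketch of what an ONNX detector configuration for Frigate 0.16 might look like. The model path, input size, and `model_type` below are placeholders and assumptions for illustration; check the Frigate object detector docs for the exact keys your model requires.

```yaml
# Hedged sketch -- paths and dimensions are placeholders, not defaults.
detectors:
  onnx:
    type: onnx

model:
  model_type: yolonas
  path: /config/model_cache/yolo_nas_s.onnx   # you must supply your own model
  input_tensor: nchw
  input_pixel_format: bgr
  width: 320
  height: 320
```

Smaller input sizes (e.g. 320 vs 640) generally trade some accuracy for lower inference times.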
-
Following this. I just tried switching from a Coral to my spare GTX 1080, and though I can confirm ONNX is using the GPU as expected (I see the frigate.detector.onnx process in nvidia-smi), I'm getting what seems to me like unreasonably slow inference times of ~30ms. What I've tried:
I really don't want to switch back to the Coral, as it just can't run the newer, more accurate models, but with the number of cameras I have, even with 5fps, low-res detect streams, 25-30ms+ just won't cut it. I never tried the tensorrt detector with .trt models, but now that it's gone in 0.16 there doesn't seem to be much point. My config.yml:
-
In the source code, I see some TensorRT functionality is disabled by default (lines 312 to 338 in c723164). Is there a recommended way to enable it?
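For context, ONNX Runtime picks an execution provider from an ordered priority list passed to the session. A minimal sketch of how such a list is typically built; the `enable_trt` flag and `build_providers` helper are illustrative, not Frigate's actual code, though the provider names are the real ONNX Runtime identifiers.

```python
# Hedged sketch: constructing an ONNX Runtime provider priority list.
# TensorrtExecutionProvider must appear before CUDAExecutionProvider,
# otherwise CUDA is selected first and TensorRT is never tried.

def build_providers(enable_trt: bool) -> list:
    """Return an ordered ONNX Runtime execution-provider list."""
    providers = []
    if enable_trt:
        # Note: the first run pays a TensorRT engine-build cost.
        providers.append("TensorrtExecutionProvider")
    providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")  # always keep a CPU fallback
    return providers

# How the list would be used (requires onnxruntime-gpu built with TensorRT):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("model.onnx", providers=build_providers(True))

print(build_providers(True))
```

Whether Frigate exposes a switch for this is exactly the question above; the snippet only shows where the ordering matters.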
-
Describe the problem you are having
Hi, I'm using an MSI RTX 3060 12GB in my Frigate system. Recently, I updated from 0.15.2 to 0.16.
On 0.15.2, I used the yolov7-320 model with the TensorRT detector; inference speed was 5-7ms.
On 0.16, I moved to the ONNX runtime according to the release notes and saw a performance drop in inference timings. These are my results:
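When comparing detectors before and after an upgrade, a single average can hide occasional slow frames, so it helps to report a median and a high percentile. A minimal sketch using only the standard library; the sample timings below are made up for illustration, not measurements from this system.

```python
# Hedged sketch: summarizing per-frame inference timings in milliseconds.
import statistics

def summarize(timings_ms):
    """Return (median, p95) for a list of inference times in ms."""
    ordered = sorted(timings_ms)
    median = statistics.median(ordered)
    # Index of the 95th-percentile sample (nearest-rank, clamped).
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return median, p95

# Fabricated example values -- replace with timings observed in the
# Frigate System metrics page.
samples = [6.1, 5.8, 7.0, 6.4, 30.2, 6.0, 6.3, 5.9, 6.2, 6.5]
med, p95 = summarize(samples)
print(f"median={med:.2f}ms p95={p95:.2f}ms")
```

A healthy TensorRT setup would show a tight median near the quoted 5-7ms with a p95 close behind; a large gap between the two suggests intermittent stalls rather than a uniformly slower engine.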
Version
0.16
Frigate config file
docker-compose file or Docker CLI command
Relevant Frigate log output
Relevant go2rtc log output
FFprobe output from your camera
Install method
Docker Compose
Object Detector
TensorRT
Network connection
Wired
Camera make and model
hikvision
Screenshots of the Frigate UI's System metrics pages
Any other information that may be helpful
No response