DINOv3 and EdgeTAM (SAM2) with ONNX Runtime

This repository provides tools and examples for converting and using two powerful vision models, DINOv3 and EdgeTAM (SAM2), within the ONNX ecosystem. The focus is on efficient, PyTorch-independent inference pipelines for tasks such as one-shot segmentation, foreground extraction, and robust video object tracking. TFLite/LiteRT exports and demos for DINOv3 are also included.
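
Once a model has been exported, inference only needs ONNX Runtime and NumPy. The snippet below is a minimal sketch of that idea rather than the notebooks' exact code; the model filename, input resolution, and preprocessing are assumptions and should be replaced with the values produced by the export notebook.

```python
# Minimal sketch: DINOv3 feature extraction with ONNX Runtime (no PyTorch).
# "dinov3.onnx" and the 224x224 input size are assumptions, not fixed names.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("dinov3.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Placeholder for a preprocessed image: NCHW float32, normalized as in the export notebook.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)

features = session.run(None, {input_name: image})[0]
print(features.shape)  # e.g. (1, num_patches, embed_dim) patch features
```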

📂 Repository Structure

├── notebooks/
│   ├── dinov3_onnx_export.ipynb                    # Exports DINOv3 to ONNX
│   ├── dinov3_tflite_export.ipynb                  # Exports DINOv3 to TFLite
│   ├── edgetam_onnx_export.ipynb                   # Exports EdgeTAM encoder/decoder to ONNX
│   ├── foreground_segmentation_onnx_export.ipynb   # Trains and exports a foreground classifier
│   ├── dinov3_one_shot_segmentation_onnx.ipynb     # Demo for one-shot segmentation with ONNX
│   └── dinov3_one_shot_segmentation_tflite.ipynb   # Demo for one-shot segmentation with TFLite
│
└── scripts/
    └── hybrid_tracker.py                           # Hybrid tracker for video object tracking

📓 Notebooks

Each notebook is self-contained and can be run directly in Google Colab.

| Notebook | Description | Link |
| --- | --- | --- |
| dinov3_onnx_export.ipynb | Converts the DINOv3 Vision Transformer (ViT) feature extractor to ONNX format. | link |
| dinov3_tflite_export.ipynb | Converts the DINOv3 Vision Transformer (ViT) feature extractor to TFLite format. | link |
| edgetam_onnx_export.ipynb | Exports the EdgeTAM image encoder and mask decoder models to ONNX for efficient segmentation. | link |
| foreground_segmentation_onnx_export.ipynb | Trains a logistic regression classifier on DINOv3 features for foreground segmentation and exports it to ONNX. | link |
| dinov3_one_shot_segmentation_onnx.ipynb | Demonstrates one-shot segmentation using DINOv3 features and a reference mask, all in ONNX. | link |
| dinov3_one_shot_segmentation_tflite.ipynb | Demonstrates one-shot segmentation using DINOv3 features and a reference mask, all in TFLite. | link |
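
As a rough illustration of the one-shot segmentation demos: patch features from a reference image (with a known mask) are compared against patch features of a target image, and target patches that are highly similar to the reference foreground are kept. The helper below is an illustrative sketch with assumed shapes and a hand-picked threshold, not the notebooks' exact implementation.

```python
import numpy as np

def one_shot_mask(ref_feats, ref_mask, tgt_feats, threshold=0.6):
    """Illustrative one-shot segmentation from patch features.

    ref_feats, tgt_feats: (num_patches, dim) DINOv3 patch features.
    ref_mask: (num_patches,) boolean foreground mask for the reference image.
    Returns a boolean mask over the target image's patches.
    """
    # Prototype: L2-normalized mean feature of the reference foreground patches.
    proto = ref_feats[ref_mask].mean(axis=0)
    proto /= np.linalg.norm(proto) + 1e-8

    # Cosine similarity of every target patch to the prototype.
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + 1e-8)
    similarity = tgt @ proto

    # Patches above the threshold are predicted foreground; reshape to the
    # patch grid and upsample to pixel resolution to obtain a final mask.
    return similarity > threshold
```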

πŸ™ Acknowledgements

This work builds upon the official implementations and research from the following projects:

DINOv3: facebookresearch/dinov3

EdgeTAM: facebookresearch/EdgeTAM

Space-Time Correspondence as a Contrastive Random Walk: ajabri/videowalk
