📄 Description
Current Problem:
- Mixed augmentation pipelines: OTX currently uses a combination of torchvision and OpenMMLab augmentations (some self-implemented in the repo). Later in the pipeline, operations rely on NumPy images, while torchvision expects PyTorch tensors. This causes constant conversion between tensors and NumPy arrays, introducing a performance bottleneck that slows down training.
- Inconsistent interfaces: Parameter names, formats, and APIs differ across the various augmentation implementations, making the codebase harder to maintain and extend.
- Redundant self-implemented augmentations: Several augmentations are re-implemented locally in OTX, even though robust third-party solutions exist and are actively maintained.
Proposed Solution:
Evaluate Kornia as the primary augmentation library for OTX. Kornia is a differentiable computer vision library built on top of PyTorch, with an extensive set of augmentation operators.
Key benefits of Kornia:
- 🟢 PyTorch-first design: All augmentations operate directly on PyTorch tensors, removing unnecessary conversions from/to NumPy and improving training efficiency.
- 🟢 Rich functionality: Provides a wide range of augmentations, including geometric, color, and intensity transformations, as well as advanced ones (e.g., motion blur, random perspective, cutout).
- 🟢 Differentiable & GPU-accelerated: Augmentations are differentiable, making them suitable for integration with modern deep learning pipelines, and can be executed efficiently on GPUs.
- 🟢 Unified API & consistency: Standardized function signatures simplify maintainability and reduce code complexity.
- 🟢 Extended support: Handles not only images but also masks, bounding boxes, and keypoints, aligning with the requirements of detection and segmentation tasks.
Benchmarking Plan (Kornia vs torchvision.transforms.v2):
- Compare throughput (images/sec) during training with augmentation-heavy pipelines.
- Measure GPU utilization and memory footprint when using Kornia vs torchvision.transforms.v2.
- Evaluate coverage of transformations (ensure Kornia includes all augmentations currently used in OTX).
- Validate annotation consistency (masks, bounding boxes, keypoints remain synchronized after augmentation).
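The throughput comparison in the plan above could start from a small library-agnostic timing helper such as the sketch below. The function name `measure_throughput` and its parameters are hypothetical. It accepts any callable, so a Kornia pipeline and a torchvision v2 pipeline can be timed under the same conditions. For GPU runs, a synchronization callback (for example `torch.cuda.synchronize`) should be passed in, since CUDA kernels launch asynchronously and the wall clock would otherwise be read too early.

```python
import time
from typing import Any, Callable, Iterable, Optional


def measure_throughput(
    augment: Callable[[Any], Any],
    batches: Iterable[Any],
    batch_size: int,
    warmup: int = 2,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return images per second for `augment` applied to every batch.

    `sync` is an optional hook called before each clock read, e.g.
    torch.cuda.synchronize when the augmentations run on a GPU.
    """
    batches = list(batches)

    # Warm-up passes are excluded from timing (lazy init, caches, autotuning).
    for batch in batches[:warmup]:
        augment(batch)

    if sync is not None:
        sync()
    start = time.perf_counter()
    for batch in batches:
        augment(batch)
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very fast callables.
    return len(batches) * batch_size / max(elapsed, 1e-12)
```

Running the helper with the same batches for each candidate pipeline gives directly comparable images/sec figures. GPU utilization and memory footprint would still have to be collected separately.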
🎯 Objective
- Unify the OTX augmentation pipeline under a PyTorch-first approach.
- Improve training performance by reducing tensor/NumPy conversions.
- Increase maintainability and clarity of augmentation code by eliminating redundant implementations.
- Provide benchmark results to decide between Kornia and torchvision v2.