Optimize OTX Data Augmentations pipeline #5020

@kprokofi

📄 Description

Current Problem:

  1. Mixed augmentation pipelines: OTX currently uses a combination of torchvision and OpenMMLab-style augmentations (some re-implemented inside the repo). The OpenMMLab-style operations work on NumPy images, while torchvision expects PyTorch tensors, so every sample is converted back and forth between tensors and NumPy arrays. These repeated conversions are a performance bottleneck that slows down training (see the illustrative sketch after this list).

  2. Inconsistent interfaces: Parameter names, formats, and APIs differ across the various augmentation implementations, making the codebase harder to maintain and extend.

  3. Redundant self-implemented augmentations: Several augmentations are re-implemented locally in OTX, even though robust third-party solutions exist and are actively maintained.
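
To make the conversion overhead in point 1 concrete, here is a minimal, purely illustrative sketch. It is not actual OTX code: the `mmlab_style_flip` helper is a hypothetical stand-in for any NumPy-based transform, and only shows the kind of tensor ↔ NumPy round-trips a mixed pipeline forces on every sample.

```python
import numpy as np
import torch
from torchvision.transforms import v2

tv_color = v2.ColorJitter(brightness=0.2)            # torchvision works on tensors


def mmlab_style_flip(img: np.ndarray) -> np.ndarray:
    """Stand-in for an OpenMMLab-style transform that works on NumPy HWC images."""
    return np.ascontiguousarray(img[:, ::-1])         # horizontal flip


def mixed_pipeline(img: np.ndarray) -> torch.Tensor:
    img = mmlab_style_flip(img)                       # NumPy in, NumPy out
    t = torch.from_numpy(img).permute(2, 0, 1)        # conversion 1: NumPy -> tensor
    t = tv_color(t)                                   # torchvision transform
    img = t.permute(1, 2, 0).numpy()                  # conversion 2: tensor -> NumPy
    # ... further NumPy-based ops would run here ...
    return torch.from_numpy(img).permute(2, 0, 1)     # conversion 3: back to tensor
```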


Proposed Solution:

Evaluate Kornia as the primary augmentation library for OTX. Kornia is a differentiable computer vision library built on top of PyTorch, with an extensive set of augmentation operators.

Key benefits of Kornia:

  • 🟢 PyTorch-first design: All augmentations operate directly on PyTorch tensors, removing unnecessary conversions from/to NumPy and improving training efficiency.
  • 🟢 Rich functionality: Provides a wide range of augmentations, including geometric, color, and intensity transformations, as well as advanced ones (e.g., motion blur, random perspective, cutout).
  • 🟢 Differentiable & GPU-accelerated: Augmentations are differentiable, making them suitable for integration with modern deep learning pipelines, and can be executed efficiently on GPUs.
  • 🟢 Unified API & consistency: Standardized function signatures simplify maintainability and reduce code complexity.
  • 🟢 Extended support: Handles not only images but also masks, bounding boxes, and keypoints, aligning with the requirements of detection and segmentation tasks (see the usage sketch after this list).
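
As a sketch of what a unified, annotation-aware pipeline could look like with Kornia's `AugmentationSequential` (the concrete transforms, shapes, and `data_keys` below are illustrative assumptions, not the final OTX set):

```python
import torch
import kornia.augmentation as K

# Illustrative transform set; the actual OTX pipeline would be defined per task.
aug = K.AugmentationSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.ColorJitter(brightness=0.2, contrast=0.2, p=0.8),
    K.RandomAffine(degrees=15.0, translate=(0.1, 0.1), p=0.5),
    data_keys=["input", "mask", "bbox_xyxy"],         # keeps annotations in sync
)

images = torch.rand(4, 3, 256, 256)                   # B x C x H x W float tensors
masks = torch.randint(0, 2, (4, 1, 256, 256)).float()
boxes = torch.tensor([[[10.0, 20.0, 100.0, 120.0]]]).repeat(4, 1, 1)  # B x N x 4

out_images, out_masks, out_boxes = aug(images, masks, boxes)
# The same sampled parameters are applied to image, mask, and boxes, entirely on
# torch tensors (and on GPU if the inputs live on a CUDA device).
```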

Benchmarking Plan (Kornia vs torchvision.v2):

  • Compare throughput (images/sec) during training with augmentation-heavy pipelines (a measurement sketch follows this list).
  • Measure GPU utilization and memory footprint when using Kornia vs torchvision v2.
  • Evaluate coverage of transformations (ensure Kornia includes all augmentations currently used in OTX).
  • Validate annotation consistency (masks, bounding boxes, keypoints remain synchronized after augmentation).
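
A possible starting point for the throughput measurement is sketched below, under assumed batch/image sizes and an illustrative transform set; real numbers should come from the actual OTX training loop. Note one caveat for fairness: on a batched tensor, torchvision v2 draws one set of random parameters per call, while Kornia samples per image.

```python
import time
import torch
import kornia.augmentation as K
from torchvision.transforms import v2

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = torch.rand(64, 3, 512, 512, device=device)    # assumed benchmark batch

kornia_aug = K.AugmentationSequential(
    K.RandomHorizontalFlip(p=0.5),
    K.ColorJitter(brightness=0.2, contrast=0.2, p=1.0),
).to(device)

tv_aug = v2.Compose([
    v2.RandomHorizontalFlip(p=0.5),
    v2.ColorJitter(brightness=0.2, contrast=0.2),
])


def images_per_sec(fn, n_iters: int = 50) -> float:
    fn(batch)                                          # warm-up run
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn(batch)
    if device == "cuda":
        torch.cuda.synchronize()
    return n_iters * batch.shape[0] / (time.perf_counter() - start)


print(f"kornia:         {images_per_sec(kornia_aug):.1f} img/s")
print(f"torchvision v2: {images_per_sec(tv_aug):.1f} img/s")
```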

🎯 Objective

  • Unify OTX augmentation pipeline under a PyTorch-first approach.
  • Improve training performance by reducing tensor/NumPy conversions.
  • Increase maintainability and clarity of augmentation code by eliminating redundant implementations.
  • Provide benchmark results to decide between Kornia and torchvision v2.
