diff --git a/docs/source/transforms.rst b/docs/source/transforms.rst
index 4b00fab023d..22bddae92a0 100644
--- a/docs/source/transforms.rst
+++ b/docs/source/transforms.rst
@@ -1,14 +1,20 @@
 .. _transforms:
 
-Transforming and augmenting images
-==================================
+Transforming images, videos, boxes and more
+===========================================
 
 .. currentmodule:: torchvision.transforms
 
 Torchvision supports common computer vision transformations in the
-``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
-can be used to transform or augment data for training or inference of different
-tasks (image classification, detection, segmentation, video classification).
+``torchvision.transforms.v2`` module. Transforms can be used to transform and
+augment data, for both training and inference. The following objects are
+supported:
+
+- Images as pure tensors, :class:`~torchvision.tv_tensors.Image` or PIL image
+- Videos as :class:`~torchvision.tv_tensors.Video`
+- Aligned and rotated bounding boxes as :class:`~torchvision.tv_tensors.BoundingBoxes`
+- Segmentation and detection masks as :class:`~torchvision.tv_tensors.Mask`
+- KeyPoints as :class:`~torchvision.tv_tensors.KeyPoints`.
 
 .. code:: python
 
@@ -111,9 +117,9 @@ In Torchvision 0.15 (March 2023), we released a new set of transforms
 available in the ``torchvision.transforms.v2`` namespace. These transforms have
 a lot of advantages compared to the v1 ones (in ``torchvision.transforms``):
 
-- They can transform images **but also** bounding boxes, masks, or videos. This
-  provides support for tasks beyond image classification: detection, segmentation,
-  video classification, etc. See
+- They can transform images **but also** bounding boxes, masks, videos, and
+  keypoints. This provides support for tasks beyond image classification:
+  detection, segmentation, video classification, etc. See
   :ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
   and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
 - They support more transforms like :class:`~torchvision.transforms.v2.CutMix`
diff --git a/gallery/transforms/plot_transforms_getting_started.py b/gallery/transforms/plot_transforms_getting_started.py
index 2696a9e57e7..08e012843f4 100644
--- a/gallery/transforms/plot_transforms_getting_started.py
+++ b/gallery/transforms/plot_transforms_getting_started.py
@@ -79,12 +79,12 @@
 # very easy: the v2 transforms are fully compatible with the v1 API, so you
 # only need to change the import!
 #
-# Detection, Segmentation, Videos
+# Videos, boxes, masks, keypoints
 # -------------------------------
 #
-# The new Torchvision transforms in the ``torchvision.transforms.v2`` namespace
-# support tasks beyond image classification: they can also transform bounding
-# boxes, segmentation / detection masks, or videos.
+# The Torchvision transforms in the ``torchvision.transforms.v2`` namespace
+# support tasks beyond image classification: they can also transform rotated or
+# aligned bounding boxes, segmentation / detection masks, videos, and keypoints.
 #
 # Let's briefly look at a detection example with bounding boxes.
 
@@ -129,8 +129,9 @@
 # TVTensors are :class:`torch.Tensor` subclasses. The available TVTensors are
 # :class:`~torchvision.tv_tensors.Image`,
 # :class:`~torchvision.tv_tensors.BoundingBoxes`,
-# :class:`~torchvision.tv_tensors.Mask`, and
-# :class:`~torchvision.tv_tensors.Video`.
+# :class:`~torchvision.tv_tensors.Mask`,
+# :class:`~torchvision.tv_tensors.Video`, and
+# :class:`~torchvision.tv_tensors.KeyPoints`.
 #
 # TVTensors look and feel just like regular tensors - they **are** tensors.
 # Everything that is supported on a plain :class:`torch.Tensor` like ``.sum()``
diff --git a/torchvision/tv_tensors/_keypoints.py b/torchvision/tv_tensors/_keypoints.py
index cb4163be20d..88661302ffe 100644
--- a/torchvision/tv_tensors/_keypoints.py
+++ b/torchvision/tv_tensors/_keypoints.py
@@ -13,19 +13,14 @@ class KeyPoints(TVTensor):
     Each point is represented by its X and Y coordinates along the width and
     height dimensions, respectively.
 
-    KeyPoints can be converted from :class:`torchvision.tv_tensors.BoundingBoxes`
-    by :func:`torchvision.transforms.v2.functional.convert_bounding_boxes_to_points`.
-
     KeyPoints may represent any object that can be represented by sequences of 2D points:
 
     - `Polygonal chains <https://en.wikipedia.org/wiki/Polygonal_chain>`_,
-      including polylines, Bézier curves, etc., which should be of shape
-      ``[N_chains, N_points, 2]``, which is equal to ``[N_chains, N_segments +
-      1, 2]``
-    - Polygons, which should be of shape ``[N_polygons, N_points, 2]``, which is
-      equal to ``[N_polygons, N_sides, 2]``
-    - Skeletons, which could be of shape ``[N_skeletons, N_bones, 2, 2]`` for
-      pose-estimation models
+      including polylines, Bézier curves, etc., which can be of shape
+      ``[N_chains, N_points, 2]``
+    - Polygons, which can be of shape ``[N_polygons, N_points, 2]``
+    - Skeletons, which can be of shape ``[N_skeletons, N_bones, 2, 2]`` for
+      pose-estimation models.
 
     .. note::
         Like for :class:`torchvision.tv_tensors.BoundingBoxes`, there should
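
For context on the workflow these doc changes describe, here is a minimal usage sketch of passing an image and keypoints jointly through a v2 transform pipeline. It is an illustration, not part of the diff: it assumes the ``tv_tensors.KeyPoints`` constructor accepts a ``canvas_size`` keyword (mirroring ``BoundingBoxes``) and that the geometric v2 transforms used below accept keypoint inputs; all names and values are made up.

    import torch
    from torchvision import tv_tensors
    from torchvision.transforms import v2

    # A dummy 3-channel, 256x256 image wrapped as an Image TVTensor.
    img = tv_tensors.Image(torch.randint(0, 256, (3, 256, 256), dtype=torch.uint8))

    # Four (X, Y) keypoints defined on the same 256x256 canvas.
    # Assumption: KeyPoints takes canvas_size=(H, W), like BoundingBoxes.
    points = tv_tensors.KeyPoints(
        [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0], [70.0, 80.0]],
        canvas_size=(256, 256),
    )

    transforms = v2.Compose(
        [
            v2.RandomResizedCrop(size=(224, 224), antialias=True),
            v2.RandomHorizontalFlip(p=0.5),
        ]
    )

    # The same randomly sampled geometric parameters are applied to both
    # inputs; outputs come back in the same structure as the inputs.
    out_img, out_points = transforms(img, points)
    print(type(out_img), out_img.shape)
    print(type(out_points), out_points)

Passing the image and the keypoints in a single call is the point of the v2 API: one invocation keeps the sampled geometric parameters consistent across every input, instead of transforming each object separately.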