Update docs to mention rotated boxes and keypoints #9113

Open: wants to merge 1 commit into main

22 changes: 14 additions & 8 deletions docs/source/transforms.rst
@@ -1,14 +1,20 @@
.. _transforms:

-Transforming and augmenting images
-==================================
+Transforming images, videos, boxes and more
+===========================================

.. currentmodule:: torchvision.transforms

Torchvision supports common computer vision transformations in the
-``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
-can be used to transform or augment data for training or inference of different
-tasks (image classification, detection, segmentation, video classification).
+``torchvision.transforms.v2`` module. Transforms can be used to transform and
+augment data, for both training and inference. The following objects are
+supported:
+
+- Images as pure tensors, :class:`~torchvision.tv_tensors.Image` or PIL image
+- Videos as :class:`~torchvision.tv_tensors.Video`
+- Aligned and rotated bounding boxes as :class:`~torchvision.tv_tensors.BoundingBoxes`

Review comment (Member): Maybe "Axis-aligned" to be a bit more specific?

+- Segmentation and detection masks as :class:`~torchvision.tv_tensors.Mask`
+- Keypoints as :class:`~torchvision.tv_tensors.KeyPoints`

.. code:: python

@@ -111,9 +117,9 @@ In Torchvision 0.15 (March 2023), we released a new set of transforms available
in the ``torchvision.transforms.v2`` namespace. These transforms have a lot of
advantages compared to the v1 ones (in ``torchvision.transforms``):

-- They can transform images **but also** bounding boxes, masks, or videos. This
-provides support for tasks beyond image classification: detection, segmentation,
-video classification, etc. See
+- They can transform images **but also** bounding boxes, masks, videos and
+keypoints. This provides support for tasks beyond image classification:
+detection, segmentation, video classification, etc. See

Review comment (Member): Maybe add "pose estimation"?

:ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
- They support more transforms like :class:`~torchvision.transforms.v2.CutMix`
13 changes: 7 additions & 6 deletions gallery/transforms/plot_transforms_getting_started.py
@@ -79,12 +79,12 @@
# very easy: the v2 transforms are fully compatible with the v1 API, so you
# only need to change the import!
#
-# Detection, Segmentation, Videos
+# Videos, boxes, masks, keypoints
# -------------------------------
#
-# The new Torchvision transforms in the ``torchvision.transforms.v2`` namespace
-# support tasks beyond image classification: they can also transform bounding
-# boxes, segmentation / detection masks, or videos.
+# The Torchvision transforms in the ``torchvision.transforms.v2`` namespace
+# support tasks beyond image classification: they can also transform rotated or
+# aligned bounding boxes, segmentation / detection masks, videos, and keypoints.

Review comment (Member): Maybe "axis-aligned"?

#
# Let's briefly look at a detection example with bounding boxes.

@@ -129,8 +129,9 @@
# TVTensors are :class:`torch.Tensor` subclasses. The available TVTensors are
# :class:`~torchvision.tv_tensors.Image`,
# :class:`~torchvision.tv_tensors.BoundingBoxes`,
-# :class:`~torchvision.tv_tensors.Mask`, and
-# :class:`~torchvision.tv_tensors.Video`.
+# :class:`~torchvision.tv_tensors.Mask`,
+# :class:`~torchvision.tv_tensors.Video`, and
+# :class:`~torchvision.tv_tensors.KeyPoints`.
#
# TVTensors look and feel just like regular tensors - they **are** tensors.
# Everything that is supported on a plain :class:`torch.Tensor` like ``.sum()``
15 changes: 5 additions & 10 deletions torchvision/tv_tensors/_keypoints.py
@@ -13,19 +13,14 @@ class KeyPoints(TVTensor):

Each point is represented by its X and Y coordinates along the width and height dimensions, respectively.

-KeyPoints can be converted from :class:`torchvision.tv_tensors.BoundingBoxes`
-by :func:`torchvision.transforms.v2.functional.convert_bounding_boxes_to_points`.
-
KeyPoints may represent any object that can be represented by sequences of 2D points:

- `Polygonal chains <https://en.wikipedia.org/wiki/Polygonal_chain>`_,
-including polylines, Bézier curves, etc., which should be of shape
-``[N_chains, N_points, 2]``, which is equal to ``[N_chains, N_segments +
-1, 2]``
-- Polygons, which should be of shape ``[N_polygons, N_points, 2]``, which is
-equal to ``[N_polygons, N_sides, 2]``
-- Skeletons, which could be of shape ``[N_skeletons, N_bones, 2, 2]`` for
-pose-estimation models
+including polylines, Bézier curves, etc., which can be of shape
+``[N_chains, N_points, 2]``
+- Polygons, which can be of shape ``[N_polygons, N_points, 2]``
+- Skeletons, which can be of shape ``[N_skeletons, N_bones, 2, 2]`` for
+pose-estimation models.

.. note::
Like for :class:`torchvision.tv_tensors.BoundingBoxes`, there should