pytorch · NicolasHug · Jun 23, 2025 · AntoineSimoulin · Jun 23, 2025 · AntoineSimoulin
diff --git a/docs/source/transforms.rst b/docs/source/transforms.rst
@@ -1,14 +1,20 @@
 .. _transforms:
 
-Transforming and augmenting images
-==================================
+Transforming images, videos, boxes and more
+===========================================
 
 .. currentmodule:: torchvision.transforms
 
 Torchvision supports common computer vision transformations in the
-``torchvision.transforms`` and ``torchvision.transforms.v2`` modules. Transforms
-can be used to transform or augment data for training or inference of different
-tasks (image classification, detection, segmentation, video classification).
+``torchvision.transforms.v2`` module. Transforms can be used to transform and
+augment data, for both training and inference. The following objects are
+supported:
+
+- Images as pure tensors, :class:`~torchvision.tv_tensors.Image`, or PIL images
+- Videos as :class:`~torchvision.tv_tensors.Video`
+- Aligned and rotated bounding boxes as :class:`~torchvision.tv_tensors.BoundingBoxes`
+- Segmentation and detection masks as :class:`~torchvision.tv_tensors.Mask`
+- Keypoints as :class:`~torchvision.tv_tensors.KeyPoints`
 
 .. code:: python
 
@@ -111,9 +117,9 @@ In Torchvision 0.15 (March 2023), we released a new set of transforms available
 in the ``torchvision.transforms.v2`` namespace. These transforms have a lot of
 advantages compared to the v1 ones (in ``torchvision.transforms``):
 
-- They can transform images **but also** bounding boxes, masks, or videos. This
-  provides support for tasks beyond image classification: detection, segmentation,
-  video classification, etc. See
+- They can transform images **but also** bounding boxes, masks, videos and
+  keypoints. This provides support for tasks beyond image classification:
+  detection, segmentation, video classification, etc. See
   :ref:`sphx_glr_auto_examples_transforms_plot_transforms_getting_started.py`
   and :ref:`sphx_glr_auto_examples_transforms_plot_transforms_e2e.py`.
 - They support more transforms like :class:`~torchvision.transforms.v2.CutMix`

diff --git a/gallery/transforms/plot_transforms_getting_started.py b/gallery/transforms/plot_transforms_getting_started.py
@@ -79,12 +79,12 @@
 #     very easy: the v2 transforms are fully compatible with the v1 API, so you
 #     only need to change the import!
 #
-# Detection, Segmentation, Videos
+# Videos, boxes, masks, keypoints
 # -------------------------------
 #
-# The new Torchvision transforms in the ``torchvision.transforms.v2`` namespace
-# support tasks beyond image classification: they can also transform bounding
-# boxes, segmentation / detection masks, or videos.
+# The Torchvision transforms in the ``torchvision.transforms.v2`` namespace
+# support tasks beyond image classification: they can also transform rotated or
+# aligned bounding boxes, segmentation / detection masks, videos, and keypoints.
 #
 # Let's briefly look at a detection example with bounding boxes.
 
@@ -129,8 +129,9 @@
 # TVTensors are :class:`torch.Tensor` subclasses. The available TVTensors are
 # :class:`~torchvision.tv_tensors.Image`,
 # :class:`~torchvision.tv_tensors.BoundingBoxes`,
-# :class:`~torchvision.tv_tensors.Mask`, and
-# :class:`~torchvision.tv_tensors.Video`.
+# :class:`~torchvision.tv_tensors.Mask`,
+# :class:`~torchvision.tv_tensors.Video`, and
+# :class:`~torchvision.tv_tensors.KeyPoints`.
 #
 # TVTensors look and feel just like regular tensors - they **are** tensors.
 # Everything that is supported on a plain :class:`torch.Tensor` like ``.sum()``

diff --git a/torchvision/tv_tensors/_keypoints.py b/torchvision/tv_tensors/_keypoints.py
@@ -13,19 +13,14 @@ class KeyPoints(TVTensor):
 
     Each point is represented by its X and Y coordinates along the width and height dimensions, respectively.
 
-    KeyPoints can be converted from :class:`torchvision.tv_tensors.BoundingBoxes`
-    by :func:`torchvision.transforms.v2.functional.convert_bounding_boxes_to_points`.
-
     KeyPoints may represent any object that can be represented by sequences of 2D points:
 
     - `Polygonal chains <https://en.wikipedia.org/wiki/Polygonal_chain>`_,
-      including polylines, Bézier curves, etc., which should be of shape
-      ``[N_chains, N_points, 2]``, which is equal to ``[N_chains, N_segments +
-      1, 2]``
-    - Polygons, which should be of shape ``[N_polygons, N_points, 2]``, which is
-      equal to ``[N_polygons, N_sides, 2]``
-    - Skeletons, which could be of shape ``[N_skeletons, N_bones, 2, 2]`` for
-      pose-estimation models
+      including polylines, Bézier curves, etc., which can be of shape
+      ``[N_chains, N_points, 2]``
+    - Polygons, which can be of shape ``[N_polygons, N_points, 2]``
+    - Skeletons, which can be of shape ``[N_skeletons, N_bones, 2, 2]`` for
+      pose-estimation models.
 
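To make the shapes listed above concrete, here is a plain-PyTorch sketch that does not use the `KeyPoints` class itself. Every layout ends in a trailing dimension of size 2 holding `(x, y)`, so a geometric operation only needs to act on `[..., 0]` and `[..., 1]`, whatever the leading dimensions mean. The `hflip_points` helper is hypothetical, and its `x -> width - x` convention is an assumption borrowed from how horizontal flips treat XYXY boxes; it is not necessarily the convention torchvision applies to keypoints.

```python
import torch


def hflip_points(points: torch.Tensor, width: int) -> torch.Tensor:
    """Hypothetical helper: mirror (x, y) points horizontally on a canvas.

    Accepts any shape [..., 2] and assumes x' = width - x (illustrative choice).
    """
    if points.shape[-1] != 2:
        raise ValueError(f"expected a trailing dimension of size 2, got {tuple(points.shape)}")
    flipped = points.clone()
    flipped[..., 0] = width - points[..., 0]  # y coordinates are left untouched
    return flipped


# Two polylines of 4 points each: [N_chains, N_points, 2]
chains = torch.rand(2, 4, 2) * 100
# One triangle: [N_polygons, N_points, 2]
triangle = torch.tensor([[[10.0, 10.0], [50.0, 10.0], [30.0, 40.0]]])
# Three skeletons of 5 bones, each bone given by its 2 endpoints: [N_skeletons, N_bones, 2, 2]
skeletons = torch.rand(3, 5, 2, 2) * 100

for name, pts in [("chains", chains), ("triangle", triangle), ("skeletons", skeletons)]:
    print(name, tuple(hflip_points(pts, width=100).shape))

print(hflip_points(triangle, width=100)[0].tolist())
# [[90.0, 10.0], [50.0, 10.0], [70.0, 40.0]]
```

Because the helper only indexes the last axis, the same code handles chains, polygons, and skeletons without any shape-specific branches.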
     .. note::
         Like for :class:`torchvision.tv_tensors.BoundingBoxes`, there should