Adjust clamping for rotated bboxes #9112

Open
wants to merge 7 commits into
base: main
AntoineSimoulin
Member

@AntoineSimoulin AntoineSimoulin commented Jun 20, 2025

Adjust clamping for Rotated Boxes

This PR is a follow-up to #9104, aiming to address inconsistencies in the clamping function and improve its intuitiveness. The initial approach for clamping rotated bounding boxes focused on finding the largest angle-preserving box enclosed within the original box and the image canvas. However, as illustrated in Figure 2, this method can lead to non-intuitive results where the box does not fully enclose the underlying object. To address this issue, this PR proposes an adjustment to the clamping function. Instead of seeking the largest angle-preserving box, we now aim to find the smallest angle-preserving box that encloses the intersection of the original box and the image canvas. This change ensures that the resulting box is more intuitive.

These adjustments have some key implications. With this new approach, clamped rotated boxes may have vertices outside the canvas. However, the center of the bounding box is guaranteed to remain within the canvas. This PR addresses #8254 by ensuring that rotated bounding boxes SHOULD be clamped (consistent with un-rotated boxes). Crucially, as illustrated in Figure 1, the clamping operation preserves the original box's pixel assignments within the image canvas, ensuring that no information is lost during the process.
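As a rough illustration of the idea described above, the following pure-Python sketch clips a rotated box against the canvas and then fits the smallest box with the same angle around the clipped region. It is not the torchvision implementation: the helper names (`box_corners`, `clip`, `soft_clamp`) and the angle convention (degrees, measured from the x-axis toward the y-axis) are assumptions made for this example only.

```python
import math


def box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a rotated box (angle convention is assumed)."""
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)    # unit vector along the box width
    vx, vy = -math.sin(a), math.cos(a)   # unit vector along the box height
    return [
        (cx + sx * w / 2 * ux + sy * h / 2 * vx,
         cy + sx * w / 2 * uy + sy * h / 2 * vy)
        for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))
    ]


def clip(poly, axis, bound, keep_below):
    """Sutherland-Hodgman clipping of a polygon against one canvas edge."""
    def inside(p):
        return p[axis] <= bound if keep_below else p[axis] >= bound

    def crossing(p, q):
        # Point where segment p-q crosses the line {coordinate[axis] == bound}.
        t = (bound - p[axis]) / (q[axis] - p[axis])
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]
        if inside(cur) != inside(prev):
            out.append(crossing(prev, cur))
        if inside(cur):
            out.append(cur)
    return out


def soft_clamp(cx, cy, w, h, angle_deg, canvas_w, canvas_h):
    """Smallest box with the same angle enclosing (box ∩ canvas), or None."""
    poly = box_corners(cx, cy, w, h, angle_deg)
    edges = ((0, 0.0, False), (0, canvas_w, True),
             (1, 0.0, False), (1, canvas_h, True))
    for axis, bound, keep_below in edges:
        poly = clip(poly, axis, bound, keep_below)
        if not poly:
            return None  # the box lies entirely outside the canvas

    # Project the clipped polygon onto the box axes and take the extents.
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)
    vx, vy = -math.sin(a), math.cos(a)
    us = [x * ux + y * uy for x, y in poly]
    vs = [x * vx + y * vy for x, y in poly]
    u_mid = (min(us) + max(us)) / 2
    v_mid = (min(vs) + max(vs)) / 2
    return (u_mid * ux + v_mid * vx, u_mid * uy + v_mid * vy,
            max(us) - min(us), max(vs) - min(vs), angle_deg)


# A 45-degree box overlapping the right edge of a 100x100 canvas.
clamped = soft_clamp(95, 50, 40, 10, 45, canvas_w=100, canvas_h=100)
print(clamped)
print(box_corners(*clamped))
```

In this example the clamped box is shorter than the original along its width, its center lies inside the canvas, and one of its vertices still falls outside the right edge, which matches the behaviour described in the paragraph above.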

Details of the adjustments

This PR implements in particular the following modifications:

  • Modify the conditions in the clamping function to ensure the resulting box completely encapsulates the input box. The output of the clamping operation is the smallest angle-preserving box that encloses the intersection of the original box and the image canvas.
  • Modify elastic_bounding_boxes for rotated boxes so that it uses the "CXCYWHR" format instead of "XYXYXYXY". The elastic transform needs the transformed points to be within the canvas. This is the case for the center of rotated boxes, but not necessarily for all vertices.
  • Fix the _order_bounding_boxes_points in the case of largest negative values along the y-axis.
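To make the second point concrete, here is a minimal sketch of why a center-based format is safer than a vertex-based one when points must lie within the canvas. The function names `corners_to_cxcywhr` and `in_canvas`, the corner ordering, and the angle convention are assumptions for illustration; they are not the torchvision conversion kernels.

```python
import math


def corners_to_cxcywhr(corners):
    """Convert four ordered corners to (cx, cy, w, h, angle_deg).

    Assumes the corners are given in order around the box and that the
    angle is the direction of the first edge, in degrees.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = corners
    cx = (x1 + x2 + x3 + x4) / 4
    cy = (y1 + y2 + y3 + y4) / 4
    w = math.hypot(x2 - x1, y2 - y1)
    h = math.hypot(x3 - x2, y3 - y2)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return cx, cy, w, h, angle


def in_canvas(point, canvas_w, canvas_h):
    """True if the point lies within the canvas bounds (inclusive)."""
    x, y = point
    return 0 <= x <= canvas_w and 0 <= y <= canvas_h


# A box whose right-hand vertices stick out of a 100x100 canvas.
corners = [(70, 40), (110, 40), (110, 60), (70, 60)]
cx, cy, w, h, angle = corners_to_cxcywhr(corners)

print(all(in_canvas(p, 100, 100) for p in corners))  # False
print(in_canvas((cx, cy), 100, 100))                 # True
```

A transform that looks up a displacement at each input point can therefore be applied to the center of such a box, while applying it to every vertex would require handling out-of-canvas coordinates.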

Illustration of the adjustments

We illustrate the adjustments to the clamping function using this example image. The clamping should be more intuitive and should prevent loss of information.

image

Figure 1: Illustration of the clamping adjustments (original box in grey and corresponding clamped box in blue).

image

Figure 2: Illustration of the clamping BEFORE this PR.

image

Figure 3: Illustration of the clamping AFTER this PR.

Test plan

Please run the following tests:

```bash
pytest test/test_transforms_v2.py -k box -v
...
2372 passed, 1432 skipped, 5025 deselected in 46.08s
```

pytorch-bot bot commented Jun 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9112

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 5b2af7c with merge base 6aee5ed (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@NicolasHug
Member

Thanks for the PR @AntoineSimoulin, and for the detailed pictures!

It's clear from Figure 2 that our current clamping strategy leads to sub-optimal boxes. Out of curiosity, could you share the transformations that were used in each result? I suspect that the more transforms are used, the more clamping happens, and thus more information is lost.

The clamping strategy proposed in this PR allows for some corners of the box to be outside of the image canvas. That makes me wonder: what do we actually want from a clamping operation? Do we want the corners to be within the canvas, or do we only need the center of the box to be within the canvas?

My current understanding is that there is a spectrum of clamping strategies:

  • no clamp at all. This is what retains the most information.
  • a strict clamping, where we force all of the box points to be in the canvas, as implemented in main. Potentially, a lot of information is lost.
  • a more lenient clamping as in this PR, which seems to be an intermediate strategy between the 2 strategies above: we lose less information than with strict clamping, but we may still have points outside of the canvas.

I do agree that the clamping in this PR leads to less surprising results than the strict clamping we have in main. Maybe we could expose it as one of multiple clamping strategies. However, since it still results in points outside the canvas and some information loss, I wonder if users wouldn't prefer the no-clamping strategy in general?

@AntoineSimoulin
Member Author

> Out of curiosity, could you share the transformations that were used in each result?

Figures 2 and 3 were obtained by applying a CenterCrop transformation with sizes 300, 500, 1000, and the original image size.

> My current understanding is that there is a spectrum of clamping strategies

Yeah I do agree with the proposed breakdown.

> Maybe we could expose it as one of multiple clamping strategies. However, since it still results in points outside the canvas and some information loss, I wonder if users wouldn't prefer the no-clamping strategy in general?

As illustrated in Figure 1, I feel the strategy proposed in this PR offers the best trade-off. For instance, in the case of object detection, it would be very difficult for a model to predict a vertex very far from the canvas boundaries. Also, this transformation ensures that the center of the box is within the image canvas, so we should be able to apply any transformation without error. Finally, contrary to stricter clamping, we do not lose information, as all pixels within the canvas assigned to the object are still within the bounding box.

I would prefer to make this strategy the default and not keep an implementation of the other options for now, to keep the codebase simple. Let me know what you think!

@AntoineSimoulin
Member Author

I adjusted the current PR to offer both SOFT and HARD clamping, controlled by a "clamping_mode" parameter for rotated boxes. Both clamping strategies are detailed in the first commit message of this PR. The latest commits include the following modifications:

  • Not applying clamping to the TestHorizontalFlip and TestVerticalFlip references in the tests, as we do not apply clamping in the kernels;
  • Adjusting the TestRotate test. For this test there was a discrepancy in the predicted canvas size, with an absolute tolerance of up to 2. This could result in slightly different bounding boxes, as the actual and reference boxes would be clamped within two different canvases. We now apply the test in two steps: first check the canvas size, then check the transformation using the same canvas size;
  • Increasing the tolerance for TestResizedCrop to up to 1e-5;
  • In _clamp_rotated_bounding_boxes, adding a small epsilon of 1e-06 to ensure consistency between CPU and GPU tests;
  • Adjusting the hard clamping to avoid clamping too much. When translating each vertex, check whether the diagonal vertices can be adjusted within the canvas and the original rotated bounding box, so as to output the largest clamped box possible.

We illustrate both clamping modes below.

image
Figure 1: Illustration of the HARD clamping after applying a CenterCrop transformation for size in 300, 500, 1000 to the original image.

image
Figure 2: Illustration of the SOFT clamping after applying a CenterCrop transformation for size in 300, 500, 1000 to the original image.
