Skip to content

Export bboxes dataset as VIA-tracks file #580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 75 commits into
base: main
Choose a base branch
from

Conversation

sfmig
Copy link
Contributor

@sfmig sfmig commented May 8, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
To support exporting bounding box datasets in "VIA-tracks" .csv format.

What does this PR do?

  • Adds a save_bboxes.py module
  • Adds tests
  • Updates the relevant docs (API reference and user guide)

It also:

  • renames _validate_dataset --> validate_pose_dataset for clarity
  • factors out the function _validate_file_path into a new movement.io.utils module, that holds functions shared across io submodules (right now, only this function).
  • adds tests for the _validate_file_path function

This PR used PR #497 as starting point.

References

Closes #495

The issue recommends that after this functionality is implemented, we should extend the following tests (introduced in PR #503) to run on bbox data too:

  • test_dimension_slider_with_nans
  • test_dimension_slider_multiple_files

Since this would increase the size of this PR (which is already quite chunky), I opened a separate issue for this --> issue #591

How has this PR been tested?

Tests pass locally and in CI.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

Yes, it is included.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

Copy link

codecov bot commented May 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (57b5cbd) to head (3c2dbc7).

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #580    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           31        33     +2     
  Lines         1615      1719   +104     
==========================================
+ Hits          1615      1719   +104     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sfmig sfmig force-pushed the smg/save-via-tracks-from-pr497 branch 3 times, most recently from f1bed45 to 24d3e19 Compare May 14, 2025 09:57
harsh-bhanushali-05 and others added 25 commits May 14, 2025 10:57
@sfmig sfmig force-pushed the smg/save-via-tracks-from-pr497 branch from 24d3e19 to 842bbbc Compare May 14, 2025 09:57
Copy link

@sfmig sfmig marked this pull request as ready for review May 14, 2025 17:34
@sfmig sfmig requested a review from niksirbi May 14, 2025 17:34
Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @sfmig!

I'm approving this for your convenience.
I left some comments, many of them optional.

My main two quibbles are:

  • I find the name and description of the extract_track_id_from_individuals arguments confusing (I couldn't understand what exactly it was doing before reading the source code). I've left some suggestions for alternative ways to describe it.
  • Some error messages are not logged

By default the {func}`movement.io.save_bboxes.to_via_tracks_file` function will try to extract the track IDs from the individuals' names, but you can also select to extract them from the sorted list of individuals with `extract_track_id_from_individuals=True`.


Alternatively, you can save the bounding box tracks to a .csv file with a custom header using the standard Python library `csv`. Below is an example of how you can do this:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have the built-in support for saving to VIA, I wonder if it's worth putting this "alternative" custom csv option under a dropdown, in the spirit of reducing the page length

@@ -0,0 +1,117 @@
"""Unit tests for the movement.io.utils module."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how neatly you've structured the tests here.

You may be aware that some of these cases are also covered by the tests in test_validators/test_files_validator, especially this one.

The "duplication" occurs because _validate_file_path is a convenience wrapper around ValidFilePath. I think the partial overlap is fine in this case, but I was wondering what your thoughts are on such cases in general, as best practice suggests "separation of concerns".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if you've alse seen the fixtures in fixtures/files.py. Would it make sense to re-use them here? I'm just asking becuase in the early days of movement we had gone through some back and forth on figuring out the correct way to test for permission errors. For example, removing "write" permissions from files resulted in some weird errors down the line (e.g. pytest not being able to delet the tmp files during cleanup). I don't think your implementation will lead to the same error, but it may be safer to rely on existing (and long battle-tested) fixtures.

from movement.validators.files import ValidFile


def _validate_file_path(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very important, but I am wondering whether it makes more conceptual sense to put this function into validators/files.py as it's just a convenience wrapper around ValidFile. Happy for you to make the call.

image_file_prefix: str | None = None,
image_file_suffix: str = ".png",
) -> Path:
"""Save a movement bounding boxes dataset to a VIA tracks .csv file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To achieve a uniform docstring style across I/O functions.

Suggested change
"""Save a movement bounding boxes dataset to a VIA tracks .csv file.
"""Save a ``movement`` bounding boxes dataset to a VIA tracks .csv file.

The movement bounding boxes dataset to export.
file_path : str or pathlib.Path
Path where the VIA tracks .csv file [1]_ will be saved.
extract_track_id_from_individuals : bool, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the name of this argument quite confusing, because track IDs are always derived from indiviuals names in some form, even if this it False.

I struggle to come up with a better name, however. Maybe use_trailing_numbers_as_track_ids? This may make more sence together with my suggestion for this argument's description.

Remember to also update the Input/Output docs page if you change this.

Comment on lines +329 to +340
# Find the first non-digit character starting from the end
last_idx = len(individual) - 1
first_non_digit_idx = last_idx
while (
first_non_digit_idx >= 0
and individual[first_non_digit_idx].isdigit()
):
first_non_digit_idx -= 1

# Extract track ID from (first_non_digit_idx+1) until the end
if first_non_digit_idx < last_idx:
track_id = int(individual[first_non_digit_idx + 1 :])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be much more straightforward to do this with a regex

Suggested change
# Find the first non-digit character starting from the end
last_idx = len(individual) - 1
first_non_digit_idx = last_idx
while (
first_non_digit_idx >= 0
and individual[first_non_digit_idx].isdigit()
):
first_non_digit_idx -= 1
# Extract track ID from (first_non_digit_idx+1) until the end
if first_non_digit_idx < last_idx:
track_id = int(individual[first_non_digit_idx + 1 :])
# Find last sequence of digits in the name
if match := re.search(r'(\d+)$', individual):
track_id = int(match.group(1))

return frame_n_digits


def _get_map_individuals_to_track_ids(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor, but I find _map_individuals_to_track_ids much more natural (I read "map" as a verb).

Parameters
----------
ds : xarray.Dataset
The movement bounding boxes dataset to export.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The movement bounding boxes dataset to export.
The ``movement`` bounding boxes dataset to export.

filenames (including leading zeros). If None, the number of digits is
automatically determined from the largest frame number in the dataset,
plus one (to have at least one leading zero). Default is None.
image_file_prefix : str, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I found a bit confusing is that we require image filenames, even though the movement dataset itself doesn't contain any images. I understand why that is, but maybe we should add a "Note" in the docstring?

if image_file_suffix is not None:
assert df["filename"].str.endswith(image_file_suffix).all()
else:
assert df["filename"].str.endswith(".png").all()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understan how this is always .png, given that you also have .jpg in the parametrisation. What am I missing?

@niksirbi
Copy link
Member

niksirbi commented Jun 3, 2025

Note: this PR will need to be rebased and updated after #606 is merged, mostly because the Input/Output guide has been restructured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for exporting bboxes in VIA-tracks
3 participants