-
Notifications
You must be signed in to change notification settings - Fork 63
Export bboxes dataset as VIA-tracks file #580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #580 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 31 33 +2
Lines 1615 1719 +104
==========================================
+ Hits 1615 1719 +104 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f1bed45
to
24d3e19
Compare
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
24d3e19
to
842bbbc
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good @sfmig!
I'm approving this for your convenience.
I left some comments, many of them optional.
My main two quibbles are:
- I find the name and description of the
extract_track_id_from_individuals
arguments confusing (I couldn't understand what exactly it was doing before reading the source code). I've left some suggestions for alternative ways to describe it. - Some error messages are not logged
By default the {func}`movement.io.save_bboxes.to_via_tracks_file` function will try to extract the track IDs from the individuals' names, but you can also select to extract them from the sorted list of individuals with `extract_track_id_from_individuals=True`. | ||
|
||
|
||
Alternatively, you can save the bounding box tracks to a .csv file with a custom header using the standard Python library `csv`. Below is an example of how you can do this: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have the built-in support for saving to VIA, I wonder if it's worth putting this "alternative" custom csv option under a dropdown, in the spirit of reducing the page length
@@ -0,0 +1,117 @@ | |||
"""Unit tests for the movement.io.utils module.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how neatly you've structured the tests here.
You may be aware that some of these cases are also covered by the tests in test_validators/test_files_validator
, especially this one.
The "duplication" occurs because _validate_file_path
is a convenience wrapper around ValidFilePath
. I think the partial overlap is fine in this case, but I was wondering what your thoughts are on such cases in general, as best practice suggests "separation of concerns".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if you've alse seen the fixtures in fixtures/files.py
. Would it make sense to re-use them here? I'm just asking becuase in the early days of movement
we had gone through some back and forth on figuring out the correct way to test for permission errors. For example, removing "write" permissions from files resulted in some weird errors down the line (e.g. pytest not being able to delet the tmp files during cleanup). I don't think your implementation will lead to the same error, but it may be safer to rely on existing (and long battle-tested) fixtures.
from movement.validators.files import ValidFile | ||
|
||
|
||
def _validate_file_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very important, but I am wondering whether it makes more conceptual sense to put this function into validators/files.py
as it's just a convenience wrapper around ValidFile
. Happy for you to make the call.
image_file_prefix: str | None = None, | ||
image_file_suffix: str = ".png", | ||
) -> Path: | ||
"""Save a movement bounding boxes dataset to a VIA tracks .csv file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To achieve a uniform docstring style across I/O functions.
"""Save a movement bounding boxes dataset to a VIA tracks .csv file. | |
"""Save a ``movement`` bounding boxes dataset to a VIA tracks .csv file. |
The movement bounding boxes dataset to export. | ||
file_path : str or pathlib.Path | ||
Path where the VIA tracks .csv file [1]_ will be saved. | ||
extract_track_id_from_individuals : bool, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the name of this argument quite confusing, because track IDs are always derived from indiviuals names in some form, even if this it False.
I struggle to come up with a better name, however. Maybe use_trailing_numbers_as_track_ids
? This may make more sence together with my suggestion for this argument's description.
Remember to also update the Input/Output docs page if you change this.
# Find the first non-digit character starting from the end | ||
last_idx = len(individual) - 1 | ||
first_non_digit_idx = last_idx | ||
while ( | ||
first_non_digit_idx >= 0 | ||
and individual[first_non_digit_idx].isdigit() | ||
): | ||
first_non_digit_idx -= 1 | ||
|
||
# Extract track ID from (first_non_digit_idx+1) until the end | ||
if first_non_digit_idx < last_idx: | ||
track_id = int(individual[first_non_digit_idx + 1 :]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be much more straightforward to do this with a regex
# Find the first non-digit character starting from the end | |
last_idx = len(individual) - 1 | |
first_non_digit_idx = last_idx | |
while ( | |
first_non_digit_idx >= 0 | |
and individual[first_non_digit_idx].isdigit() | |
): | |
first_non_digit_idx -= 1 | |
# Extract track ID from (first_non_digit_idx+1) until the end | |
if first_non_digit_idx < last_idx: | |
track_id = int(individual[first_non_digit_idx + 1 :]) | |
# Find last sequence of digits in the name | |
if match := re.search(r'(\d+)$', individual): | |
track_id = int(match.group(1)) |
return frame_n_digits | ||
|
||
|
||
def _get_map_individuals_to_track_ids( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor, but I find _map_individuals_to_track_ids
much more natural (I read "map" as a verb).
Parameters | ||
---------- | ||
ds : xarray.Dataset | ||
The movement bounding boxes dataset to export. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The movement bounding boxes dataset to export. | |
The ``movement`` bounding boxes dataset to export. |
filenames (including leading zeros). If None, the number of digits is | ||
automatically determined from the largest frame number in the dataset, | ||
plus one (to have at least one leading zero). Default is None. | ||
image_file_prefix : str, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I found a bit confusing is that we require image filenames, even though the movement dataset itself doesn't contain any images. I understand why that is, but maybe we should add a "Note" in the docstring?
if image_file_suffix is not None: | ||
assert df["filename"].str.endswith(image_file_suffix).all() | ||
else: | ||
assert df["filename"].str.endswith(".png").all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understan how this is always .png, given that you also have .jpg in the parametrisation. What am I missing?
Note: this PR will need to be rebased and updated after #606 is merged, mostly because the Input/Output guide has been restructured. |
Description
What is this PR
Why is this PR needed?
To support exporting bounding box datasets in "VIA-tracks" .csv format.
What does this PR do?
save_bboxes.py
moduleIt also:
_validate_dataset
-->validate_pose_dataset
for clarity_validate_file_path
into a newmovement.io.utils
module, that holds functions shared across io submodules (right now, only this function)._validate_file_path
functionThis PR used PR #497 as starting point.
References
Closes #495
The issue recommends that after this functionality is implemented, we should extend the following tests (introduced in PR #503) to run on bbox data too:
test_dimension_slider_with_nans
test_dimension_slider_multiple_files
Since this would increase the size of this PR (which is already quite chunky), I opened a separate issue for this --> issue #591
How has this PR been tested?
Tests pass locally and in CI.
Is this a breaking change?
No.
Does this PR require an update to the documentation?
Yes, it is included.
Checklist: