Skip to content

Rename source format #525

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Udayscode
Copy link

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

This PR addresses issue #422, which requested renaming the source_software attribute to source_format for better clarity and consistency in the movement package.

What does this PR do?

This PR renames source_software to source_format across the codebase, including all relevant Python files, test files, and documentation. It updates function arguments, attributes, and references to ensure consistency.

References

How has this PR been tested?

I set up the project locally using Conda as per the contribution guidelines. Initially, running pytest --cov=movement showed 630 passed and 70 failed tests. After updating the tests to reflect the source_software to source_format change, the results improved to 682 passed and 18 failed. The remaining failures include:

  • 14 failures due to numpy.linalg.LinAlgError in filtering tests (unrelated to my changes).
  • 2 failures due to KeyError: 'source_format' in test_sample_data.py (minor issue, possibly needs a small tweak).

Is this a breaking change?

No, this PR should not break existing functionality—it’s a rename of an attribute and its references. Downstream code using source_software will need to update to source_format.

Does this PR require an update to the documentation?

Yes, I’ve updated the documentation in the docs folder to replace source_software with source_format.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

Copy link

@Udayscode Udayscode requested a review from niksirbi March 28, 2025 19:30
Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking an interest in contributing to movement @Udayscode — we really appreciate it.

Regarding the test failures:

  • I can’t reproduce the linalg test failures locally. If you're seeing them on the main branch as well, it might be worth opening an issue and posting the full error message so we can investigate further.
  • The test_sample_data failures are happening because the field is still named source_software in the metadata file for our sample datasets. That’s not your fault — you don’t have write access to our data repository. I could update the metadata myself, but we’ll need to discuss it with the rest of the team first, since that change would affect CI across all open PRs.

For now, you can apply the two suggestions I left in the review to get the tests passing. We’ll revisit the metadata issue once we’ve resolved the other areas of concern.

That brings me to the broader changes needed for this PR. Right now, the edits don’t go far enough — this requires more than a basic search-and-replace. Here’s what still needs to be addressed:

  • Please search for all mentions of the word “software” across the codebase. This includes docstrings and the documentation. The surrounding sentences often need rephrasing to reflect the change in terminology — it's not just a matter of renaming.
  • We no longer need "LightningPose" as a source_format option, and the corresponding functions can be removed:
    • load_poses.from_lp_file()
    • _ds_from_lp_or_dlc_file() — its logic should be merged into load_poses.from_dlc_file()

The rationale here is that while LightningPose and DeepLabCut are different software packages, they both use the same underlying file format — the “DeepLabCut” format. Since we’re now making source format, rather than source software, the primary categorisation, there’s no need to treat LightningPose as a separate case. The docs/source/input_output.md file must be also updated to reflect this shift.

Let me know if any of that is unclear.

@@ -295,7 +295,7 @@ def fetch_dataset(
if file_paths.get(key):
ds = load_module.from_file(
file_paths[key],
source_software=metadata[filename]["source_software"],
source_format=metadata[filename]["source_format"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make the test_sample_data tests pass, until we change the name of this field to "source_format" also on our data repository.

Suggested change
source_format=metadata[filename]["source_format"],
source_format=metadata[filename]["source_software"],

@@ -44,7 +44,7 @@ def validate_metadata(metadata: dict[str, dict]) -> None:
metadata_fields = [
"sha256sum",
"type",
"source_software",
"source_format",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be changed back to source_software to make the tests pass (until we update the corresponding field in the sample data repository).

Suggested change
"source_format",
"source_software",

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants