scottclowe
Member

We need to support NaN in label2index because the dataset returns NaN for missing values in all categorical columns, even in columns that contain taxonomic strings.

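To illustrate this behaviour, here is a minimal sketch. It assumes the metadata is loaded with pandas and that the taxonomic column is read as a categorical dtype. The CSV excerpt and the column names (processid, genus) are hypothetical. The sketch shows why a comparison against the empty string alone misses these entries:

```python
import io

import pandas as pd

# Hypothetical two-row CSV excerpt: the second row has no genus recorded.
# Column names are illustrative, not necessarily the real metadata schema.
csv_text = "processid,genus\nABC001,Aedes\nABC002,\n"

# Assumption: the taxonomic column is read as a categorical dtype
df = pd.read_csv(io.StringIO(csv_text), dtype={"genus": "category"})

missing = df.loc[1, "genus"]
print(repr(missing))     # nan, not ""
print(missing == "")     # False: an equality test against "" misses it
print(pd.isna(missing))  # True
```

The empty field comes back as NaN rather than as an empty string. NaN never compares equal to anything, so a label2index check for missing values needs pandas.isna (or an equivalent) in addition to the empty-string test.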
scottclowe added the bug (Something isn't working) label Apr 19, 2025
scottclowe requested a review from Copilot April 19, 2025 13:09
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes a bug in the label2index function by mapping missing values (either empty strings or NaN values) to -1, ensuring consistent handling of missing categorical labels.

  • Update docstrings to describe the new behavior.
  • Modify conditionals to check for NaN values using pandas.isna in both bioscan5m.py and bioscan1m.py.
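The change described in this overview can be sketched roughly as follows. This is a hypothetical stand-in, not the repository's code. The name label2index_sketch, its arguments, and the assumption that the known classes arrive as an ordered list are all illustrative. The essential idea is to test for NaN with pandas.isna alongside the existing empty-string check:

```python
import pandas as pd


def label2index_sketch(labels, classes):
    """Map each label to its index in ``classes``, sending missing labels to -1.

    Hypothetical sketch: ``classes`` is assumed to be an ordered list of known
    label strings; the real label2index signature may differ.
    """
    lookup = {name: idx for idx, name in enumerate(classes)}
    indices = []
    for label in labels:
        # Missing values may arrive as "" or as NaN/None. An equality test alone
        # misses NaN because NaN == "" is False. pd.isna is checked first so
        # that pd.NA never reaches the == test (bool(pd.NA) raises TypeError).
        if pd.isna(label) or label == "":
            indices.append(-1)
        else:
            # Unknown (non-missing) labels raise KeyError here
            indices.append(lookup[label])
    return indices


print(label2index_sketch(["Aedes", "", float("nan"), "Culex"], ["Aedes", "Culex"]))
# [0, -1, -1, 1]
```

Using -1 as a sentinel keeps the output integer-typed and cannot collide with a valid class index, which always starts at 0.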

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
--- | ---
bioscan_dataset/bioscan5m.py | Updated label2index to map NaN values to -1
bioscan_dataset/bioscan1m.py | Updated label2index to map NaN values to -1
Comments suppressed due to low confidence (2)

bioscan_dataset/bioscan5m.py:615

  • Consider adding unit tests to verify that NaN values are correctly mapped to -1 in label2index.
Entries containing missing values, indicated by empty strings or NaN values,

bioscan_dataset/bioscan1m.py:894

  • Consider adding tests to confirm that NaN values are mapped to -1 in label2index.
Entries containing missing values, indicated by empty strings or NaN values,
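Both suggestions could be addressed with tests along these lines. This is a hedged sketch in pytest style: the plain test_ functions are discovered by pytest without importing it. label2index_standin is a hypothetical stand-in defined inline so that the file runs on its own. In the repository, the real label2index would be imported in its place, with the calls adapted to its actual signature.

```python
import pandas as pd

CLASSES = ["Aedes", "Culex"]  # hypothetical class list for the tests


def label2index_standin(label, classes):
    # Hypothetical stand-in for the function under test (single-label form).
    if pd.isna(label) or label == "":
        return -1
    # Unknown, non-missing labels raise ValueError via list.index
    return classes.index(label)


def test_missing_labels_map_to_minus_one():
    # Every common missing-value marker should be mapped to -1
    for missing in ["", float("nan"), None, pd.NA]:
        assert label2index_standin(missing, CLASSES) == -1


def test_known_labels_keep_their_index():
    assert label2index_standin("Aedes", CLASSES) == 0
    assert label2index_standin("Culex", CLASSES) == 1


if __name__ == "__main__":
    test_missing_labels_map_to_minus_one()
    test_known_labels_keep_their_index()
    print("all checks passed")
```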

scottclowe merged commit eea5e03 into master Apr 19, 2025
5 checks passed
scottclowe deleted the bug_missing-taxons-nan branch April 19, 2025 13:13


1 participant