README.rst: 21 additions & 20 deletions
@@ -19,8 +19,9 @@ In this package, we provide PyTorch/torchvision style dataset classes to load th
BIOSCAN-1M and 5M are large multimodal datasets for insect biodiversity monitoring, containing over 1 million and 5 million specimens, respectively.
The datasets comprise RGB microscopy images, DNA barcodes, and fine-grained, hierarchical taxonomic labels.
Every sample has both an image and a DNA barcode, but the taxonomic labels are incomplete and only extend all the way to the species level for around 9% of the specimens.
+For more details about the datasets, please see the `BIOSCAN-1M paper <BS1M-paper_>`_ and `BIOSCAN-5M paper <BS5M-paper_>`_, respectively.
-Documentation, including the full API details, is available online at readthedocs_.
+Documentation about this package, including the full API details, is available online at readthedocs_.
Installation
@@ -38,14 +39,14 @@ To install the package, run:
Usage
-----
-The datasets can be used in the same way as PyTorch's torchvision datasets.
+The datasets can be used in the same way as PyTorch's `torchvision datasets <https://pytorch.org/vision/main/datasets.html#built-in-datasets>`_.
To use a different image package, follow the download instructions given in the `BIOSCAN-5M repository <https://github.com/bioscan-ml/BIOSCAN-5M?tab=readme-ov-file#dataset-access>`_, then set the argument ``image_package`` to the desired package name, e.g.
The BIOSCAN-5M dataset is partitioned into ``train``, ``val``, and ``test`` splits for closed-world tasks (seen species), and ``key_unseen``, ``val_unseen``, and ``test_unseen`` splits for open-world tasks (unseen species).
These partitions only use samples labelled to species-level.
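
As a rough illustration of the seen/unseen distinction, the toy sketch below groups the six split names into closed-world and open-world sets and checks that no species appears in both. The records, species names, and the disjointness check are hypothetical and written only for this example. They are not real BIOSCAN-5M data and do not use the package's API.

```python
# Hypothetical toy records of (split, species); not real BIOSCAN-5M data.
records = [
    ("train", "species_a"),
    ("val", "species_a"),
    ("train", "species_b"),
    ("test", "species_b"),
    ("key_unseen", "species_c"),
    ("val_unseen", "species_c"),
    ("key_unseen", "species_d"),
    ("test_unseen", "species_d"),
]

# Split names as listed in the README text.
CLOSED_WORLD = {"train", "val", "test"}
OPEN_WORLD = {"key_unseen", "val_unseen", "test_unseen"}

# Collect the species that occur in each group of splits.
seen = {species for split, species in records if split in CLOSED_WORLD}
unseen = {species for split, species in records if split in OPEN_WORLD}

# Assumption: "unseen" means the species never occurs in a closed-world split.
overlap = seen & unseen
print(sorted(seen), sorted(unseen), sorted(overlap))
```

A similar set comparison could be run on the species labels of the real splits to confirm how the partitions relate to one another.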
@@ -150,34 +149,34 @@ This can be changed by setting the argument ``input_modality`` to either ``"imag
By default, the target values will be provided as integer indices that map to the labels for that taxonomic rank (with value ``-1`` used for missing labels), appropriate for training a classification model with cross-entropy.
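
To make the ``-1`` convention concrete, here is a minimal pure-Python sketch of how such targets might be handled during training. It encodes labels as integer indices, with ``-1`` for missing labels, and then averages the cross-entropy over labelled samples only. The helper names, the toy labels, and the alphabetical index order are assumptions made for this example. The package's own label-to-index mapping may differ.

```python
import math

MISSING = -1  # index used for samples with no label at this rank


def encode_labels(labels):
    """Map label strings to integer indices; None becomes MISSING."""
    classes = sorted({lab for lab in labels if lab is not None})
    index = {name: i for i, name in enumerate(classes)}
    targets = [index[lab] if lab is not None else MISSING for lab in labels]
    return classes, targets


def masked_cross_entropy(logits, targets):
    """Mean cross-entropy over the samples whose target is not MISSING."""
    losses = []
    for row, target in zip(logits, targets):
        if target == MISSING:
            continue  # unlabelled sample: contributes nothing to the loss
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        losses.append(log_norm - row[target])
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical genus labels; the second sample is unlabelled.
genus_labels = ["Aedes", None, "Culex", "Aedes"]
classes, targets = encode_labels(genus_labels)
logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
loss = masked_cross_entropy(logits, targets)
print(classes, targets, round(loss, 4))
```

In PyTorch, passing ``ignore_index=-1`` to ``torch.nn.functional.cross_entropy`` has a similar effect: samples whose target is ``-1`` are skipped when the loss is computed.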
@@ -188,13 +187,13 @@ If this is set to ``target_format="text"``, the output will instead be the raw l