- Updated README with details about the Canadian Invertebrates 1.5M dataset.
- Added column aliases to ensure compatibility with the previous metadata structure.
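
The column-alias idea in the second bullet can be sketched roughly as follows. This is a minimal illustration in plain Python, not the package's actual implementation: the alias table, the column names (`sampleid`, `nucraw`, `sample_id`, `dna_barcode`), and the helper functions are all hypothetical and chosen only to show how a legacy column name might be resolved to a current one.

```python
# Hypothetical alias table: legacy column name -> current column name.
# These names are illustrative assumptions, not taken from the package.
COLUMN_ALIASES = {
    "sampleid": "sample_id",
    "nucraw": "dna_barcode",
}


def resolve_column(name, aliases=COLUMN_ALIASES):
    """Return the current column name for a possibly legacy name."""
    # Names without an alias are passed through unchanged.
    return aliases.get(name, name)


def get_field(row, name, aliases=COLUMN_ALIASES):
    """Look up a field in a metadata row (a dict) by its old or new name."""
    return row[resolve_column(name, aliases)]


# A metadata row stored under the current column names.
row = {"sample_id": "ABC123", "dna_barcode": "ACGT"}

# Both the legacy and the current name reach the same value.
print(get_field(row, "nucraw"))
print(get_field(row, "dna_barcode"))
```

Keeping the mapping in one table means that code written against the previous metadata structure keeps working, while new code can use the current names directly.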
README.rst: 37 additions & 4 deletions
@@ -12,12 +12,13 @@
 BIOSCAN Datasets for PyTorch
 ============================

-In this package, we provide PyTorch/torchvision style dataset classes to load the `BIOSCAN-1M <BIOSCAN-1M paper_>`_ and `BIOSCAN-5M <BIOSCAN-5M paper_>`_ datasets.
+In this package, we provide PyTorch/torchvision style dataset classes to load the `BIOSCAN-1M <BIOSCAN-1M paper_>`_, `BIOSCAN-5M <BIOSCAN-5M paper_>`_, and `Canadian Invertebrates 1.5M <CanadianInvertebrates paper_>`_ datasets.

 BIOSCAN-1M and 5M are large multimodal datasets for insect biodiversity monitoring, containing over 1 million and 5 million specimens, respectively.
 The datasets are comprised of RGB microscopy images, `DNA barcodes <what-is-DNA-barcoding_>`_, and fine-grained, hierarchical taxonomic labels.
+The Canadian Invertebrates 1.5M dataset provides DNA barcodes for over 1.5 million invertebrates collected across 23 ecozones in Canada. It is a major reference library for biodiversity research and consists of DNA barcodes collected from platforms like BOLD, GenBank, and GBIF.
 Every sample has both an image and a DNA barcode, but the taxonomic labels are incomplete and only extend all the way to the species level for around 9% of the specimens.
-For more details about the datasets, please see the `BIOSCAN-1M paper`_ and `BIOSCAN-5M paper`_, respectively.
+For more details about the datasets, please see the `BIOSCAN-1M paper`_, `BIOSCAN-5M paper`_, and `Canadian Invertebrates 1.5M <CanadianInvertebrates paper_>`_, respectively.

 Documentation about this package, including the full API details, is available online at readthedocs_.

@@ -69,6 +70,18 @@ To load the BIOSCAN-1M dataset:
 # Do something with the image, dna_barcode, and label
 Note that although BIOSCAN-5M is a superset of BIOSCAN-1M, the repeated data samples are not identical between the two due to data cleaning and processing differences.
 For details, please see Appendix Q of the `BIOSCAN-5M paper`_.
 Additionally, note that the splits are incompatible between the two datasets.
@@ -341,7 +354,7 @@ The transform indicates the name of a taxonomic rank and its value for every ran
 Other resources
 ---------------

-- Read the `BIOSCAN-1M paper`_ and `BIOSCAN-5M paper`_.
+- Read the `BIOSCAN-1M paper`_, `BIOSCAN-5M paper`_, and `Canadian Invertebrates 1.5M <CanadianInvertebrates paper_>`_.
 - The dataset can be explored through a web interface using our `BIOSCAN Browser`_.
 - Read more about the `International Barcode of Life (iBOL) <https://ibol.org/>`__ and `BIOSCAN <https://ibol.org/bioscan/>`__ initiatives.
 - See the code for the `cropping tool <https://github.com/bioscan-ml/BIOSCAN-5M/tree/main/BIOSCAN_crop_resize>`__ that was applied to the images to create the cropped image package.
@@ -352,7 +365,7 @@ Other resources
 Citation
 --------

-If you make use of the BIOSCAN-1M or BIOSCAN-5M datasets in your research, please cite the following papers as appropriate.
+If you make use of the BIOSCAN-1M, BIOSCAN-5M, or Canadian Invertebrates 1.5M datasets in your research, please cite the following papers as appropriate.

 `BIOSCAN-5M <BIOSCAN-5M paper_>`_:

@@ -394,6 +407,25 @@ If you make use of the BIOSCAN-1M or BIOSCAN-5M datasets in your research, pleas