
Releases: bioscan-ml/BarcodeBERT

v1.0.0

12 Jun 15:02
263e080

This repository contains all the code accompanying the paper BarcodeBERT: Transformers for Biodiversity Analysis (Millan Arias et al., 2025).

BarcodeBERT is a BERT-style transformer model trained exclusively on DNA barcode sequences extracted from a reference library of Canadian invertebrates. In addition to the full pretraining pipeline, you’ll find scripts and notebooks for evaluating BarcodeBERT (and several off-the-shelf DNA foundation models) on three downstream tasks:

  • Fine-tuning for supervised species-level classification.
  • Similarity retrieval for labelling rare or unseen species via nearest neighbour search in the embedding space.
  • BIN reconstruction, where BarcodeBERT embeddings are used to group sequences into putative Barcode Index Numbers.
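The two embedding-based tasks above can be illustrated with a minimal, dependency-free sketch. It assumes each sequence has already been mapped to a fixed-length embedding vector (the toy vectors below are made up, not BarcodeBERT outputs), uses cosine similarity as the metric, and uses single-linkage grouping at a hypothetical similarity threshold as a stand-in for BIN reconstruction. The function names are illustrative only and are not part of the repository's API, and the actual evaluation scripts may use different metrics or clustering methods.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbour_labels(
    queries: List[Sequence[float]],
    reference: List[Sequence[float]],
    reference_labels: List[str],
) -> List[str]:
    """Hypothetical helper: give each query the label of its most similar reference embedding (1-NN)."""
    predictions = []
    for q in queries:
        best = max(range(len(reference)), key=lambda i: cosine_similarity(q, reference[i]))
        predictions.append(reference_labels[best])
    return predictions


def threshold_clusters(embeddings: List[Sequence[float]], min_similarity: float) -> List[int]:
    """Hypothetical helper: single-linkage grouping via union-find.

    Any two embeddings with cosine similarity >= min_similarity end up in the same
    group; returns one integer group id per embedding.
    """
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= min_similarity:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive ids in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(embeddings))]


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]
    labels = ["species_A", "species_B"]
    print(nearest_neighbour_labels([[0.9, 0.1], [0.2, 0.8]], reference, labels))
    # prints ['species_A', 'species_B']
    print(threshold_clusters([[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]], min_similarity=0.9))
    # prints [0, 0, 1]
```

Single linkage is used here only because it needs no cluster count in advance, which suits grouping sequences into an unknown number of putative BINs; the threshold value is an assumption and would need tuning on real embeddings.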

What's Changed

New Contributors

Full Changelog: https://github.com/bioscan-ml/BarcodeBERT/commits/v1.0.0