Skip to content

Add documentation about Darwin Core archives (including new "identifiers" extension) #716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions content/user/download/darwincore_archives/_index.en.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
---
title: "Darwin Core Archives"
date: 2025-01-30
lastmod: 2025-01-30
draft: false
weight: 30
authors: ["Katie Pearson"]
keywords: ["Download", "Darwin Core", "DwC-A", "Archive", "Extensions"]
---

{{< notice info >}} This page describes how Darwin Core Archives represent data from Symbiota portals. {{</ notice >}}

A [Darwin Core Archive](https://ipt.gbif.org/manual/en/ipt/latest/dwca-guide) is a package of biodiversity data that aligns with the Darwin Core international data standard. The Symbiota data schema was designed to largely align with this schema for data imports and exports.

A Darwin Core Archive (DwC-A) downloaded from a Symbiota portal consists of a core [**occurrences**](https://rs.gbif.org/core/dwc_occurrence_2024-02-23.xml) file, containing the core biodiversity data relating to records of a taxon at a place and time (an "occurrence"), a metafile (meta.xml) that describes the structure of the Darwin Core Archive, a metadata file (eml.xml) describing the dataset, and multiple "extension" files with additional data regarding the occurrences. The "id" column in the occurrences file is the central link between the _occurrences_ file and its extensions.


![Darwin Core Star Schema](/symbiota-docs/images/dwca.jpg)

The following extensions are currently supported by Symbiota downloads:

* [**identifications**](https://rs.gbif.org/extension/identification_history_2024-02-19.xml): contains data regarding one or many taxonomic identifications of an occurrence. This file contains the history of taxonomic identification or annotation of the specimen. For example, if the occurrences file contains one record that was identified three times, the _identifications_ file will contain three rows, one for each taxonomic identification. In Symbiota portals, every identification must have a scientific name, a "dateIdentified", and an "identifiedBy" value.

* [**multimedia**](https://rs.gbif.org/extension/ac/audiovisual_2024_11_07.xml): contains data about any media resources (e.g., images, sound recordings) linked to the occurrence. For example, if the occurrences file contains one record with one image and one audio file, the _multimedia_ extension will contain two rows, one for each media resources.

{{< notice tip >}} **A DwC-A download does not contain the actual image files** linked to the occurrence data. The multimedia file consists of _links to_ where those images can be viewed/downloaded. For guidance on downloading images, [see this article](/symbiota-docs/user/download/download_images/). {{</ notice >}}

* [**measurementOrFact**](https://rs.gbif.org/extension/obis/extended_measurement_or_fact_2023-08-28.xml): contains data about any measurements or traits regarding the occurrence. This file is where data from the [Traits Module](/symbiota-docs/editor/trait/) are stored. This file consists of one line for every measurement value applied to an occurrence. In cases of hierarchical traits, multiple measurements may correspond to a single "trait". For example, if the trait "Flowers Present" is nested under "Reproductive", a single occurrence may have multiple rows in the measurementOrFact extension, one corresponding to "Flowers Present" and one corresponding to "Reproductive." Symbiota portals export the "Extended Measurement or Fact Extension" do support unique identifiers applied to trait values, such as links to relevant ontologies or scoring protocols.

* [**materialSample**](https://rs.gbif.org/extension/ggbn/materialsample.xml): contains data about [material samples](/symbiota-docs/editor/edit/materialsample/) related to occurrences. As with all extensions, there is often a one-to-many relationship between occurrences and material samples, with one occurrence related to many material samples.

* [**identifiers**](https://rs.gbif.org/extension/gbif/1.0/identifier.xml): contains any alternative identifier values belonging to the occurrences, as stored in the "Tag Name / Alternative Identifier Values" table ([see this article](https://biokic.github.io/symbiota-docs/editor/edit/fields/catno/)). Other catalog numbers, accession numbers, or anything else not in the Catalog Number field is exported into this extension.
Binary file added static/images/dwca.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.