(c) V.A. Wilton, 2021 Photo by V.A. Wilton.

Standards and tools

The most efficient way to prepare and publish data to the GBIF network is to use Darwin Core-based datasets published through GBIF’s Integrated Publishing Toolkit (IPT).

The resources below provide key links for getting started and for finding further information on mobilising biodiversity data using Darwin Core and the Integrated Publishing Toolkit.

Data standards and formats

| Name | Description | Resources |
|---|---|---|
| Darwin Core (DwC) | DwC is the primary standard for mobilising biodiversity data in the GBIF network. It is used to describe the occurrence of organisms in nature as recorded by observations, specimens and samples. DwC is maintained by Biodiversity Information Standards (TDWG). | Darwin Core Standard; Darwin Core Quick Reference Guide |
| Darwin Core Archive (DwC-A) | DwC-A is the main format used to publish biodiversity data in the GBIF network. It is a star-schema archive (a ZIP file) containing a metadata file, a descriptor file (meta.xml) that defines the structure of the data files and the relationships between them, and one or more data files in CSV and/or TSV format. | GBIF page on Darwin Core and archives |
| Ecological Metadata Language (EML) | EML is used in a Darwin Core Archive to record the metadata for the published resource. It provides a vocabulary for documenting research data. | EML Standard |
| Comma Separated Values (CSV) | Some data files within a Darwin Core Archive use CSV. CSV is a commonly used format that uses a comma to separate values within a record and line breaks (CRLF) to separate records. Take care when using CSV with biodiversity data: many fields may contain commas, line breaks, double quotes and other characters that produce malformed CSV unless those values are enclosed in double quotes. | RFC 4180; W3C Model for Tabular Data |
| Tab Separated Values (TSV) | Some data files within a Darwin Core Archive use TSV. TSV is a commonly used format that uses the tab character to separate values within a record. TSV is often preferred over CSV because its field separator (tab) is less likely to appear in the data than the comma used by CSV. | W3C Model for Tabular Data; IANA Media Type |
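To make the DwC-A structure concrete, the Python sketch below writes a minimal occurrence-only archive: a tab-separated core file with a header row, plus a meta.xml descriptor that maps each column to a Darwin Core term. It is an illustration under stated assumptions, not GBIF tooling. The helper names `build_meta_xml` and `write_dwca`, the file name occurrence.txt, the three chosen terms, and the omission of the EML metadata file are all simplifications made for this example (a dataset published to GBIF needs EML metadata). Because the descriptor declares no enclosing quote character, the sketch rejects values containing tabs or line breaks rather than trying to escape them.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical column choice for this example; real datasets use more terms.
TERMS = ["occurrenceID", "basisOfRecord", "scientificName"]


def build_meta_xml(terms):
    """Descriptor: one tab-separated core file whose first column is the record id."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{term}"/>' for i, term in enumerate(terms)
    )
    # "\\t" and "\\n" emit the literal escapes \t and \n that meta.xml uses.
    return (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        f'        fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="{DWC}Occurrence">\n'
        "    <files><location>occurrence.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def write_dwca(path, rows):
    """Write rows (dicts of str keyed by term name) as a minimal occurrence-only DwC-A."""
    out = io.StringIO()
    out.write("\t".join(TERMS) + "\n")  # header row, skipped via ignoreHeaderLines="1"
    for row in rows:
        values = [row.get(term, "") for term in TERMS]
        for value in values:
            # No enclosing quotes are declared, so separators must not appear in values.
            if any(ch in value for ch in "\t\r\n"):
                raise ValueError(f"value contains a tab or line break: {value!r}")
        out.write("\t".join(values) + "\n")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        # eml.xml is omitted for brevity; GBIF requires metadata for published datasets.
        zf.writestr("meta.xml", build_meta_xml(TERMS))
        zf.writestr("occurrence.txt", out.getvalue())
```

For example, `write_dwca("example-dwca.zip", [{"occurrenceID": "occ-1", "basisOfRecord": "HumanObservation", "scientificName": "Apteryx mantelli"}])` produces an archive with meta.xml and occurrence.txt. In practice the IPT generates the archive for you; the sketch only shows what sits inside it.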

Tools and services that assist data mobilisation

| Name | Purpose | Description |
|---|---|---|
| Integrated Publishing Toolkit (IPT) | Mobilisation | IPT is a free, open-source software tool used to publish biodiversity datasets to the GBIF network. |
| GBIF Data Validator | Format validation | Validates a Darwin Core Archive file. |
| Darwin Core Archives Examples | Templates for familiarisation | Example spreadsheet templates for occurrence, checklist and sampling-event datasets. |
| New Zealand Organisms Register (NZOR) | Data validation | A list of the names of organisms relevant to New Zealand. In addition to the website, NZOR provides a matching service that can be used to validate names. |
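The GBIF Data Validator is the tool to use for checking an archive. As a rough local sanity check beforehand, the Python sketch below reads meta.xml, confirms that the core data file is present, and flags records with fewer columns than the highest column index declared in the descriptor. The function name `check_core` is hypothetical, and the check covers only a small part of what the validator does. It reads the separator, quote character and header-line count from the descriptor (falling back to comma, double quote and zero, which are assumed defaults here), and leaves line endings to Python's `csv` module instead of reading `linesTerminatedBy`.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}
# meta.xml writes control characters as escapes such as \t; map the common ones.
ESCAPES = {"\\t": "\t", "\\n": "\n", "\\r": "\r", "\\r\\n": "\r\n"}


def _attr(elem, name, default):
    """Read a descriptor attribute, translating escaped control characters."""
    value = elem.get(name, default)
    return ESCAPES.get(value, value)


def check_core(path):
    """Return a list of problems found in the core file of a DwC-A (empty if none)."""
    with zipfile.ZipFile(path) as zf:
        if "meta.xml" not in zf.namelist():
            return ["archive has no meta.xml descriptor"]
        core = ET.fromstring(zf.read("meta.xml")).find("dwca:core", NS)
        if core is None:
            return ["meta.xml has no <core> element"]
        location = core.findtext("dwca:files/dwca:location", namespaces=NS)
        if location not in zf.namelist():
            return [f"core data file {location!r} is not in the archive"]
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    # Highest column index referenced by <id> or by any <field> that has an index.
    indexed = core.findall("dwca:id", NS) + core.findall("dwca:field", NS)
    indexes = [int(e.get("index")) for e in indexed if e.get("index") is not None]
    needed = max(indexes) + 1 if indexes else 0

    quote = _attr(core, "fieldsEnclosedBy", '"')
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=_attr(core, "fieldsTerminatedBy", ","),
        quotechar=quote or None,
        quoting=csv.QUOTE_MINIMAL if quote else csv.QUOTE_NONE,
    )
    skip = int(core.get("ignoreHeaderLines", "0"))
    problems = []
    for record_no, row in enumerate(reader, start=1):
        if record_no <= skip or not row:
            continue  # header record(s) and blank lines
        if len(row) < needed:
            problems.append(f"record {record_no}: {len(row)} columns, expected at least {needed}")
    return problems
```

An empty list only means this narrow check passed; running the archive through the GBIF Data Validator is still the step that matters.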

Data quality

| Resource | Source |
|---|---|
| Chapman (2005), Principles of Data Quality | GBIF |
| Chapman: generalising sensitive species | GBIF |
| Data Quality Requirements | GBIF |
| Data Quality Requirements: Occurrence datasets | GBIF |
| Data Quality Requirements: Checklists | GBIF |
| Data Quality Requirements: Sampling event datasets | GBIF |
| Biodiversity Data Quality | TDWG |
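The GBIF Data Quality Requirements pages listed above name terms that each record should supply. The Python sketch below shows how a check of that kind might look for occurrence rows held as dictionaries (for example, rows read with `csv.DictReader`). The `REQUIRED_TERMS` tuple and the helper names `missing_required` and `duplicate_ids` are assumptions for illustration only; confirm the actual required and recommended terms against GBIF's Data Quality Requirements: Occurrence datasets page before relying on them. The sketch also reports repeated `occurrenceID` values, since occurrence identifiers are expected to be unique within a dataset.

```python
from collections import Counter

# ASSUMPTION: illustrative list only; check it against GBIF's
# "Data Quality Requirements: Occurrence datasets" page.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def missing_required(rows, required=REQUIRED_TERMS):
    """Map 1-based row numbers to the required terms that are absent or blank."""
    report = {}
    for n, row in enumerate(rows, start=1):
        # csv.DictReader fills short rows with None, so treat None like "".
        missing = [t for t in required if not (row.get(t) or "").strip()]
        if missing:
            report[n] = missing
    return report


def duplicate_ids(rows, id_term="occurrenceID"):
    """Return identifier values that appear in more than one row, sorted."""
    counts = Counter(row.get(id_term) for row in rows if row.get(id_term))
    return sorted(value for value, count in counts.items() if count > 1)
```

Checks like these catch gaps early, but the GBIF requirements pages and the Data Validator remain the authoritative reference for what a published dataset must contain.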