The Vector-Borne Diseases Hub supports data sharing by leveraging existing resources while providing guidance on standards and privacy. Users can find ‘omics, traits, abundance, occurrence, and epidemiological data via the Hub’s search tools, with links back to source repositories. The Hub curator (Sarah Kelly) works with the community to identify priority datasets for discovery, align them to agreed data standards, and set appropriate access and privacy levels. This includes support for embargoes when data must remain private until a later release. The curator also supports uploads to specialised repositories, following their SOPs and submission guidelines. If no suitable repository exists for a data type, the Hub can host the metadata and, where appropriate, the data itself.
The SOPs for these specialised repositories can be found below.
For an overview of how the Hub manages, shares, and preserves data, see our Data Management Plan.
Our recommended specialised repository for occurrence type data is GBIF.
Datasets published through GBIF.org have sufficiently consistent detail to contribute information about the location of individual organisms in time and space—that is, they offer evidence of the occurrence of a species (or other taxon) at a particular place on a specified date. Occurrence datasets make up the core of data published through GBIF.org, and examples can range from specimens and fossils in natural history collections, observations by field researchers and citizen scientists, and data gathered from camera traps or remote-sensing satellites.
Occurrence records in these datasets sometimes provide only general locality information, sometimes simply identifying the country, but in many cases, more precise locations and geographic coordinates support fine-scale analysis and mapping of species distributions.
Datasets published through GBIF have to be formatted according to Darwin Core terms.
The Darwin Core Standard (DwC) offers a stable, straightforward and flexible framework for compiling biodiversity data from varied and variable sources. Most datasets shared through GBIF.org are published using the Darwin Core Archive format (DwC-A). Template available for checklist data above under “Template for checklist data”.
What is Darwin Core? https://www.gbif.org/darwin-core
Darwin Core manual: https://obis.org
The Darwin Core manual provides a list of Darwin Core terms that should be used in datasets.
Columns in datasets published through GBIF must be renamed according to their most relevant Darwin Core terms.
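As a minimal sketch of the renaming step, the Python below maps a dataset's own column headers onto Darwin Core terms. The left-hand header names and the example row are invented for illustration; the right-hand names are standard Darwin Core terms, but the correct mapping for any real dataset should be checked against the Darwin Core manual.

```python
import csv
import io

# Hypothetical mapping from a dataset's own headers (left, invented for this
# example) to Darwin Core terms (right).
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date_collected": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "country": "country",
}


def rename_to_darwin_core(csv_text, mapping):
    """Return CSV text with its header row renamed according to `mapping`.

    Headers that have no entry in `mapping` are left unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()


# Invented example row, for illustration only.
raw = (
    "species,date_collected,lat,lon,country\n"
    "Aedes albopictus,2023-07-14,51.5,-0.12,United Kingdom\n"
)
renamed = rename_to_darwin_core(raw, COLUMN_TO_DWC)
print(renamed)
```

Keeping the mapping in one dictionary makes it easy to review with the curator before the renamed file is submitted.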
Template for occurrence data according to the Darwin Core standards: here
Event Core describes when and where a specific sampling event happened and contains information such as location and date. Event Core is often used to organise data tables when there is more than one sampling occasion and/or location, and different occurrences are linked to each sampling. This organisation follows the rationale of most ecological studies and typical marine sampling designs. It covers:
Event Core can be used in combination with the Occurrence and eMoF extensions. The identifier that links Event Core to the extension is the eventID. parentEventID can also be used to give information on hierarchical sampling. occurrenceID can also be used in datasets with Event Core in order to link information between the Occurrence extension and the eMoF extension.
Occurrence Core datasets describe observations and specimen records and cover instances when:
Occurrence Core is often the preferred structure for museum collections, citations of occurrences from literature, and sampling activities.
Datasets formatted in Occurrence Core can use the eMoF Extension for when you have biotic measurements or facts about your specimen. The DNA derived data extension can also be used to link to DNA sequences. The identifier that links Occurrence Core to the extension(s) is the occurrenceID.
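The linking role of occurrenceID can be sketched as follows: each eMoF row carries the occurrenceID of the record it describes, so measurements can be grouped back onto their occurrence. The records and values below are invented for illustration, and the function name `attach_measurements` is hypothetical.

```python
from collections import defaultdict

# Invented Occurrence Core rows.
occurrences = [
    {"occurrenceID": "occ-001", "scientificName": "Ixodes ricinus"},
    {"occurrenceID": "occ-002", "scientificName": "Culex pipiens"},
]

# Invented eMoF rows; each one points at an occurrence via occurrenceID.
measurements = [
    {"occurrenceID": "occ-001", "measurementType": "body length",
     "measurementValue": "3.1", "measurementUnit": "mm"},
    {"occurrenceID": "occ-001", "measurementType": "engorgement status",
     "measurementValue": "unfed", "measurementUnit": ""},
]


def attach_measurements(occurrences, measurements):
    """Group eMoF rows under the occurrence that shares their occurrenceID."""
    by_id = defaultdict(list)
    for m in measurements:
        by_id[m["occurrenceID"]].append(m)
    return [
        dict(occ, measurements=by_id.get(occ["occurrenceID"], []))
        for occ in occurrences
    ]


linked = attach_measurements(occurrences, measurements)
for record in linked:
    print(record["occurrenceID"], len(record["measurements"]))
```

An occurrence with no matching eMoF rows simply ends up with an empty list, which is a quick way to spot records that were expected to have measurements but do not.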
Occurrence Core standards are often used for occurrence data. A list of required Darwin Core information to publish occurrence data can be found here.
Note: while a set of required terms must be present for a dataset to be published on GBIF, additional information on the samples/species recorded (e.g. sampling protocol, habitat, additional remarks on geo-referencing/location) should also be included in the dataset, using the corresponding Darwin Core terms.
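A quick pre-submission check for required terms might look like the sketch below. The four terms in `REQUIRED_TERMS` are an assumption about GBIF's required occurrence terms and should be confirmed against the list linked above; the example rows and the function name `missing_required` are invented.

```python
# Assumed set of required terms; confirm against the GBIF list before relying on it.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def missing_required(rows, required=REQUIRED_TERMS):
    """Return (row_number, term) pairs where a required value is absent or blank."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for term in required:
            if not (row.get(term) or "").strip():
                problems.append((number, term))
    return problems


# Invented example rows: the second one has no eventDate.
rows = [
    {"occurrenceID": "occ-001", "basisOfRecord": "HumanObservation",
     "scientificName": "Ixodes ricinus", "eventDate": "2023-05-02"},
    {"occurrenceID": "occ-002", "basisOfRecord": "PreservedSpecimen",
     "scientificName": "Culex pipiens", "eventDate": ""},
]

problems = missing_required(rows)
print(problems)
```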
In order to publish on GBIF, new publishers need to be endorsed by GBIF participants. This is done via regional GBIF nodes, the UK’s being the National Biodiversity Network (NBN).
While registration of an organisation can be done on GBIF, publishing has to be carried out through NBN. In order to share data with the NBN Atlas, the organisation has to be set up as a data partner and agree to the NBN Atlas terms of use. To become a new data partner with NBN, email [email protected] with the following information:
A representative from NBN will then contact the point of contact provided by the organisation with further guidance and feedback on the datasets. The full guidelines for registration as a data partner can be found on the NBN Atlas website.
Our recommended specialised repository for abundance type data is VecDyn (part of VectorByte).
So you have data that’s some sort of measurement of arthropod abundance/occurrence over time (and has some sort of location data) and want to prepare a dataset for VecDyn? That’s great! Here are some tips and how-tos for getting that spatiotemporal data into VecDyn.

We think the easiest way to get started is to find a dataset that generally looks like yours and go from there. For example, a dataset where there are 4 trapping locations, one species collected. Data is reported as one row for the number of animals collected per site per date.
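The "one row per site per date" layout described above can be sketched as a reshape from a wide spreadsheet (one column per trap site) into long rows. The column names `species`, `site`, `date` and `count`, the site labels, the species and the counts are all hypothetical and are not the VecDyn template headers; use the current VecDyn template for the real names.

```python
# Invented wide-format counts: one row per date, one column per trap site.
wide = [
    {"date": "2024-06-01", "site_A": 12, "site_B": 0, "site_C": 5, "site_D": 7},
    {"date": "2024-06-08", "site_A": 9, "site_B": 3, "site_C": 0, "site_D": 11},
]


def to_long(wide_rows, species, date_key="date"):
    """Turn one-column-per-site rows into one row per site per date."""
    long_rows = []
    for row in wide_rows:
        for key, count in row.items():
            if key == date_key:
                continue  # the date is copied onto every output row instead
            long_rows.append(
                {"species": species, "site": key, "date": row[date_key], "count": count}
            )
    return long_rows


long_rows = to_long(wide, species="Culex pipiens")
for r in long_rows:
    print(r)
```

Zero counts are kept as rows rather than dropped, so that a trap that was checked and found empty remains distinguishable from a trap that was not checked.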

We’re here to help - email the curator or VecDyn team. If it’s a one-off upload we will probably upload it for you. If you anticipate uploading multiple datasets, we may set your VectorByte account up with upload access.
This guide was adapted from VectorByte.
Our recommended specialised repository for trait type data is VecTraits (part of VectorByte).
If you do not already have an account on VecTraits, you must create an account and request access to upload data. This can be found in the top right-hand corner of the page.

Once you have access to upload data to VecTraits you will see a drop-down menu under your login name on the top right corner of the page. There will now be an option to ‘Upload VecTraits Data’.

Click the ‘Upload VecTraits Data’ button. Here you will find the latest upload instructions (including column definitions) and a template you can download to ensure the column headers in your dataset are those that will be recognised by the VecTraits validator. The column headers in this template match the column names on the VecTraits Column Definitions page.
The VecTraits column definitions display the columns or variables that should be present in your dataset. Those columns/variables that are mandatory are labelled as ‘true’ in the ‘Is Required’ column.
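Before uploading, the mandatory columns can be checked locally against a saved copy of the column definitions. This is only a sketch: the `Column Name` key and the three column names below are placeholders, not the real VecTraits definitions, and only the ‘Is Required’ flag comes from the page described above.

```python
# Placeholder excerpt of a column-definitions table (names are hypothetical).
COLUMN_DEFINITIONS = [
    {"Column Name": "example_trait_name", "Is Required": "true"},
    {"Column Name": "example_trait_value", "Is Required": "true"},
    {"Column Name": "example_notes", "Is Required": "false"},
]


def missing_mandatory_headers(headers, definitions):
    """List columns flagged 'true' under 'Is Required' that the dataset lacks."""
    required = [
        d["Column Name"]
        for d in definitions
        if d["Is Required"].strip().lower() == "true"
    ]
    present = set(headers)
    return [name for name in required if name not in present]


dataset_headers = ["example_trait_name", "example_notes"]
missing = missing_mandatory_headers(dataset_headers, COLUMN_DEFINITIONS)
print(missing)
```

A check like this does not replace the online validator; it only catches missing headers earlier.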
Once your template is populated with your data, please ensure you have followed all points in the instruction manual. Now you are ready to upload your data file. Drop your file into the upload box and press ‘Upload’.

Your data is now running through a validator. The validator should run relatively quickly, but validation time is dependent on the size of the dataset. The validator will draw your attention to any errors in your data such as missing fields or duplicated samples. You must fix the errors before the dataset successfully passes through the validator.
Once the dataset has passed validation it will be submitted to the VectorByte team for upload. Once you have done this, you have no direct access to the data any more. However, if you do make a mistake, just email the team and they should be able to identify and delete the offending dataset before uploading.
Please make a note of the date and time that you uploaded the dataset which you want discarded. This will make it a lot easier for the team to identify which dataset is yours!
VectorByte will contact you once your dataset has been added to the database.
Our recommended specialised repository for genomic type data is GenBank.
Please follow the links below for the GenBank submission types and tools.
GenBank submission types
GenBank submission tools
Some authors are concerned that the appearance of their data in GenBank prior to publication will compromise their work. GenBank will, upon request, withhold the release of new submissions for a specified period of time. However, if the accession number or sequence data appears in print or online prior to the specified date, your sequence will be released. In order to prevent the delay in the appearance of published sequence data, we urge authors to inform us of the appearance of the published data. As soon as it is available, please send the full publication data (all authors, title, journal, volume, pages and date) to the address: [email protected].
If you are submitting human sequences to GenBank, do not include any data that could reveal the source's personal identity. It is our assumption that you have received any necessary informed consent authorizations that your organisations require prior to submitting your sequences.
Our recommended specialised repository for proteomic type data is ProteomeXchange.
ProteomeXchange data submission documentation
Data Submission Guidelines for the ProteomeXchange Consortium
Our recommended specialised repository for microarray type data is Gene Expression Omnibus (GEO).
GEO data submission guidelines
Our recommended specialised repository for transcriptomic type data is Sequence Read Archive (SRA).
Please read the SRA submission quickstart guide.
The preferred data submission type is FASTQ files.
Our recommended specialised repository for epidemiological type data is our own repository, managed by the VBD Hub. This is a secure, access-controlled repository. Please reach out to the curator, Sarah Kelly, for more information.
The Hub aims to support the sharing of data in a way that is findable, accessible, interoperable and reusable (FAIR). The decision tree below is designed to help quickly determine where your data belongs based on its type.
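The recommendations on this page can also be summarised as a simple lookup from data type to repository. The sketch below restates only what is written above; the function name `recommend_repository` and the wording of the fallback message are illustrative.

```python
# Data type -> recommended specialised repository, as listed on this page.
RECOMMENDED_REPOSITORY = {
    "occurrence": "GBIF",
    "abundance": "VecDyn",
    "trait": "VecTraits",
    "genomic": "GenBank",
    "proteomic": "ProteomeXchange",
    "microarray": "Gene Expression Omnibus (GEO)",
    "transcriptomic": "Sequence Read Archive (SRA)",
    "epidemiological": "VBD Hub repository",
}

# Illustrative fallback for data types with no specialised repository listed.
FALLBACK = "No specialised repository listed; contact the Hub curator"


def recommend_repository(data_type):
    """Return the recommended repository for a data type, ignoring case and padding."""
    return RECOMMENDED_REPOSITORY.get(data_type.strip().lower(), FALLBACK)


print(recommend_repository("Abundance"))
print(recommend_repository("imaging"))
```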