ERC: Open Research Data and Data Management Plans

The following text has been copied from the ERC’s Open Research Data and Data Management Plans PDF and is provided for purposes of accessibility. It can be found in its original version here.

European Research Council

Scientific Council

Established by the European Commission

Open Research Data and Data Management Plans

Information for ERC grantees

by the ERC Scientific Council

Version 3.1

3 July 2019

This document will be regularly updated in order to take into account new developments in this rapidly evolving field. Comments, corrections and suggestions should be sent to the Secretariat of the ERC Scientific Council Working Group on Open Access, Research Data Management and Open Science more broadly via the address erc-open-access@ec.europa.eu.

The table below summarizes the changes that this document has undergone.

HISTORY OF CHANGES

Version | Publication date | Change | Page
1.0 | 23.02.2018 | Initial version | –
2.0 | 24.04.2018 | Part ‘Open research data and data deposition in the Physical sciences and Engineering domain’ added. | 15 – 17
2.0 | 24.04.2018 | Minor editorial changes; faulty link corrected. | 6, 10
2.0 | 24.04.2018 | Contact address added. | –
3.0 | 23.04.2019 | Name of WG updated | –
3.0 | 23.04.2019 | Added text to the section on data deposition | –
3.0 | 23.04.2019 | Reference to FAIRsharing moved to the general part from the life sciences part and extended | –
3.0 | 23.04.2019 | Added example of the Austrian Science Fund in the section on policies of other funding organisations; updated links related to the German Research Foundation and the Arts and Humanities Research Council; added reference to the Science Europe guide. | –
3.0 | 23.04.2019 | Small changes to the text on Image Data | –
3.0 | 23.04.2019 | Added reference to the Ocean Biogeographic Information (OBIS) | –
3.0 | 23.04.2019 | Reformulation of the text related to Biostudies | –
3.0 | 23.04.2019 | New text in the section on ‘Metadata’ in the Life Sciences part | –
3.0 | 23.04.2019 | Added reference to openICPSR | –
3.0 | 23.04.2019 | Added references to ioChem-BD and ChemSpider | 16 / 17
3.0 | 23.04.2019 | Change of header ‘Geophysics’ into ‘Earth system science’ | –
3.0 | 23.04.2019 | Information on EPOS updated | –
3.0 | 23.04.2019 | Minor editorial changes and updates | whole document
3.1 | 03.07.2019 | Added reference to OpenNeuro | –

Open Research Data Management and Data Management Plans

Information for ERC grantees

The ERC has supported the cause of open science from its start in 2007, and continues to do so today. Open access to publications from ERC funded projects is mandatory. The next step in the development of open science is making research data publicly available when possible. This will benefit science by increasing the use of data and by promoting transparency and accountability.

The ERC embraces the so-called ‘FAIR data principles’: research data should be findable, accessible, interoperable and re-usable. This means that data should be:

identified in a persistent manner using community conventions, and described using sufficiently rich metadata;
stored in such a way that they can be accessed by humans and machines;
structured in such a way that they can be combined with other data sets;
licensed or have terms-of-use that spell out how they can be used by others.

The article by Wilkinson et al. on “The FAIR Guiding Principles for scientific data management and stewardship”¹ provides a detailed discussion of the FAIR principles.

Not all data can be made fully open. Where data raise privacy or security concerns, controls and limits on data access will be required. In some cases, it will be appropriate for researchers to delay or limit access to data in order to secure intellectual property protection. Any such restrictions on access should be explicit and justified, and such data should still be managed in line with the FAIR principles.

For researchers, the move to open data means that they have to think about what data their research will produce, how these data will be described, and how they can be made available in such a way as to benefit science and society in general. This means that they have to draw up a data management plan and find suitable data depositories.

ERC requirements

Data Management Plans

All ERC projects funded under the Work programmes 2017 and later participate by default in the Horizon 2020 Open Research Data (ORD) pilot, with the possibility for grantees to opt out at any time. For projects funded under the Work programmes 2015 and 2016, grantees can opt into the pilot if they so wish.

ERC grantees of projects that take part in the ORD pilot are required to submit a data management plan (DMP) within six months after the start of their grant.

As practices with regard to data management, storage, and sharing differ widely across disciplines, the ERC uses a general set of requirements that DMPs should meet. A DMP should provide information on:

Data set description: Grantees to provide a sufficiently detailed description, including the scientific focus and technical approach, to allow association of their data sets with specific research themes.
Standards and metadata: Grantees to describe the protocols and standards used to structure their data (i.e. fully reference the metadata) so that other scientists can make an assessment and reproduce the dataset. If available, grantees to provide a reference to the community data standards with which their data conform and that make them interoperable with other data sets of similar type.
Name and persistent identifier for the data sets: Grantees should plan to use depositories that will provide a unique and persistent identification (an identifier) of their data sets and a stable resolvable link to where their datasets can be directly accessed. Submission to a public depository normally provides this; many institutional depositories provide similar services.
Curation and preservation methodology: Grantees to provide information on the standards that will be used to ensure the integrity of their data sets and the period during which they will be maintained, as well as how they will be preserved and kept accessible in the longer term. If available, grantees to provide a reference to the public data depository in which their data will reside.
Data sharing methodology: Grantees to provide information on how their data sets can be accessed, including the terms-of-use or the licence under which they can be accessed and re-used, and information on any restrictions that may apply. It is also important to specify and justify the timing of data sharing. This could be, for example, as soon as possible after the data collection, or at the end of the project. For data that underlie publications it could be, for example, at the time of publication or pre-publication.

A DMP that provides adequate information on these five topics will meet the FAIR principles.
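As an illustration only, the five topics above can be kept as a structured record, so that gaps remain visible while the plan evolves. The following Python sketch is not an ERC format: the class name, the field names and the example values are hypothetical and simply mirror the five headings listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical record mirroring the five DMP topics; not an official ERC schema."""
    data_set_description: str = ""
    standards_and_metadata: str = ""
    name_and_persistent_identifier: str = ""
    curation_and_preservation: str = ""
    data_sharing: str = ""


def missing_topics(plan: DataManagementPlan) -> List[str]:
    """Return the names of the topics that are still empty or whitespace-only."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Example values are invented for illustration.
draft = DataManagementPlan(
    data_set_description="Survey responses on language use, collected during the project.",
    data_sharing="Released under an open licence at the time of publication.",
)
print(missing_topics(draft))
```

Running the check on the draft lists the three topics that have not yet been filled in. A plan kept in this form can be revisited whenever the type or volume of the data changes.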

The ERC does not prescribe a specific format for the DMPs that its grantees need to submit, because practices and standards differ widely across disciplines. However, grantees are encouraged to use the ERC template that is available on the Horizon 2020 Participant Portal:

ERC Data Management Plan Template: http://ec.europa.eu/research/participants/data/ref/h2020/gm/reporting/h2020-erc-tpl-oa-data-mgt-plan_en.odt

A very convenient on-line tool to formulate a DMP according to the requirements of the ERC (as laid down in the template) and of several other research funding organisations is provided by the Digital Curation Centre:

DMPonline tool: https://dmponline.dcc.ac.uk/

Grantees should also keep in mind the following guidance document:

Guidelines on Implementation of Open Access to Scientific Publications and Research Data in projects supported by the European Research Council under Horizon 2020: http://ec.europa.eu/research/participants/data/ref/h2020/other/hi/oa-pilot/h2020-hi-erc-oa-guide_en.pdf

Writing a DMP should not be regarded as a purely administrative exercise. Rather, it should provide a positive stimulus to thinking about how the data generated within a project will be stored, managed and safeguarded, and should be part of the research process from the outset. As a project progresses, the data generated may well change in type and volume. It is therefore useful to envisage a DMP as a dynamic framework, which should be maintained and modified as the research advances. Planning for submission early in the research cycle will facilitate the publication process. Good data management will save time, safeguard information and increase the visibility and impact of the research outcomes.

The ERC recognises that data annotation and deposition are time-consuming activities. ERC grant money can be specifically earmarked for this purpose, for example to contribute to the salary of a research assistant or to the costs of a commercial provider.

Data deposition

The ERC is convinced of the importance of data and their value to the scientific community. Data deposition can be complementary to publication, but data can also be deposited without an associated publication. The ERC considers data as an important scientific output; therefore data deposition should always be accompanied by a reference to the ERC grant number.

Publications present the pertinent data underlying conclusions made in a research paper and publishers increasingly require that all relevant data should be made available to the community. The ERC expects data underlying publications by ERC grantees to be available. Researchers often generate additional data, not directly linked to publications, which shape the way their projects develop, and these also can constitute a valuable resource. Funders and indeed the public in general are anxious that all valid data be made available in order to promote scientific progress; the European Commission has adopted a policy of open data for all research that it finances. Data dumping is of course to be avoided and it is important that data be of sufficient technical and scientific quality as well as being sufficiently annotated and structured to be useful to the community. Ultimately, it is for the individual investigator to decide which data merit conservation and/or sharing. Where the scientific content is concerned, it is necessary to bear in mind that what seems of little interest in the context of a particular project may be relevant to other lines of investigation and therefore of potential interest to the research community. So-called negative results may also be of potential value.

When looking for a depository for research data, first check whether there is a thematic/community database where the data could be archived. Irrespective of the depository you choose, you should always check whether it is sustainable in the longer term and:

stores the data in a safe way;
makes sure that the data will remain findable (via the use of a persistent identifier), as well as accessible and re-usable;
describes the data in a standard way, using accepted metadata standards;
and specifies a licence governing access and re-usability of the data.

There are a number of organisations that carry out a certification of data depositories. The following links may be useful:

Core Trust Seal (this list includes depositories certified by the Data Seal of Approval and/or the World Data System): https://www.coretrustseal.org/why-certification/certified-repositories/
Nestor seal (DIN-Norm 31644): http://www.dnb.de/Subsites/nestor/EN/Siegel/siegel.html
ISO 16363 certified depositories: http://www.iso16363.org/iso-certification/certified-clients/

General depositories for research data

The following depositories are of interest to researchers in all domains:

Zenodo (not-for-profit, hosted by CERN): https://zenodo.org/
Dryad (not-for-profit membership organisation): http://www.datadryad.org/
Figshare (free service provided by private company): https://figshare.com/
Open Science Framework (not-for-profit, developed and maintained by the Center for Open Science²): https://osf.io/
Harvard Dataverse (not-for-profit, hosted by the Institute for Quantitative Social Studies IQSS at Harvard University): https://dataverse.harvard.edu/

While some of these depositories, such as Zenodo, are supported by public money, some others, such as Dryad, may charge a fee. Some degree of data curation may be provided but this is often not the case. Figshare is a commercial company that provides data management services to individuals and will advise about data curation and data deposition through a cloud provider. The company also works with institutions to enable them to curate their academic research outputs and host their data on their own machines.

For an extensive overview of data depositories across all disciplines, see:

Registry of Research Data Repositories (re3data.org): https://www.re3data.org/

At the European level, EUDAT bundles a large number of general and discipline-specific depositories:

EUDAT Collaborative Data Infrastructure (CDI): https://eudat.eu/eudat-cdi

A growing number of universities and research institutes host a depository for use by their research staff. Most of these institutional depositories were originally set up for storing (open access) publications, but dedicated research data depositories also exist. In order for an institutional depository to be acceptable as a trusted archive, it is essential that the university/institute has a data policy guaranteeing the support for data storage and sharing into the future.

Individual researchers may also set up their own focussed database. There are many such initiatives, which may be open to the community and can play a useful role. However, in contrast to public data depositories, these are most often not deposition databases, and as long as they depend on a single individual and/or funding source, long-term sustainability is challenging. In addition to the major problem of long-term persistence, curation of the data may not always be adequate, with problems of quality, correct annotation, renewal (whether the database is up to date) etc. This can complicate access and also compromises re-use.

Many journal websites contain lists of depositories. In addition, an increasing number of commercial publishers offer authors opportunities to store the research data underlying their publications.

If in doubt about how to deposit data, in what format etc., it is recommended to consult the depository directly.

Metadata and data preparation

In order to make stored data findable, accessible, interoperable and reusable (FAIR), it is not enough to store ‘raw data’; they need to be properly documented and described using informative metadata.

Defining appropriate metadata depends on the discipline and/or the methodology that was used to produce the data. Discipline-specific depositories often have detailed requirements for describing data that are stored in that depository.

A generally accepted minimum standard for describing information on the web, including research data, is Dublin Core. Further information on this metadata standard is available at:

Dublin Core: http://dublincore.org/
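To give an impression of what such a minimal description looks like, the sketch below serialises a handful of Dublin Core elements as XML using Python's standard library. It is an illustration under stated assumptions rather than a format required by any depository: the wrapper element name `metadata`, the function name and all example values (including the DOI and the grant number) are hypothetical placeholders. Only the namespace URI and the element names (title, creator, date, identifier, rights, description) come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(properties: dict) -> str:
    """Serialise a mapping of Dublin Core element names to values as an XML string."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in properties.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# All values below are invented placeholders.
record = dublin_core_record({
    "title": "Example survey data set",
    "creator": "Doe, Jane",
    "date": "2019-07-03",
    "identifier": "https://doi.org/10.1234/example",
    "rights": "CC BY 4.0",
    "description": "Data from a project funded under ERC grant no. 000000.",
})
print(record)
```

In practice the depository's submission form or API usually generates such a record; the sketch only shows which pieces of information a minimal description carries, including the persistent identifier and the reference to the grant.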

For more information on disciplinary metadata standards see also

Digital Curation Centre: http://www.dcc.ac.uk/resources/metadata-standards

and the Metadata Directory that has been set up under the auspices of the Research Data Alliance:

RDA Metadata Directory: http://rd-alliance.github.io/metadata-directory/

A curated resource on data and metadata standards, and on how they relate to databases and data policies, can be found at

FAIRsharing: https://www.fairsharing.org/

From its first incarnation, BioSharing.org – which focused on the life sciences – FAIRsharing has evolved into a resource that serves users across all disciplines.³

Policies of other funding organisations

As the movement towards open data progresses, various national funding agencies have formulated policies and specified requirements for DMPs that might be informative when drawing up a DMP, for example:

Austrian Science Fund (FWF): “Research Data Management”: https://www.fwf.ac.at/en/research-funding/open-access-policy/research-data-management/
Netherlands Organisation for Scientific Research (NWO): “Open (FAIR) Data”: http://www.nwo.nl/en/policies/open+science/data+management
German Research Foundation (DFG): “DFG Guidelines on the Handling of Research Data”: http://www.dfg.de/en/research_funding/proposal_review_decision/applicants/research_data/index.html
Swiss National Science Foundation (SNSF): “Open Research Data”: http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx
The Research Council of Norway (RCN): “Open Access to Research Data”: https://www.forskningsradet.no/en/Article/Open_access_to_research_data/1240958527698
UK Arts and Humanities Research Council (AHRC): “Data Management Plan”: https://ahrc.ukri.org/documents/guides/research-funding-guide1/ (Funding Guide: “4. Application Guidance of the Funding Guide”; see also “5. Assessment Criteria and Peer Review”)

The following document by DG Research of the European Commission is also instructive:

“Guidelines on FAIR Data Management in Horizon 2020”: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

In November 2018 Science Europe published its

“Practical Guide to the International Alignment of Research Data Management”: https://www.scienceeurope.org/wp-content/uploads/2018/12/SE_RDM_Practical_Guide_Final.pdf

Developed by experts from Science Europe member organisations and in consultation with the broader research stakeholder community, the guide presents core requirements for DMPs and criteria for the selection of trustworthy repositories, as well as some guidance to organisations on how to put these into practice.

In what follows, more specific information is given for ERC grantees in the Life Sciences and in the Physical Sciences and Engineering, and for those working in the Social Sciences and Humanities. This will include references to specialised depositories for specific disciplines where such are available, and more general information in other cases. Note that this information is provided ‘as is’, i.e. it does not reflect any particular preference on the part of the ERC as to which depositories, protocols, metadata or sharing methodologies an ERC grantee chooses to use.

Open research data and data deposition in the Life Sciences domain

The Life Sciences have a long tradition of open access data depositories. Submission of datasets to an established public depository is considered good scientific practice and is often also a condition for publication. The public depositories ensure that data are correctly curated, accessible and maintained in the long term. Data publication through such a depository will make your data FAIR. In addition, several publishers are implementing formal data citation in the reference list of papers, which will provide a mechanism to attribute credit to datasets. In this context see the paper “A Data Citation Roadmap for Scientific Publishers” by Cousijn et al.⁴

Established public depositories

ELIXIR, the ESFRI research infrastructure for life science data, has compiled a list of recommended depositories:

ELIXIR Deposition Databases for Biomolecular Data: https://www.elixir-europe.org/platforms/data/elixir-deposition-databases

Many of these are based at the EMBL-EBI (European Bioinformatics Institute; for advice on data deposition see http://www.ebi.ac.uk/submission/) with established partner databases in other parts of the world. The

NCBI resource site: https://www.ncbi.nlm.nih.gov/guide/sitemap/

also provides a list of data depositories, although many do not take public submissions.

Image data

In the rapidly developing area of microscopy and bioimage data, solutions for public archiving of data sets are currently being built. There is already an electron microscopy public image archive:

Electron Microscopy Public Image Archive (EMPIAR): https://www.ebi.ac.uk/pdbe/emdb/empiar/

The new European research infrastructure Euro-BioImaging covers a wide range of imaging approaches:

European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences (Euro-BioImaging): http://www.eurobioimaging.eu/

Euro-BioImaging is actively promoting the development of a public bioimage archive in strategic collaboration with the EMBL-EBI and ELIXIR.⁵ A pilot service, the Image Data Repository (IDR), already accepts several types of light microscopy data, with a special emphasis on cell and tissue imaging (https://idr.openmicroscopy.org/about/). A general bioimage archive service is expected to become available in 2019.

Health sciences and clinical data

Many community databases exist in this area. Different ‘clinical speciality’ related databases are available, such as:

National Database for Autism Research (NDAR): https://ndar.nih.gov/

Clinical research outputs tend to be handled nationally because of varying national regulations about confidentiality where data from individuals are concerned. Personal data pose additional ‘consent’ challenges, and the development of public databases requires ‘controlled access’ for data protection. This is a rapidly evolving area where community standards and depositories will be established in the coming years. As standards emerge, the ERC will adopt best practice as recommended by each research community. However, for information, all clinical trials should normally be registered, at the outset, in one of the publicly accessible registries identified by the World Health Organisation:

International Clinical Trials Registry Platform (ICTRP): http://www.who.int/ictrp/en/

Other types of depositories

In a number of research areas, the research community has generated specific archives. These may be depositories that aggregate data from multiple underlying depositories, so that they can be easily found and used by the community. This is the case for organism-based research with examples such as:

FlyBase – A Database of Drosophila Genes & Genomes: http://flybase.org/
WormBase: http://www.wormbase.org/
The Zebrafish Information Network (ZFIN): https://zfin.org/

National and international research consortia may also create databases. This is exemplified by a number of databases in the domain of biodiversity, such as:

Global Biodiversity Information Facility (GBIF): https://www.gbif.org/
Ocean Biogeographic Information System (OBIS): https://obis.org/

Incorporating data into these resources can be very valuable for promoting research within the community, but additional deposition of the data into an established public data-type-focussed depository is highly recommended to ensure long-term curation, preservation and findability.

Data management in domains where established databases are not available

Many institutions have data storage facilities for unstructured data for which there is no existing dedicated community depository. This category includes data generated by functional studies where, for example, a cell component is removed and then complemented by another molecule or where behavioural studies are carried out to test brain function in an animal model. Unstructured data are accepted by depositories such as:

Dryad Digital Repository: https://datadryad.org/
Zenodo: https://zenodo.org/
Figshare: https://figshare.com/

Where the data behind a study are archived in multiple resources or locations, the ERC encourages grantees to deposit the study metadata, including links to the data location(s), in a recognised resource such as BioStudies (https://www.ebi.ac.uk/biostudies/). This also allows life science data for which there is no thematic depository to be deposited. This includes image data, until the bioimage archive is fully operational. Often a BioStudies record corresponds to the data behind a paper and so can be used to provide a simple link from the paper to the data behind the study via the accession number.

Metadata

In the life sciences, the key community deposition databases have strict metadata standards that must be met when depositing data, in order to make them FAIR. Much of the thinking about which metadata should be supplied is therefore already done and managed by these databases.

Activities surrounding standardisation of metadata (such as cross-data resource identifier mapping, mapping of textual metadata labels to ontology and standard vocabulary terms, standardisation of computational workflows and application programming interfaces (APIs), and schematic mark-up of the data) can be facilitated by reusing existing mature interoperability resources. The section on ‘Interoperability’ on the ELIXIR website (https://elixir-europe.org/platforms/interoperability) recommends interoperability tools for the purpose of making the data FAIR via the following resources:

The ELIXIR Recommended Interoperability Resources (RIRs): https://www.elixir-europe.org/platforms/interoperability/rirs
Bioschemas: http://bioschemas.org/
Common Workflow Language: https://www.commonwl.org/

Additionally, the ‘Tools’ section on the ELIXIR website (https://elixir-europe.org/platforms/tools) provides links and guidance on good practice for open source software development in the life sciences.⁶

Open research data and data deposition in the Social Sciences and Humanities domain

The situation with regard to open data in the SH domain, both in terms of infrastructure (depositories) as well as protocols and standards, is rapidly evolving. There are many initiatives, at the national and supra-national levels, that aim to provide researchers with the necessary tools and information.

A characteristic feature of the disciplines that together make up the ERC’s SH domain is their variety, in terms of topics, epistemologies, and methodologies. This is reflected also in the data that SH projects produce: quantitative data sets; experimental data; observational data; interviews; archival data; human artefacts; medical and genetic data; and so forth. In addition, the various kinds of data crosscut the disciplinary divisions, as several disciplines produce different kinds of data, depending on the methodologies used.

Also, particular restrictions may apply to making data open depending on the discipline. Data may include copyrighted material, such as literary texts or images, or archival materials to which access is restricted. In other cases, data may include privacy-sensitive material, such as video recordings of parent-child interactions or interviews.

For this reason, it is not possible to provide a single set of guidelines for the entire SH domain. Therefore, this document aims to provide some general and some discipline-specific references that ERC grantees can use to draw up DMPs that are adequate for their discipline and their specific project, and that meet the FAIR principles.

In what follows more information is given on:

general depositories
discipline-specific depositories
metadata and data preparation

General depositories

There are many options available for SH scholars, both general as well as discipline-specific, not-for-profit as well as commercial. The list below mentions a number of well-known depositories for use by social sciences and humanities disciplines, but it is certainly not exhaustive.

An important selection of depositories for SH scholars is provided by CESSDA:

Consortium of European Social Science Data Archives (CESSDA): http://cessda.net/

CESSDA is a so-called ‘ERIC’, a European Research Infrastructure Consortium (http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=eric), i.e., an international entity established by the European Commission that has national governments or consortia as its members. Currently, CESSDA has 18 members, all of them national agencies that operate on a not-for-profit basis. Many of the CESSDA depositories also cover (some of) the humanities in addition to the social sciences.

The geographical coverage of CESSDA is growing. Among the EU countries, missing at the time of writing are some Southern European countries (Italy, Spain) and most EU-13 countries. Another prominent absentee is Ireland.

Also of interest to researchers in the SH domain is ICPSR:

Inter-university Consortium for Political and Social Research (ICPSR): http://www.icpsr.umich.edu/

ICPSR is a not-for-profit membership organisation that maintains a data archive in the social and behavioural sciences:

openICPSR: https://www.openicpsr.org/openicpsr/

Currently ICPSR has a membership of more than 770 universities, government agencies, and other institutions.

Discipline-specific depositories

There are a number of depositories that are discipline-specific, and that are usually maintained by discipline-specific organisations or consortia.

Linguistics

Linguistics Linked Open Data (LLOD): http://linguistic-lod.org/

LLOD is maintained by the Open Knowledge Foundation’s Working Group on Open Data in Linguistics (https://linguistics.okfn.org).

European Research Infrastructure for Language Resources and Technology (CLARIN): https://www.clarin.eu/
Depositing Services offered by CLARIN Centres: https://www.clarin.eu/content/depositing-services

CLARIN is an ERIC, like CESSDA. Its geographical coverage is wide, with currently 20 national consortia as full members and four consortia as observers. Among the EU countries, Spain, Ireland, Luxembourg and several EU-13 countries are currently not (yet) represented among the CLARIN membership.

Historical sciences

Depositories for the historical sciences are mostly at the institutional or national level. A number of CESSDA archives also accept historical data sets.

Archaeology

There are only a few depositories dedicated to archaeology. Most of these have a national focus, such as:

Archaeology Data Service (ADS) in the UK: http://archaeologydataservice.ac.uk/
e-Depot for Dutch Archaeology (EDNA): https://dans.knaw.nl/en/about/services/easy/edna

EDNA was established by the Data Archiving and Networked Services (DANS) and the Cultural Heritage Agency (RCE) to archive digital research data of Dutch archaeologists in a sustainable manner and make them available. The data are stored in EASY (https://easy.dans.knaw.nl/), the online archiving system of DANS.

Arts and humanities

Digital Research Infrastructure for the Arts and Humanities (DARIAH): http://www.dariah.eu/

DARIAH is another ERIC. It is a pan-European infrastructure for arts and humanities scholars working with computational methods. It has seventeen members and several cooperating partners in eight non-member countries. Among the EU countries, missing at the time of writing are Spain and a number of EU-13 countries.

Note that several CESSDA archives also accept humanities data sets.

Psychology

The Leibniz Institute for Psychology Information (ZPID; https://www.zpid.de) has developed a data-sharing platform specialized for psychology research:

PsychData: https://www.psychdata.de/

For an extensive overview of data depositories in psychology, see the article “Finding a Home for Your Science” by DeSoto.⁷

Of interest for researchers working in the psychology subdomain of cognitive neuroscience is the platform

OpenNeuro: https://openneuro.org/

which allows the sharing of MRI, MEG, EEG, iEEG, and ECoG data.

Demography

Data Sharing for Demographic Research (DSDR): http://www.icpsr.umich.edu/icpsrweb/DSDR/

DSDR is housed within the Inter-university Consortium for Political and Social Research (ICPSR) mentioned earlier.

CESSDA archives will normally also accept demographic data sets.

Metadata and data preparation

A general overview of SH metadata standards can be found on the SH-specific pages of the DCC:

Digital Curation Centre (DCC): http://www.dcc.ac.uk/resources/subject-areas/social-science-humanities

The DCC website lists metadata standards for, among others, archaeology, social and policy studies, economics, and heritage studies.

For metadata and data preparation in the social sciences, see the following guide on the website of the Inter-university Consortium for Political and Social Research (ICPSR):

Guide to Social Science Data Preparation and Archiving: http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/

For metadata and data preparation in linguistics, see:

Section on ‘Standards and Formats’, CLARIN website: https://www.clarin.eu/content/standards-and-formats

Open research data and data deposition in the Physical sciences and Engineering domain

The PE domain has a large number of data depositories. The following section addresses a number of areas in some detail. The list is by no means exhaustive; it is rather a collection of representative examples in a rapidly evolving landscape.

Discipline-specific depositories

Astronomy

The Strasbourg astronomical Data Center is dedicated to the collection and worldwide distribution of astronomical data and related information:

Strasbourg astronomical Data Center (CDS): https://cdsweb.u-strasbg.fr/

It hosts a variety of repositories of multi-wavelength data and provides useful interfaces, e.g. the SIMBAD astronomical database (http://simbad.u-strasbg.fr/simbad/), the world reference database for the identification of astronomical objects; VizieR (http://vizier.u-strasbg.fr/viz-bin/VizieR), the catalogue service for the CDS reference collection of astronomical catalogues and tables published in academic journals; and the Aladin interactive software sky atlas for access, visualization and analysis of astronomical images, surveys, catalogues, databases and related data (http://aladin.u-strasbg.fr/aladin.gml).

Chemistry

The use of public depositories and databases in chemistry is still developing, with the majority of the progress happening in the area of structural chemistry. The

Worldwide Protein Data Bank: http://www.wwpdb.org/

manages the archives of the Protein Data Bank, which provides a repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies.

Another key resource in use in this area is the

Cambridge Crystallographic Data Centre: https://www.ccdc.cam.ac.uk/

for small molecule crystallography data.

UniProt: https://www.uniprot.org/

covers direct sequencing data for proteins, and both

ProteomeXchange: http://www.proteomexchange.org/

and the

PRIDE Archive – proteomics data repository: https://www.ebi.ac.uk/pride/archive/

deal with mass spectrometry proteomics data.

A network of repositories for open access Computational Chemistry research results is

ioChem-BD: https://www.iochem-bd.org/

A free chemical structure database providing fast text and structure search access to over 67 million structures from hundreds of data sources is

ChemSpider: http://www.chemspider.com/

Maintained by the Royal Society of Chemistry, it also encourages researchers to upload their own data.

Earth system science

Digital seismic waveform data in standardized format are available via the International Federation of Digital Seismograph Networks (FDSN, formed in 1985), which provides a large volume of data through various on-line data centres, all reachable via the FDSN website:

Federation of Digital Seismograph Networks (FDSN): http://www.fdsn.org/webservices/datacenters/

The Data Management Center of IRIS – Incorporated Research Institutions for Seismology (http://www.iris.edu/) in the US is one of the hubs for seismology that serves the international FDSN community, also archiving historical data from pre-digital sources:

IRIS DMC: http://ds.iris.edu/ds/nodes/dmc/data/types/

Likewise,

UNAVCO: https://www.unavco.org/

archives and distributes geodetic data (GPS/GNSS, InSAR) for research purposes.

Geochemists also have on-line databases, for example a relational database of peer-reviewed summary data on the geochemistry of all reservoirs in the Earth (https://earthref.org/GERM/). Data from geomagnetic observatories around the world can be obtained through the ‘Intermagnet’ program (http://www.intermagnet.org/). The

European Plate Observing System (EPOS): https://www.epos-ip.org/

is a collaborative framework where many diverse communities of geoscientists and engineers aim at providing open access to geophysical, geochemical and geological data as well as visualization and modelling tools. At present, EPOS includes ~300 research institutions from 25 European countries. In October 2018, the European Commission granted EPOS the legal status of an ERIC (European Research Infrastructure Consortium), which is currently joined by ten countries: Belgium, Denmark, France, Italy, the Netherlands, Norway, Poland, Portugal, Slovenia and the United Kingdom. Greece, Iceland and Switzerland will initially participate as observers.

Material sciences

The Crystallography Open Database contains the crystal structures of a large number of systems. Researchers can contribute their own results:

Crystallography Open Database (COD): http://www.crystallography.net/cod/

RefractiveIndex.INFO (https://refractiveindex.info) contains the dielectric functions of various materials.

Particle physics

Scattering data providing mostly documentation of published results (data points from plots and tables) are deposited at the

Durham High Energy Physics Database (HEPData): https://hepdata.net/.

Software engineering

In computer science (but also in physics, astronomy, etc.), one type of research output is code.

GitHub: https://github.com

is an extremely popular platform for publishing such output; although GitHub is run by a commercial company, public projects can be hosted for free.

Telecommunications

A library of test instances for Survivable fixed telecommunication Network Design is provided by

SNDlib: http://sndlib.zib.de/home.action.

It contains realistic network design test instances available to the research community and serves as a standardized benchmark for testing, evaluating, and comparing network design models and algorithms. Every user can contribute by submitting new test instances, new solutions or dual bounds for existing test instances.

Video Quality Experts Group (VQEG): https://www.its.bldrdoc.gov/vqeg/video-datasets-and-organizations.aspx

collects websites containing video content, including video test sequences. The

Consumer Digital Video Library (CDVL): http://www.cdvl.org/

provides a repository of video content that is suitable for determining the effectiveness of consumer video processing applications and quality measurement algorithms. Users can share and download high-quality uncompressed video clips, which can be filtered using a clip descriptor and recommended usage guidance.

QUALINET Databases: http://dbq.multimediatech.cz

accepts and shares numerous datasets used in the field of Quality of Experience research in multimedia systems.

Metadata

In the situation where there is no public or community database for a data type, the ERC encourages grantees to deposit the metadata, including links to the data location, in a recognised resource.
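As an illustration only, the following minimal Python sketch shows what a metadata record that links to the data location might look like before it is deposited in a recognised resource. The field names (title, creators, data_location, and so on), the function name and the example.org address are assumptions made for this sketch; they are not prescribed by the ERC, and any real repository will define its own required schema.

```python
import json


def build_metadata_record(title, creators, description, data_url, licence, keywords):
    """Assemble a minimal descriptive record that points to where the data live.

    All field names are hypothetical; adapt them to the schema of the
    resource in which the metadata will actually be deposited.
    """
    # The record is only useful if the link to the data can be followed.
    if not data_url.startswith(("http://", "https://")):
        raise ValueError("data_url should be a resolvable web address")
    return {
        "title": title,
        "creators": list(creators),
        "description": description,
        "data_location": data_url,  # link to the data themselves
        "licence": licence,
        "keywords": list(keywords),
    }


# Placeholder values, not a real dataset.
record = build_metadata_record(
    title="Example dataset (placeholder)",
    creators=["Doe, Jane"],
    description="Placeholder description of a dataset with no community database.",
    data_url="https://example.org/data/placeholder-dataset",
    licence="CC-BY-4.0",
    keywords=["example", "placeholder"],
)

# Serialise to JSON, a format that most repositories can ingest or display.
record_json = json.dumps(record, indent=2, sort_keys=True)
print(record_json)
```

Keeping the link to the data as an explicit field means the record remains useful even when the data themselves are stored elsewhere.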

A good example where standards for metadata have been established is given by the Virtual Observatory (VO) with the vision that astronomical datasets and other resources should work as a seamless whole. Many projects and astronomical data centres worldwide are working towards this goal via the International Virtual Observatory Alliance (IVOA – http://ivoa.net). The IVOA debates and agrees the technical standards that are needed to make the VO possible. It also acts as a focus for VO aspirations, a framework for discussing and sharing VO ideas and technology, and a body for promoting and publicising the VO.

Footnotes:

¹Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3:160018 (https://doi.org/10.1038/sdata.2016.18)

²https://cos.io/

³Sansone, S.-A. et al. (2019). FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology 37, 358-367 (https://doi.org/10.1038/s41587-019-0080-8)

⁴Cousijn, H. et al. (2018). A Data Citation Roadmap for Scientific Publishers. Scientific Data 5, 180259 (https://doi.org/10.1038/sdata.2018.259)

⁵Ellenberg, J. et al. (2018). A call for public archives for biological image data. Nature Methods 15, 849-854 (https://doi.org/10.1038/s41592-018-0195-8)

⁶Jiménez, R.C., Kuzak, M., Alhamdoosh, M. et al. (2017). Four simple recommendations to encourage best practices in research software [version 1; referees: 3 approved]. F1000Research 6:876 (https://doi.org/10.12688/f1000research.11407.1)

⁷DeSoto, K. A. (2016). Finding a Home for Your Science. Observer, Volume 29, Issue 5 (https://www.psychologicalscience.org/observer/finding-a-home-for-your-science)