UTUGuides: Research Data Management (the lifecycle of research data): Opening and reusing research data

Opening your research data

Choosing a Research Data Repository

Research data should primarily be published in the data archive or repository of your own field of science or research topic. Data published in a discipline-specific archive is more likely to be discovered by researchers in your field. Note that even if you do not make the data openly available, it must still be preserved for the required period to ensure both research verification and the researcher’s legal protection. The retention period is generally five years, and fifteen years in medical research.

In general-purpose or multidisciplinary data archives, research data may be harder for researchers in your field to discover.

The right data repository meets the following criteria:

Persistent identifier: DOI, URN, ORCID
Machine-readable metadata
A certificate indicating reliability
A license
Established use in your field
Note! EU funders often require a certified data repository.

You can search for different repositories from the following services:

CESSDA - Consortium of European Social Science Data Archives
Data repositories - Open Access Directory wiki
OpenAIRE - Open Science Infrastructure
re3data.org - Directory

Special features of different repositories are also presented on the Harvard Library website.

Sometimes there's no need to open the entire research dataset; just the metadata is sufficient. However, the metadata should be comprehensive enough to convey an understanding of the research dataset.

In many archives, it's possible to open only the metadata, even if the entire research dataset has been deposited in the archive. However, this may not be possible in all cases.

It's advisable to start producing metadata right at the beginning of the research. Writing metadata later can be time-consuming. Check out tips for writing metadata: DATA DESCRIPTION AND METADATA

Turku University strongly supports the FAIR principles, which suggest that research data should be as open as possible and as closed as necessary. Opening the entire dataset is not always feasible or sensible, but the metadata for research datasets can be made open. Turku University does not have its own database for opening research data; instead, data is opened through national or international platforms, such as repositories or data archives.

Should all data be openly accessible?

In short, the answer is no.

When making assessments, go through the Digital Curation Centre (DCC) checklist to determine which data is worth preserving.

A persistent identifier (PID) refers to a unique identifier used in the online environment to identify entities such as publications, individuals, or research datasets.

Turku University recommends that each researcher obtain their own ORCID identifier. This identifier is useful in situations where a researcher changes their name or has multiple variations of their name, or when there are multiple researchers with the same name. More information about the ORCID identifier can be found in the UTUCRIS guide.

In scientific publications, the persistent identifier is usually issued by the publishing platform, often the publisher of the journal or monograph.

For published research data, a persistent identifier such as a URN (Uniform Resource Name) or DOI (Digital Object Identifier) can be used. URN identifiers are used by domestic data archives such as Etsin, FSD, and Kielipankki, while DOI identifiers are widely used in commercial publishing platforms and systems.
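As a rough illustration of how such identifiers are used in practice, the Python sketch below turns a DOI or URN into a web address that can be opened in a browser. It assumes that DOIs resolve through the doi.org proxy and that Finnish URN identifiers resolve through urn.fi. The function name and the example identifiers are hypothetical and do not point to real datasets.

```python
def resolver_url(pid):
    """Return a web address for a DOI or URN (illustrative sketch only)."""
    pid = pid.strip()
    lowered = pid.lower()
    if lowered.startswith("urn:"):
        # Assumption: Finnish URN identifiers are resolved by urn.fi.
        return "https://urn.fi/" + pid
    if lowered.startswith("doi:"):
        # Drop the optional "doi:" prefix before building the link.
        pid = pid[4:].strip()
    if pid.startswith("10."):
        # DOI names begin with "10." and resolve through the doi.org proxy.
        return "https://doi.org/" + pid
    raise ValueError("Unrecognised identifier: " + repr(pid))


# Hypothetical identifiers, used only to show the output format.
print(resolver_url("doi:10.1234/example-dataset"))
print(resolver_url("URN:NBN:fi:example-0001"))
```

Citing the full resolver address rather than the bare identifier lets readers reach the dataset's landing page directly.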

To support the continued use of research datasets, research data should follow the FAIR principles: the data should be Findable, Accessible, Interoperable, and Reusable.

Findability:

The dataset has a unique, persistent identifier. The most commonly used persistent identifiers for research data and scientific publishing are DOI, URN, and Handle. Generally, the storage location issues persistent identifiers for datasets.
The dataset has comprehensive metadata.
The dataset's metadata is indexed in search services. Many data repositories have interfaces to search services.
The persistent identifier should be included in the metadata.

Accessibility:

The dataset or its metadata can be retrieved using the identifier. Not all datasets can be open, but metadata is almost always available.
The access policy is open and free.
Datasets should not be hidden behind a paywall.
Requesting permission to use closed datasets should be as easy as possible.
Metadata should be kept accessible even if the dataset itself is no longer available.

Interoperability:

The dataset and its metadata are in a formal, reusable, available, and shared language.
The dataset should be both human-readable and machine-readable.
Data content should be shareable across systems.
Vocabularies, ontologies, and code lists are machine-readable.
The use of persistent identifiers enables referencing the dataset or metadata.

Reusability:

Comprehensive descriptive information enables dataset reuse.
Licensing information is comprehensive and visible. Prefer the CC0 license.
The dataset can be easily linked to its origin and life cycle. Potential reusers need to know where the dataset came from and how to reference it.
The dataset meets the requirements of its own scientific field.
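The FAIR points above can be turned into a simple self-check before depositing a dataset. The following Python sketch is only one possible way to do this: the field names are hypothetical, the record contains invented values, and real repositories define their own metadata schemas.

```python
# Hypothetical field names; real repositories define their own metadata schemas.
FAIR_FIELDS = {
    "identifier": "Findable: persistent identifier (DOI, URN or Handle)",
    "description": "Findable: comprehensive descriptive metadata",
    "access_conditions": "Accessible: how the data or its metadata can be obtained",
    "file_format": "Interoperable: a shared, machine-readable format",
    "license": "Reusable: visible licensing information",
    "provenance": "Reusable: where the dataset came from and how to cite it",
}


def missing_fair_fields(metadata):
    """Return the checklist fields that are absent or empty in a metadata record."""
    return [name for name in FAIR_FIELDS if not metadata.get(name)]


# Illustrative record with invented values; the license has been left empty.
record = {
    "identifier": "10.1234/example-dataset",
    "description": "Survey responses collected for an example study.",
    "access_conditions": "Open metadata; data available on request.",
    "file_format": "CSV",
    "license": "",
    "provenance": "Collected by the example research group.",
}

for name in missing_fair_fields(record):
    print("Missing:", FAIR_FIELDS[name])
```

A check like this only confirms that the fields are filled in. Whether the descriptions are good enough for reuse still needs human judgment.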

Authors of scientific articles may be asked to include a data availability statement or data access statement (DAS) in the article. Its purpose is to indicate where the research data associated with the article is available and under what conditions. The DAS may include a link to the dataset if applicable.

Publishers have their own guidelines for wording and placing the DAS in the article; see for example Taylor & Francis, Springer and Elsevier.

General information on data availability: PLOS ONE - Data Availability, Nature - DAS
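To make the structure of a DAS concrete, the Python sketch below assembles one from three parts: where the data is held, its identifier, and the conditions of access. The function name and wording are hypothetical and the DOI is invented. Always follow the template required by your publisher.

```python
def data_availability_statement(repository, identifier=None, conditions=None):
    """Compose a short data availability statement (illustrative wording only)."""
    statement = "The data supporting this article are deposited in " + repository
    if identifier:
        # Add the link or persistent identifier when the dataset has one.
        statement += " (" + identifier + ")"
    statement += "."
    if conditions:
        # State the terms of access, ending with exactly one full stop.
        statement += " " + conditions.rstrip(".") + "."
    return statement


# Hypothetical DOI, used only to show the resulting sentence.
print(data_availability_statement(
    "Zenodo",
    identifier="https://doi.org/10.1234/example-dataset",
    conditions="The dataset is openly available under a CC0 license",
))
```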

Data publications are peer-reviewed documents that include information such as data collection and analysis methods. They are published in peer-reviewed journals. Data publications increase the visibility of research and provide recognition to authors similar to scientific articles.

General Repositories

Zenodo is a general-purpose data repository suitable for various types of data. It is produced by CERN and funded by the EU.

Features:

Research data receives a permanent DOI identifier
Option to log in with ORCID or GitHub
Turku University has its own community
Integration with GitHub, enabling referencing of source code/software via DOI
Default storage space of 50GB per dataset
Default license is CC0, but a wide range of other licenses is also available
Option to open only parts of the dataset, or set embargoes on opening. Conditional opening is also possible.
Metadata is always open
Not suitable for sensitive data
No curation service or assistance with description
Works only in the browser

You can explore the Turku University community in Zenodo. UTU recommends using the community. When you add your dataset to the Zenodo UTU community, library experts will review and approve your dataset and its associated metadata.

Dryad is an interdisciplinary research data repository, with a particular focus on natural and medical science research data.

Features:

Dryad curates datasets ensuring high quality
Provides a permanent identifier (DOI) for the data
Data is licensed under CC0, with no other options available
Paid service
Not possible to open only parts of the datasets
Certified

More info and instructions

Figshare is an interdisciplinary repository where you can upload your data for free.

Features:

Maximum dataset size of 5TB (max 5000 files/dataset)
Option for paid Figshare+
Accepts CC licenses, but also other licenses
Provides DOI
ORCID integration
No metadata curation assistance

Harvard Dataverse is an open-source data repository developed at Harvard University Library. The source code of Harvard Dataverse serves as the basis for many other data repositories.

Special features:

Browser-based
Maximum dataset size of 1TB
Recommendation for using CC0 license
Option to link with ORCID, ISNI, LCNA, VIAF, GND, DAI, ResearcherID, Scopus ID
DOI assignment
Free of charge
Option for paid metadata curation assistance
Standardized metadata

Open Science Framework (OSF) is a general-purpose repository that provides researchers with assistance in data management throughout all stages of research.

Special features of the Open Science Framework:

Maximum size for open data: 50GB
Maximum size for closed data: 5GB
Multiple licensing options available
Free of charge
DOI assignment

Finnish Social Science Data Archive (FSD) archives both qualitative and quantitative research data.

Special features of the Data Archive:

Focus on social science data
URN assignment
Free of charge
HAKA authentication
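When comparing the general repositories, dataset size is often the first practical constraint. The Python sketch below filters repositories by the per-dataset size limits quoted in the feature lists above. It assumes that 1 TB equals 1000 GB and uses the open-data limit for the Open Science Framework. The function name is hypothetical, and limits may change, so check each repository's own documentation before depositing.

```python
# Per-dataset size limits in GB, as quoted in the feature lists of this guide.
# Assumption: 1 TB is treated as 1000 GB; the OSF figure is for open data.
SIZE_LIMITS_GB = {
    "Zenodo": 50,
    "Figshare": 5000,
    "Harvard Dataverse": 1000,
    "Open Science Framework": 50,
}


def repositories_for(dataset_size_gb):
    """Return, in alphabetical order, the repositories whose limit fits the dataset."""
    return sorted(
        name
        for name, limit in SIZE_LIMITS_GB.items()
        if dataset_size_gb <= limit
    )


# Example: a 200 GB dataset exceeds the smaller default limits.
print(repositories_for(200))
```

Size is only one criterion. Licensing options, curation support and suitability for sensitive data should be weighed as well.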