Turku University Library

Research Data Management (the lifecycle of research data)

Opening your research data

Choosing a Research Data Repository

Research data should primarily be published in the data archive or repository of your own field of science or research topic. Data published in a discipline-specific archive is more likely to be discovered by researchers in your field.

In general or multidisciplinary data archives, your data may be harder for researchers in your field to discover.

The right data repository meets the following criteria:

 

You can search for different repositories from the following services:

Special features of different repositories are also presented on the Harvard Library website. 

Sometimes there's no need to open the entire research dataset; just the metadata is sufficient. However, the metadata should be comprehensive enough to convey an understanding of the research dataset.

In many archives, it's possible to open only the metadata, even if the entire research dataset has been deposited in the archive. However, this may not be possible in all cases.

It's advisable to start producing metadata right at the beginning of the research, because writing it afterwards can be time-consuming. For tips on writing metadata, see: DATA DESCRIPTION AND METADATA

 

Turku University strongly supports FAIR principles, which suggest that research data should be as open as possible and as closed as necessary. Opening the entire dataset is not always feasible or sensible, but metadata for research datasets can be made open. Turku University does not have its own database for opening research data; instead, data opening is done through national or international platforms, such as repositories or data archives.

 

Should all data be openly accessible?

The short answer is NO.

When assessing your data, go through the Digital Curation Centre (DCC) checklist to determine which data is worth preserving.

A persistent identifier (PID) refers to a unique identifier used in the online environment to identify entities such as publications, individuals, or research datasets.

Turku University recommends that each researcher obtain their own ORCID identifier. This identifier is useful in situations where a researcher changes their name or has multiple variations of their name, or when there are multiple researchers with the same name. More information about the ORCID identifier can be found in the UTUCRIS guide.
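
As a small illustration of how an ORCID identifier is structured, the sketch below checks the final character of an ORCID iD, which is a checksum calculated with the ISO 7064 MOD 11-2 algorithm. The function names are hypothetical and chosen only for this example; the check confirms that an identifier is well formed, not that it belongs to a particular researcher.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    # 15 ASCII digits followed by a digit or the letter X
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this can catch typing errors before an identifier is written into dataset metadata.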

In scientific publications, the persistent identifier is usually issued by the publishing platform, often the publisher of the journal or monograph.

For published research data, a persistent identifier such as a URN (Uniform Resource Name) or DOI (Digital Object Identifier) can be used. URN identifiers are used by domestic data archives such as Etsin, FSD, and Kielipankki, while DOI identifiers are widely used in commercial publishing platforms and systems.
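
The difference between the two identifier types can be sketched in a few lines of code. The example below is a minimal sketch that tells a DOI from a URN by its shape and turns a DOI into a resolvable link through https://doi.org/. The patterns are simplified assumptions rather than complete validators, the function names are hypothetical, and the sample identifiers are invented placeholders.

```python
import re

# Simplified patterns (assumptions for this sketch, not full specifications)
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
URN_PATTERN = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def strip_doi_prefix(identifier: str) -> str:
    """Remove a leading resolver URL or 'doi:' label, if present."""
    value = identifier.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def classify_pid(identifier: str) -> str:
    """Return 'DOI', 'URN', or 'unknown' based on the identifier's shape."""
    value = strip_doi_prefix(identifier)
    if DOI_PATTERN.match(value):
        return "DOI"
    if URN_PATTERN.match(value):
        return "URN"
    return "unknown"


def doi_link(identifier: str) -> str:
    """Build a resolvable link for a DOI using the doi.org resolver."""
    return "https://doi.org/" + strip_doi_prefix(identifier)


# Placeholder identifiers, invented for illustration only
print(classify_pid("10.5281/zenodo.1234567"))
print(classify_pid("urn:nbn:fi:example-123"))
print(doi_link("doi:10.5281/zenodo.1234567"))
```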

For the continued use of research datasets, research data should adhere to the so-called FAIR principles, meaning that the data should be Findable, Accessible, Interoperable, and Reusable.

  1. Findability:
  • The dataset has a unique, persistent identifier. The most commonly used persistent identifiers for research data and scientific publishing are DOI, URN, and Handle. Generally, the repository where the dataset is stored issues the persistent identifier.
  • The dataset has comprehensive metadata.
  • The dataset's metadata is indexed in search services. Many data repositories have interfaces to search services.
  • The persistent identifier should be included in the metadata.
  2. Accessibility:
  • The dataset or its metadata can be retrieved using the identifier. Not all datasets can be open, but metadata is almost always available.
  • The access policy is open and free.
  • Datasets should not be hidden behind a paywall.
  • Requesting permission to use closed datasets should be as easy as possible.
  • Metadata should be kept accessible even if the dataset itself is no longer available.
  3. Interoperability:
  • The dataset and its metadata are in a formal, reusable, available, and shared language.
  • The dataset should be both human-readable and machine-readable.
  • Data content should be shareable across systems.
  • Vocabularies, ontologies, and code lists are machine-readable.
  • The use of persistent identifiers enables referencing the dataset or metadata.
  4. Reusability:
  • Comprehensive descriptive information enables dataset reuse.
  • Licensing information is comprehensive and visible. Prefer the CC0 license.
  • The dataset can be easily linked to its origin and life cycle. Potential reusers need to know where the dataset came from and how to reference it.
  • The dataset meets the requirements of its own scientific field.
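
The checklist above can be turned into a quick self-check before depositing a dataset. The following is a minimal sketch only: the field names (`identifier`, `access_conditions`, `provenance`, and so on) are hypothetical and do not follow any particular metadata standard, and real repositories define their own required fields.

```python
# Hypothetical metadata fields, each mapped to the FAIR point it supports
FAIR_FIELDS = {
    "identifier": "Findable: persistent identifier (DOI, URN, or Handle)",
    "title": "Findable: descriptive title",
    "description": "Findable/Reusable: comprehensive description",
    "access_conditions": "Accessible: how the data or its metadata can be obtained",
    "file_formats": "Interoperable: formats the data is stored in",
    "license": "Reusable: licensing information",
    "provenance": "Reusable: origin and life cycle of the dataset",
}


def missing_fair_fields(metadata: dict) -> list[str]:
    """Return the names of fields that are absent or empty in the record."""
    return [name for name in FAIR_FIELDS if not metadata.get(name)]


# Invented example record; the identifier is a placeholder
record = {
    "identifier": "10.1234/example",
    "title": "Example survey data",
    "description": "",
    "license": "CC0-1.0",
}

for name in missing_fair_fields(record):
    print(f"Missing {name} -> {FAIR_FIELDS[name]}")
```

An empty value counts as missing here, so a description field that exists but contains no text is still reported.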

Authors of scientific articles may be asked to include a data availability statement or data access statement (DAS) in the article. Its purpose is to indicate where the research data associated with the article is available and under what conditions. The DAS may include a link to the dataset if applicable.
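
As an illustration of the elements a DAS brings together (where the data is, how it is identified, and under what conditions it can be used), the sketch below assembles a statement from those three parts. The function name and the wording of the template are assumptions made for this example; always follow the wording and placement required by the publisher.

```python
from typing import Optional


def build_das(repository: str,
              identifier: Optional[str] = None,
              conditions: Optional[str] = None) -> str:
    """Assemble a simple data availability statement from its parts."""
    sentence = f"The data supporting this article are available in {repository}"
    if identifier:
        sentence += f" at {identifier}"
    sentence += "."
    if conditions:
        # Normalise the trailing full stop so it appears exactly once
        sentence += f" {conditions.rstrip('.')}."
    return sentence


# The link below is a placeholder, not a real dataset
print(build_das(
    "Zenodo",
    "https://doi.org/10.1234/example",
    "Access is open under the CC0 license",
))
```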

Publishers have their own guidelines for wording and placing the DAS in the article. See, for example: Taylor & Francis, Springer, and Elsevier.

For general information about data availability, see: PLOS ONE (Data Availability), Nature (DAS)

 

Data publications are peer-reviewed documents that include information such as data collection and analysis methods. They are published in peer-reviewed journals. Data publications increase the visibility of research and provide recognition to authors similar to scientific articles.

General Repositories

Zenodo is a general-purpose data repository suitable for various types of data. It is produced by CERN and funded by the EU.

Features:

  • Research data receives a permanent DOI identifier
  • Option to log in with ORCID or GitHub
  • Turku University has its own community
  • Integration with GitHub, enabling referencing of source code/software via DOI
  • Default storage space of 50GB per dataset
  • Default license is CC0, but a wide range of other licenses is also available
  • Option to open only parts of the dataset, or set embargoes on opening. Conditional opening is also possible.
  • Metadata is always open
  • Not suitable for sensitive data
  • No curation service or assistance with description
  • Works only in the browser

You can explore the Turku University community in Zenodo here. UTU recommends using the community. When you add your dataset to the Zenodo UTU community, library experts will review and approve your dataset and its associated metadata.

Dryad is an interdisciplinary research data repository, with a particular focus on natural and medical science research data.

Features:

  • Dryad curates datasets to ensure high quality
  • Provides a permanent identifier (DOI) for the data
  • Data is licensed under CC0, with no other options available
  • Paid service
  • Not possible to open only parts of the datasets
  • Certified

More info and instructions

 

Figshare is an interdisciplinary repository where you can upload your data for free.

Features:

  • Maximum dataset size of 5TB (max 5000 files/dataset)
  • Option for paid Figshare+
  • Accepts CC licenses, but also other licenses
  • Provides DOI
  • ORCID integration
  • No metadata curation assistance

Harvard Dataverse is an open-source data repository developed at Harvard University Library. The source code of Harvard Dataverse serves as the basis for many other data repositories.

Special features:

  • Browser-based
  • Maximum dataset size of 1TB
  • Recommendation for using CC0 license
  • Option to link with ORCID, ISNI, LCNA, VIAF, GND, DAI, ResearcherID, Scopus ID
  • DOI assignment
  • Free of charge
  • Option for paid metadata curation assistance
  • Standardized metadata

Open Science Framework (OSF) is a general-purpose repository that provides researchers with assistance in data management throughout all stages of research.

Special features of the Open Science Framework:

  • Maximum size for open data: 50GB
  • Maximum size for closed data: 5GB
  • Multiple licensing options available
  • Free of charge
  • DOI assignment
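
The size limits listed for the general repositories above can be compared with a short script when deciding where a dataset would fit. This is a rough sketch: the limits are copied from the feature lists on this page and may change, 1 TB is assumed to equal 1000 GB, and repositories for which no size limit is stated here are left out. The function name is hypothetical.

```python
GB_PER_TB = 1000  # assumption: decimal units

# Maximum dataset sizes in GB, as stated in the feature lists on this page
SIZE_LIMITS_GB = {
    "Zenodo": 50,
    "Figshare": 5 * GB_PER_TB,
    "Harvard Dataverse": 1 * GB_PER_TB,
    "OSF (open data)": 50,
    "OSF (closed data)": 5,
}


def repositories_for(size_gb: float) -> list[str]:
    """List the repositories whose stated limit can hold a dataset of this size."""
    return [name for name, limit in SIZE_LIMITS_GB.items() if size_gb <= limit]


print(repositories_for(20))
print(repositories_for(200))
```

Size is only one criterion. Licensing options, curation support, and suitability for sensitive data matter just as much when choosing a repository.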

Finnish Social Science Data Archive (FSD) archives both qualitative and quantitative research data.

Special features of the Data Archive:

  • Focus on social science data
  • URN assignment
  • Free of charge
  • HAKA authentication