Well-organized and documented data is easy to use, share, open, preserve, and reuse. Documentation involves describing the methods, structure, and handling of the data. Keep documentation up to date throughout the research process. Post-research documentation can be significantly more difficult, if not impossible.
Good data documentation enables:
Different disciplines have various documentation practices, which should be followed. At its simplest, a readme file describing the overall content is created with the data.
Good documentation includes:
Source: Fuchs, S., Koivula, H., Korhonen, T., Lindholm, T., Rauste, P., & Siipilehto, L. (2023, May 17). Data Organisation ABC workshop - Datan Organisoinnin ABC työpaja. Zenodo. https://doi.org/10.5281/zenodo.7944449
A ReadMe file binds the components of the data set together.
It accumulates:
The ReadMe file records documentation generated during data handling and information related to data quality. It also provides instructions for reusing the data.
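A ReadMe file is easier to keep current if its skeleton exists from the first day of the project. The following is a minimal sketch of how such a skeleton could be generated; the section headings, the `README.txt` file name, and the function names are illustrative assumptions drawn from the topics mentioned above, not a prescribed template.

```python
from pathlib import Path

# Illustrative section headings (an assumption, adapt to your discipline).
README_SECTIONS = (
    "Dataset title and overview",
    "Methods of data collection",
    "File and folder structure",
    "Data handling and processing steps",
    "Data quality notes",
    "Instructions for reuse",
)


def build_readme(title: str, sections: tuple[str, ...] = README_SECTIONS) -> str:
    """Return plain-text ReadMe content with one placeholder per section."""
    lines = [title, "=" * len(title), ""]
    for heading in sections:
        lines += [heading, "-" * len(heading), "TODO", ""]
    return "\n".join(lines)


def write_readme(directory: Path, title: str) -> Path:
    """Write README.txt into the dataset directory and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "README.txt"
    path.write_text(build_readme(title), encoding="utf-8")
    return path
```

Calling `write_readme(Path("my_dataset"), "My dataset")` would create the directory if needed and leave a file whose `TODO` placeholders can be filled in as the research proceeds.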
Describing research materials is a part of the research process and helps others understand your research and data. Metadata, or data about data, is part of the research description and is typically the portion of the research material that can be openly found and used. Often, funding agencies require the disclosure of descriptive information about research materials. The easiest and most cost-effective way to produce descriptive metadata is step by step throughout the data's lifecycle.
The University of Turku does not currently have a dedicated service for publishing research metadata. Instead, metadata is stored in a suitable repository or data service. The Finnish tool Qvain is recommended for describing data; datasets described with it can be found in Etsin. Metadata created in Qvain is also transferred to the Finnish Metax metadata repository. Datasets can also be described directly via Metax's interface (Metax REST API).
Metadata can also be opened in many general or discipline-specific data archives or repositories. However, not all allow the separation of metadata from the data itself. Many data archives use metadata standards or schemas that should be followed from the beginning of data description.
Metadata and descriptive information should be collected during the research process. Describing data post-research is often more laborious.
High-quality metadata serves as a researcher's calling card for their study. Metadata includes information about the data's:
Different disciplines have established practices for describing data and marking metadata. However, it is important that the basics are described regardless of the field, promoting the findability, accessibility, interoperability, and reusability of research data in line with FAIR principles. Descriptions can be stored as text files or using an appropriate metadata format.
Each research dataset should have its own directory, containing both the data and its descriptive information. Some descriptive details are included within the main data file (e.g., variable explanations or unit information), but most are stored in separate descriptive files.
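The directory layout described above could be set up along these lines. This is a sketch only: the `data` and `docs` folder names, the `codebook.csv` file, and its three columns are assumptions chosen for illustration.

```python
import csv
from pathlib import Path


def init_dataset_dir(root: Path, variables: list[dict]) -> Path:
    """Create a dataset directory holding data and its descriptive files.

    `variables` is a list of dicts with the keys "variable",
    "description" and "unit" (hypothetical column names).
    Returns the path of the codebook that was written.
    """
    (root / "data").mkdir(parents=True, exist_ok=True)
    (root / "docs").mkdir(exist_ok=True)

    codebook = root / "docs" / "codebook.csv"
    with open(codebook, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["variable", "description", "unit"]
        )
        writer.writeheader()
        writer.writerows(variables)
    return codebook
```

Keeping variable explanations and units in a separate codebook file means they stay readable even when the main data file is in a format that cannot carry them.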
According to the Finnish Social Science Data Archive, a research data description should include these elements:
The aim of describing research data is discoverability and usability, so descriptions should be as consistent and machine-readable as possible and should make wide use of existing standards and schemas.
There are numerous metadata standards, some of which are highly specific to particular disciplines. Researchers should use the standard of their own discipline. Lists of metadata standards can be found in the DCC list and the Metadata Standards Catalog.
The most commonly used standards are Dublin Core and DataCite. Many widely used metadata standards distinguish between mandatory and optional fields.
Note! A data repository or archive may also require a specific metadata standard. If you already know at the beginning of your research the data repository you intend to use for preserving and sharing your research data, collect the metadata according to the metadata standard used by the data repository.
Dublin Core and DataCite
The Dublin Core metadata format is standardised in SFS-ISO 15836-1:2020 Information and documentation, whose Part 1 covers the Core Elements, and in 15836-2:2020, Part 2, which defines properties and classes as identified by the Dublin Core community. Dublin Core defines 15 core elements, all of which are optional and repeatable. The content and other guidelines can be found, for example, from Dublin Core or Paladini.
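As an illustration, a simple Dublin Core record can be written as XML with Python's standard library. This is a minimal sketch: the `dc` namespace URI is the one used for the Dublin Core element set, while the `<metadata>` wrapper element and the function name are assumptions made for the example. Each key maps to a list because an element may appear more than once.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize Dublin Core fields to an XML string.

    The <metadata> wrapper is an illustrative choice, not part of
    the standard.
    """
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")

    root = ET.Element("metadata")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `dublin_core_record({"title": ["Example dataset"], "creator": ["Doe, Jane", "Roe, Richard"]})` produces one `dc:title` and two `dc:creator` elements.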
While DataCite's schema doesn't have the status of an official standard, its usage is highly controlled. DataCite consists of 20 elements. The entire DataCite schema and guidelines can be found here.
Examples of DataCite XML-formatted metadata can be found here.
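To give a sense of what such a record looks like, the sketch below builds a bare-bones DataCite XML document. It assumes the `kernel-4` namespace and includes only the properties the schema marks as mandatory (identifier, creator, title, publisher, publication year, resource type). The function name and the sample values in the usage note are hypothetical, and a real record should be validated against the official schema.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("", DATACITE_NS)  # serialize as the default namespace


def datacite_record(doi, creators, title, publisher, year,
                    resource_type="Dataset"):
    """Return a minimal DataCite XML string (mandatory properties only)."""

    def el(parent, tag, text=None, **attrs):
        # Helper: add a namespaced child element with optional text.
        node = ET.SubElement(parent, f"{{{DATACITE_NS}}}{tag}", attrs)
        node.text = text
        return node

    root = ET.Element(f"{{{DATACITE_NS}}}resource")
    el(root, "identifier", doi, identifierType="DOI")
    creators_el = el(root, "creators")
    for name in creators:
        el(el(creators_el, "creator"), "creatorName", name)
    el(el(root, "titles"), "title", title)
    el(root, "publisher", publisher)
    el(root, "publicationYear", str(year))
    el(root, "resourceType", resource_type, resourceTypeGeneral="Dataset")
    return ET.tostring(root, encoding="unicode")
```

A call such as `datacite_record("10.1234/example", ["Doe, Jane"], "Example dataset", "Example Publisher", 2023)` uses a placeholder DOI; in practice the repository normally assigns the identifier.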
Qvain is an easy-to-use tool for creating metadata for research datasets. Using Qvain does not require the research data to be in the IDA service, but the two are easy to link. Once a dataset has been described in Qvain, it can be found through the Etsin tool, from which it is harvested to various services and platforms.
Check out CSC's video on publishing your data in Fairdata with Qvain.
Qvain requires certain information from all described research datasets:
Qvain can also include information such as:
Metadata can also be produced informally, as long as the information is in a machine-readable format.
Important information includes:
Examples of informal metadata:
Harvard: https://datamanagement.hms.harvard.edu/collect-analyze/documentation-metadata/readme-files
Cornell: https://data.research.cornell.edu/data-management/sharing/readme/
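Informal metadata can still be kept machine-readable, for instance by saving it as a JSON file next to the data file it describes. The sketch below shows one way to do this; the `.metadata.json` suffix, the function name, and the field names in the usage note are assumptions, not part of any standard.

```python
import json
from pathlib import Path


def write_metadata_sidecar(data_file: Path, metadata: dict) -> Path:
    """Save metadata beside a data file as <file name>.metadata.json."""
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar
```

For a file `survey.csv`, calling `write_metadata_sidecar(Path("survey.csv"), {"title": "Example survey", "creator": "Jane Doe", "license": "CC-BY-4.0"})` writes `survey.csv.metadata.json`, which both people and programs can read.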
The most well-known repositories utilize the following metadata standards or schemas:
Repository | Standard/Schema | What else?
---|---|---
Zenodo | DataCite | Mandatory fields: Publication date, title, authors, description, access right, license
Figshare | DataCite | Mandatory fields: Item title, item type, authors, categories, keywords, description, license
IDA/Qvain | Fairdata Metax data model | Mandatory fields in Qvain: License, description of the dataset, title, publication date, keywords, author (individual or organization), and publisher (individual or organization)
Dryad | Dublin Core, DataCite, OAI-ORE, RDF DataCube | Mandatory fields: Journal name; Title; Author(s); Abstract; Research domain; Keyword(s)
Pangaea | |
BOLD system | | BOLD = Barcode of Life Data System. E.g. for a photo, the mandatory fields are: Image file; Original specimen; View metadata; Sample ID; License; License year; License contact
Finnish Social Science Data Archive | DDI (Data Documentation Initiative) | Mandatory fields: Data creator or collector's name, response on informing participants, dataset name and brief description, dataset size, reporter's name, background organization, and email
EUDAT CDI B2SHARE / EUDAT B2SHARE | EUDAT Core and Extended schema |
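Because each repository has its own mandatory fields, it can help to check a draft description against them before uploading. The following is a rough sketch using the Zenodo and Figshare fields listed in the table; the dictionary keys are plain-language labels chosen for this example and are not the field names used by either repository's submission form or API.

```python
# Mandatory fields as listed in the table (labels are illustrative).
MANDATORY_FIELDS = {
    "Zenodo": [
        "publication date", "title", "authors",
        "description", "access right", "license",
    ],
    "Figshare": [
        "item title", "item type", "authors", "categories",
        "keywords", "description", "license",
    ],
}


def missing_fields(repository: str, record: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty in `record`."""
    required = MANDATORY_FIELDS[repository]
    return [field for field in required if not record.get(field)]
```

An empty result from `missing_fields("Zenodo", record)` suggests the draft covers the listed fields; the repository's own validation remains the final authority.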
FSD's archiving services use the DDI format in XML.
The DDI format supports the Data Archive's goal of preserving and archiving research datasets collected for studying Finnish society, people, and cultural phenomena.
According to the DDI format, the following aspects are described as clearly as possible in the metadata:
For additional information and detailed instructions, please visit:
International instructions: