Turku University Library

Research Data Management (the lifecycle of research data)

Storage of research data during the research project

Research data often consists of various types of data, each with different file formats. There are numerous file formats, and new ones are continuously emerging while some become obsolete. For the long-term usability of research data, it is important to pay attention to file formats to ensure that the data remains accessible in the future.

It is recommended that at least one copy of each file is saved in an open file format that can be read by different software without the need for paid licenses. If you take care of file formats during the research process, later use of the data will not require conversion, which always carries a risk of data loss or distortion.

Good file formats include:

Text files:

  • .txt
  • .odt
  • .xml
  • .html

Audio files:

  • .flac
  • .wav

Video files:

  • .mp2
  • .mp4

Image files:

  • .tif
  • .png
  • .svg
  • .jpg
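
As a rough illustration, a project's files can be checked against the formats listed above. The following is a minimal sketch, not an official UTU tool: the function names `classify` and `files_needing_open_copy` are hypothetical, and the extension table simply copies the list above.

```python
from pathlib import Path
from typing import List, Optional

# Extensions copied from the list above; the grouping is only used for reporting.
RECOMMENDED_EXTENSIONS = {
    "text": {".txt", ".odt", ".xml", ".html"},
    "audio": {".flac", ".wav"},
    "video": {".mp2", ".mp4"},
    "image": {".tif", ".png", ".svg", ".jpg"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category of a listed format, or None if the extension is not listed."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in RECOMMENDED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


def files_needing_open_copy(filenames: List[str]) -> List[str]:
    """List the files whose extension is not among the listed formats."""
    return [name for name in filenames if classify(name) is None]
```

For example, `files_needing_open_copy(["a.txt", "b.docx"])` would flag `b.docx` as a file for which an additional copy in an open format could be saved. The check looks only at file extensions, not at the actual file contents.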

Data security, which involves protecting information, systems, and data communication, is an important part of research data management. Electronic data can be easily copied and distributed, making it crucial to prevent unauthorized access. Backup is a part of data security.

Backups ensure that an up-to-date, uncorrupted copy of your research data can be restored if the original is lost or damaged. UTU recommends automatic backup, which reduces the risk of data loss. UTU's own storage services offer automatic backup.
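
If you manage copies yourself, one common way to check that a backup copy is not corrupted is to compare checksums of the original and the copy. The sketch below assumes both files are reachable as local paths; the function names `sha256_of` and `backup_matches` are hypothetical and are not part of any UTU service.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Return True if the backup has exactly the same content as the original."""
    return sha256_of(original) == sha256_of(backup)
```

A matching checksum only shows that the two copies are identical at the time of the check. It does not replace an automatic backup service.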

For the best data security during the storage of research data, use UTU's Digital Services storage platforms. More information on secure storage, sharing, and usage of research data can be found on the Intranet.

If you use a storage platform other than UTU’s, at least find out the following:

  • Is there automatic backup, or do I need to remember to do it myself?
  • Can I access the research data remotely?
  • How is access control implemented? Can I restrict access to different files with passwords?

Additionally, make sure to understand the data security of the storage platform, especially if your research data is sensitive or confidential. Ensure the platform's capacity meets your data needs and that its performance is sufficient.

Consider these points from the data management handbook:

Network Security: Assign personal read and write permissions to research personnel (e.g., usernames and passwords). This is especially important if the data is accessible via a network. Encrypt data transmitted over networks as necessary. Confidential information should not be stored on servers that provide internet services (e.g., web and email servers). Sensitive data should only be stored on computers not connected to networks. Also, ensure that the system does not store temporary or other files outside the access-restricted area during data processing.

Physical Security of Data Storage: Plan the storage and backup of research data to protect it from fire, burglary, water damage, or sabotage. Buildings should have access control, and doors should be locked when staff are not present. Access to rooms where research data is stored should be restricted. Prepare for potential computer and peripheral equipment failures. Store backups in a secure cabinet, and keep at least one copy of the data physically separate from other copies. Ensure the security of this copy as well.

Software Updates: Install critical operating system and software updates as soon as possible. Use a centralized automatic update service and remember that software updates can sometimes cause compatibility issues.

Virus Protection: All computers involved in the research project must have regularly and automatically updated antivirus software installed.

For more information on storage platforms and UTU's solutions, contact: data@utu.fi

 

Data Security in Research Involving Personal Data

Personal data can be pseudonymised or anonymised. As long as a person can be directly identified from the data or the data can be re-identified, it remains personal data and is subject to applicable regulations.

 

During the course of research, the need for sharing affects the storage of research data, in addition to the quality of the data itself. If the research data contains sensitive or confidential information, particular care must be taken in its storage and especially in its sharing.

Turku University's digital services offer storage options for various types of research data. For personal use:

  • Personal network folder: Provides 25 GB (or as needed) of free storage space, automatically created for UTU accounts.
  • Seafile cloud storage: Offers 100 GB (or as needed) of free storage space, automatically created for UTU accounts, with additional space available from Digital Services.
  • GitLab: A code repository service maintained by the university, based on git technology. While commonly used for version control of software code, it can also manage various types of text files. GitLab is a "cloud service" provided and maintained by Turku University's IT services, meaning all data stored in the service remains on the university's servers.

For the best sharing features and opportunities for collaboration:

  • Unit network drive / Taltio: Offers space as needed. Taltio can be ordered via helpdesk@utu.fi with the following information:
    • Name of Taltio
    • Desired amount of disk space
    • Responsible person
    • Cost center number
    • User accounts for Taltio
  • Teamworkspaces
  • Seafile cloud storage (100 GB)
  • GitLab

Questions related to data management and sharing should be clarified for the entire research team.

Clear practices should be established regarding:

  • Who is allowed to use the data and how
  • Who has the authority to decide on the use of the data
  • Who decides what will be done with the data and where it will be stored after the research

At the start of the research project, it's important to consider legal and ethical questions related to the research. Agree within the research team on who owns the data. If needed, seek assistance from legal support services for research (legal@utu.fi).

Also, consider any requirements from funders and publishers.

 

Secure sharing of data is possible through UTU's Seafile and Taltio storage platforms.

When it comes to data containing personal information, it is always necessary to ensure that the chosen service is suitable for handling and storing such data.

Other possibilities for sharing data include:

  1. CSC (IT Center for Science) services: CSC offers Funet FileSender, a service for sending and receiving large amounts of data. Funet FileSender is a secure way to share large files with anyone. Log in to upload files to the service or to request that someone else send you a file.

  2. Microsoft Teams + OneDrive: Teams is a collaboration and communication platform that helps people work together, whether they are working from home or in the office. Microsoft OneDrive is a cloud storage service that allows you to store files and photos online, access them from anywhere, and share them with others.

  3. Google Drive: A cloud storage service developed by Google. It allows users to store files in the cloud (on Google's servers), sync files across different devices, and share files.

These options provide various ways to securely share data depending on your specific needs and preferences.

 

 

Open Notebook Science

Open Notebook Science refers to making a research project public from the very beginning. An open research notebook details, among other things, the stages of data collection and measurement results. The goal is to ensure the transparency of the research.

The research notebook can be shared, for example, on a regular website or on social media platforms. Documents can be shared using, for instance, the University of Turku's Seafile cloud storage service. See the Seafile instructions on the IT Services Intranet site.

Identifiable data can be used for scientific research when it is appropriate, planned, substantiated, and there is a legal basis for processing the data (such as consent from the participant or research conducted in the public interest). In research involving personal data, the quality of data management is crucial. Ensure the pseudonymisation and anonymisation of data when necessary.

Pseudonymised data refers to data where direct identifiers are replaced with codes or pseudonyms. The code keys should always be kept separate from the analysis data. Depending on the quality of the data, pseudonymisation may suffice as a protective measure during the research. Access rights and handling of the code keys should be clearly agreed upon within the research team.

For pseudonymised data, it is recommended to delete direct identifiers such as names, social security numbers, images, etc., as soon as it is methodologically possible.
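
The idea of replacing direct identifiers with codes and keeping the code key separate can be sketched as follows. This is a simplified illustration only: the function `pseudonymise`, the field names `name` and `participant_code`, and the code format are assumptions, and real data usually contains several direct and indirect identifiers that need separate consideration.

```python
import secrets
from typing import Dict, List, Tuple


def pseudonymise(records: List[dict], id_field: str = "name") -> Tuple[List[dict], Dict[str, str]]:
    """Replace the direct identifier in each record with a random code.

    Returns the pseudonymised records and the code key (identifier -> code).
    The code key must be stored separately from the analysis data.
    """
    code_key: Dict[str, str] = {}
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_key:
            code = "P" + secrets.token_hex(4)
            # Regenerate in the unlikely case that the code is already in use.
            while code in code_key.values():
                code = "P" + secrets.token_hex(4)
            code_key[identifier] = code
        # Copy every field except the direct identifier.
        cleaned = {field: value for field, value in record.items() if field != id_field}
        cleaned["participant_code"] = code_key[identifier]
        pseudonymised.append(cleaned)
    return pseudonymised, code_key
```

The same person always receives the same code, so records can still be linked within the data set. As long as the code key exists, the data can be re-identified and therefore remains personal data.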

Anonymised data refers to data from which identifiers have been removed so that the data can no longer be linked back to identifiable individuals.

 

The Finnish Social Science Data Archive (FSD) has good instructions on the national guidelines.

In the planning phase of the research, establish a consistent practice for naming folders and files. Also, agree on a uniform way to organize files into folders and subfolders. It's important that all participants in the project adhere to the agreed-upon practice.

Plan and agree in advance which file versions will be retained and/or published, and which will be deleted upon completion of the research.

Organisation and systematic naming help to:

  • Avoid confusion during the research process and data analysis phase.
  • Facilitate data sharing within the research team.
  • Ensure data preservation, regardless of changes in the research team composition.
  • Ensure data readability and comprehensibility even after the research process has ended.
  • Clarify what the data contains and the principles by which it was compiled.

Think in advance about who has access to folders. Use unique names for folders and files to prevent them from getting mixed up at any stage of the research.
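
A naming practice is easier to follow when it can be generated and checked automatically. The sketch below assumes one possible convention, `YYYYMMDD_project_description_vNN.ext`. This pattern and the function names `slug`, `build_filename` and `follows_convention` are illustrative assumptions, not a UTU requirement; each project should agree on its own convention.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace anything other than a-z and 0-9 with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, version: int,
                   extension: str, day: date) -> str:
    """Build a name of the form YYYYMMDD_project_description_vNN.ext (assumed convention)."""
    parts = [day.strftime("%Y%m%d"), slug(project), slug(description), f"v{version:02d}"]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


# Pattern that mirrors the assumed convention above.
NAME_PATTERN = re.compile(r"\d{8}_[a-z0-9-]+_[a-z0-9-]+_v\d{2,}\.[a-z0-9]+")


def follows_convention(filename: str) -> bool:
    """Check whether a file name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(filename) is not None
```

Note that `slug` also replaces non-ASCII letters such as ä and ö with hyphens. That keeps names portable between systems but may not suit every project.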

Create separate folders for:

  • Data files
  • Project management
  • Methods
  • Text files
  • Etc.

A good folder structure includes at least the following elements:

  • A unique main folder for each project.
  • A folder for code.
  • A folder for data.
  • At least one readme document covering administrative matters (there may be multiple readme files).
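
A folder structure like the one described above can be created the same way for every project with a short script. This is a minimal sketch: the subfolder names, the readme template and the function name `create_project_skeleton` are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed subfolder names, loosely based on the lists above.
SUBFOLDERS = ("code", "data", "methods", "project_management", "texts")

README_TEMPLATE = (
    "Project: {name}\n"
    "Responsible person:\n"
    "Folder contents and naming practice:\n"
)


def create_project_skeleton(root: Path, project_name: str) -> Path:
    """Create a main project folder with subfolders and a readme file."""
    project_dir = root / project_name
    for subfolder in SUBFOLDERS:
        (project_dir / subfolder).mkdir(parents=True, exist_ok=True)
    readme = project_dir / "README.txt"
    if not readme.exists():  # never overwrite an existing readme
        readme.write_text(README_TEMPLATE.format(name=project_name), encoding="utf-8")
    return project_dir
```

Running the function again on an existing project is harmless: existing folders are left as they are and an existing readme is not overwritten.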

Version control keeps your data in order. It can be managed manually or automatically.
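
Manual version control often means adding a running version number to file names. As a sketch, the next free version number can be derived from the existing file names. The `_vNN` suffix and the function name `next_version_name` are assumptions for illustration; automatic version control would typically rely on a service such as GitLab instead.

```python
import re
from typing import List


def next_version_name(existing_names: List[str], stem: str, extension: str) -> str:
    """Return the file name for the next version, e.g. survey_v03.csv after survey_v02.csv."""
    pattern = re.compile(re.escape(stem) + r"_v(\d+)" + re.escape(extension))
    versions = []
    for name in existing_names:
        match = pattern.fullmatch(name)
        if match:
            versions.append(int(match.group(1)))
    # Start from version 1 if no earlier version exists.
    next_version = max(versions, default=0) + 1
    return f"{stem}_v{next_version:02d}{extension}"
```

Earlier versions are never renamed or overwritten in this scheme, so it remains possible to decide at the end of the project which versions to retain and which to delete.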

The University of Jyväskylä has good examples on its website.

Storage of research data after the research

 

When considering the storage duration of the data, take into account the specific requirements of the research funder and the data protection regulation.

The post-study utility of the research data also determines the options related to its storage.

  • If the research data can be reused, it should be opened and shared in a high-quality data archive.
  • If the data contains personal information, special care must be taken in storage and compliance with data protection guidelines.
  • If you consider your data to be nationally significant, utilize CSC's Fairdata PAS service.

Turku University generally recommends retaining research data for five years (15 years in medicine).

It should be noted that Turku University does not have its own data archive, so alternative solutions must be sought for long-term storage of research data.

Disposal of research data

When disposing of research data, note that simply deleting files may not be sufficient. Deleting a file and emptying the computer's recycle bin does not mean that the file has been permanently destroyed. Deleted data can be recovered even if the hard drive has been reformatted. Methods for permanent destruction include overwriting the data with dedicated software and degaussing (demagnetising) the hard drive. Storage media can also be physically destroyed, rendering them unreadable.
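
The overwriting approach can be illustrated with the following sketch, which writes random bytes over a file's contents before deleting it. This is an assumption-laden example, not a certified disposal method, and the function name `overwrite_and_delete` is hypothetical. On SSDs, network drives, cloud storage and file systems that keep snapshots or journals, old copies of the data may survive an overwrite, and backups are not affected at all.

```python
import os
from pathlib import Path


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite a file's contents with random bytes, then delete the file.

    Illustration only: this does not guarantee destruction on every
    storage medium or file system.
    """
    size = path.stat().st_size
    with path.open("r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = os.urandom(min(remaining, 1024 * 1024))
                handle.write(block)
                remaining -= len(block)
            handle.flush()
            os.fsync(handle.fileno())  # ask the operating system to write to disk
    path.unlink()
```

For sensitive or confidential data, purpose-built disposal tools or physical destruction of the media are more dependable options.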

More information: