Research data often consists of several types of data, each with its own file formats. New file formats emerge continually while others become obsolete. For the long-term usability of research data, it is important to choose file formats that will keep the data accessible in the future.
It is recommended that at least one copy of each file be saved in an open file format that can be read by different software without paid licenses. Choosing suitable file formats during the research process means the data will not need to be converted later, since conversion always carries a risk of data loss or distortion.
Good file formats include:
Text files:
Audio files:
Video files:
Image files:
Data security, which involves protecting information, systems, and data communication, is an important part of research data management. Electronic data can be easily copied and distributed, making it crucial to prevent unauthorized access. Backup is a part of data security.
Backups ensure that an up-to-date, uncorrupted copy of your research data is available if the original is lost or damaged. UTU recommends automatic backup, which reduces the risk of data loss. UTU's own storage services include automatic backup.
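A backup only helps if the copy is complete and intact. As a minimal sketch, assuming the backup mirrors the original folder structure, the Python below compares SHA-256 checksums of each original file with its backup copy. The function names `sha256_of` and `verify_backup` are hypothetical and not part of any UTU service.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir: Path, backup_dir: Path) -> list:
    """Compare every file under original_dir with its copy under backup_dir.

    Returns a list of problems ("missing: ..." or "differs: ...");
    an empty list means every file has an identical backup copy.
    """
    problems = []
    for original in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(original) != sha256_of(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

For example, `verify_backup(Path("project/data"), Path("backup/project/data"))` (hypothetical paths) returns an empty list when the backup matches. This kind of check is most useful for copies you make yourself, for example to an external drive.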
For the best data security during the storage of research data, use UTU's Digital Services storage platforms. More information on secure storage, sharing, and usage of research data can be found on the Intranet.
If you use a storage platform other than UTU’s, at least find out the following:
Additionally, make sure to understand the data security of the storage platform, especially if your research data is sensitive or confidential. Ensure the platform's capacity meets your data needs and that its performance is sufficient.
Consider these points from the data management handbook:
Network Security: Assign personal read and write permissions to research personnel (e.g., usernames and passwords). This is especially important if the data is accessible via a network. Encrypt data transmitted over networks as necessary. Confidential information should not be stored on servers that provide internet services (e.g., web and email servers). Sensitive data should only be stored on computers not connected to networks. Also, ensure that the system does not store temporary or other files outside the access-restricted area during data processing.
Physical Security of Data Storage: Plan the storage and backup of research data to protect it from fire, burglary, water damage, or sabotage. Buildings should have access control, and doors should be locked when staff are not present. Access to rooms where research data is stored should be restricted. Prepare for potential computer and peripheral equipment failures. Store backups in a secure cabinet, and keep at least one copy of the data physically separate from other copies. Ensure the security of this copy as well.
Software Updates: Install critical operating system and software updates as soon as possible. Use a centralized automatic update service and remember that software updates can sometimes cause compatibility issues.
Virus Protection: All computers involved in the research project must have regularly and automatically updated antivirus software installed.
For more information on storage platforms and UTU's solutions, contact: data@utu.fi
Data Security in Research Involving Personal Data
Personal data can be pseudonymised or anonymised. As long as a person can be identified from the data, either directly or by re-identifying it, the data remains personal data and is subject to the applicable regulations. More on the subject here.
During the research, the need for sharing affects how the research data should be stored, in addition to the nature of the data itself. If the research data contains sensitive or confidential information, particular care must be taken in storing it and especially in sharing it.
The University of Turku's Digital Services offer storage options for various types of research data. For personal use:
For the best sharing features and opportunities for collaboration:
Questions related to data management and sharing should be clarified for the entire research team.
Clear practices should be established regarding:
At the start of the research project, it's important to consider legal and ethical questions related to the research. Agree within the research team on who owns the data. If needed, seek assistance from legal support services for research (legal@utu.fi).
Also, consider any requirements from funders and publishers.
Secure sharing of data is possible through UTU's Seafile and Taltio storage platforms.
When it comes to data containing personal information, it is always necessary to ensure that the chosen service is suitable for handling and storing such data.
Other possibilities for sharing data include:
CSC (IT Center for Science) services: CSC's Funet FileSender is a secure way to send and receive large amounts of data and to share large files with anyone. Log in to upload files to the service or to request that someone else send you a file. See: CSC Service Catalog for Research; Funet FileSender.
Microsoft Teams + OneDrive: Teams is a collaboration and communication platform that helps people work together, whether they are working from home or in the office. Microsoft OneDrive is a cloud storage service for storing files and photos online; you can access your files from anywhere and share them with others. See: Microsoft Office 365 - Teams; Microsoft OneDrive.
Google Drive: Google Drive is a cloud storage service developed by Google. It allows users to store files in the cloud (on Google's servers), sync them across different devices, and share them. See: Google Drive.
These options provide various ways to securely share data depending on your specific needs and preferences.
Open Notebook Science refers to making a research project public from the very beginning. An open research notebook details, among other things, the stages of data collection and measurement results. The goal is to ensure the transparency of the research.
The research notebook can be shared, for example, on a regular website or on social media platforms. Documents can be shared, for instance, using the University of Turku's Seafile cloud storage service. See the Seafile instructions on the IT Services Intranet site.
Identifiable data can be used for scientific research when it is appropriate, planned, substantiated, and there is a legal basis for processing the data (such as consent from the participant or research conducted in the public interest). In research involving personal data, the quality of data management is crucial. Pseudonymise or anonymise the data when necessary.
Pseudonymised data refers to data where direct identifiers are replaced with codes or pseudonyms. The code keys should always be kept separate from the analysis data. Depending on the quality of the data, pseudonymisation may suffice as a protective measure during the research. Access rights and handling of the code keys should be clearly agreed upon within the research team.
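The pseudonymisation step described above can be sketched in Python, assuming the data is a list of records (dictionaries) with a single direct identifier field. The function `pseudonymise` and the field names are hypothetical. It returns the analysis data and the code key as separate objects so that they can be stored in separate places.

```python
import secrets


def pseudonymise(records, id_field="name", prefix="P"):
    """Replace the direct identifier in each record with a random code.

    Returns (pseudonymised_records, code_key). The code key maps each code
    back to the original identifier and must be stored separately from the
    analysis data, with access limited to agreed team members.
    """
    code_key = {}   # code -> original identifier
    assigned = {}   # original identifier -> code (same person, same code)
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in assigned:
            code = f"{prefix}{secrets.token_hex(4)}"
            while code in code_key:  # avoid an (unlikely) duplicate code
                code = f"{prefix}{secrets.token_hex(4)}"
            assigned[identifier] = code
            code_key[code] = identifier
        # Copy every field except the direct identifier, then add the code.
        new_record = {k: v for k, v in record.items() if k != id_field}
        new_record["code"] = assigned[identifier]
        pseudonymised.append(new_record)
    return pseudonymised, code_key
```

Random codes from the `secrets` module are used instead of, for example, hashing the names, because a hash of a name can be reproduced by anyone who guesses the name. Sequential codes could also reveal the order in which participants were recruited.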
For pseudonymised data, it is recommended to delete direct identifiers such as names, social security numbers, images, etc., as soon as it is methodologically possible.
Anonymised data refers to data from which identifiers have been removed so that the data can no longer be linked to identifiable individuals, either directly or indirectly.
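The sketch below illustrates two common anonymisation steps on the same kind of record data: deleting direct identifiers and coarsening an indirect identifier (exact age into age bands). The function `anonymise` and the field names are hypothetical. Removing and coarsening fields does not by itself make data anonymous; combinations of the remaining attributes can still identify people, so the re-identification risk of the whole data set must be assessed, and any code key must be destroyed.

```python
def anonymise(records, direct_identifiers=("name", "email", "code"), band_width=10):
    """Illustrative anonymisation step with hypothetical field names.

    - drops the fields listed in direct_identifiers
    - replaces an exact age with an age band such as "30-39"
    This alone does not guarantee anonymity; the re-identification risk
    of the remaining data still has to be assessed.
    """
    result = []
    for record in records:
        new_record = {k: v for k, v in record.items() if k not in direct_identifiers}
        if "age" in new_record:
            low = (new_record["age"] // band_width) * band_width
            new_record["age"] = f"{low}-{low + band_width - 1}"
        result.append(new_record)
    return result
```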
The Finnish Social Science Data Archive (FSD) provides useful instructions on the national guidelines.
In the planning phase of the research, establish a consistent practice for naming folders and files. Also, agree on a uniform way to organize files into folders and subfolders. It's important that all participants in the project adhere to the agreed-upon practice.
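A naming convention is easiest to follow when it is applied automatically. The Python sketch below assumes a hypothetical convention, `YYYY-MM-DD_project_description_vNN.ext`; the function names and the pattern are illustrations, not a UTU standard. It converts letters such as ä and ö to plain ASCII and drops other special characters so that names work across operating systems.

```python
import re
import unicodedata
from datetime import date


def clean_part(text):
    """Convert to lower-case ASCII, use underscores for spaces, drop other symbols."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    ascii_text = ascii_text.strip().lower().replace(" ", "_")
    return re.sub(r"[^a-z0-9_-]", "", ascii_text)


def make_filename(project, description, version, extension, on=None):
    """Build a name following the hypothetical convention
    YYYY-MM-DD_project_description_vNN.ext (date defaults to today)."""
    on = on or date.today()
    ext = extension.lstrip(".")
    return f"{on.isoformat()}_{clean_part(project)}_{clean_part(description)}_v{version:02d}.{ext}"
```

For example, `make_filename("Survey 2024", "Raw responses", 3, "csv", on=date(2024, 5, 7))` gives `2024-05-07_survey_2024_raw_responses_v03.csv`. Putting an ISO date at the start of the name makes files sort chronologically.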
Plan and agree in advance which file versions will be retained and/or published, and which will be deleted upon completion of the research.
Organisation and systematic naming help to:
Think in advance about who has access to folders. Use unique names for folders and files to prevent them from getting mixed up at any stage of the research.
Create separate folders for:
A good folder structure includes at least the following elements:
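Creating the agreed folder structure with a script ensures that every project starts from the same layout. The folder names below are hypothetical examples (administration, raw data, processed data, analysis, documentation), and `create_project_folders` is an illustrative name; replace them with the elements your team agrees on.

```python
from pathlib import Path

# Hypothetical folder layout; adapt the names to what your team agrees on.
# Numeric prefixes keep the folders in a fixed order in file browsers.
FOLDERS = [
    "01_admin",           # plans, agreements, permits
    "02_raw_data",        # original data, never edited
    "03_processed_data",  # cleaned and converted data
    "04_analysis",        # scripts and analysis outputs
    "05_documentation",   # codebooks, metadata
]


def create_project_folders(root):
    """Create the folder structure under root; safe to run more than once."""
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text("Describe the folder structure and naming convention here.\n", encoding="utf-8")
    return sorted(p.name for p in root.iterdir())
```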
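Version control keeps your data in order. It can be managed manually (for example, by adding version numbers to file names) or automatically (for example, with version control software).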
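Manual version control usually means adding a version number to the file name and never overwriting older versions. A minimal sketch, assuming a hypothetical `stem_vNN.extension` pattern such as `questionnaire_v03.docx`: the function `next_version_path` (an illustrative name) finds the highest existing version number in a folder and returns the path for the next one.

```python
import re
from pathlib import Path


def next_version_path(folder, stem, extension):
    """Return the path for the next manual version of a file.

    Assumes the hypothetical pattern stem_vNN.extension. The highest
    existing NN is incremented, so earlier versions are never overwritten.
    """
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.{re.escape(extension)}$")
    versions = [
        int(m.group(1))
        for p in Path(folder).iterdir()
        if (m := pattern.match(p.name))
    ]
    next_number = max(versions, default=0) + 1
    return Path(folder) / f"{stem}_v{next_number:02d}.{extension}"
```

Files that do not follow the pattern, such as `questionnaire_final.docx`, are ignored. Automatic version control tools such as Git track versions without renaming files.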
The University of Jyväskylä provides good examples on its web page.
When considering the storage duration of the data, take into account the specific requirements of the research funder and the data protection regulation.
How useful the research data will be after the study also affects the options for storing it.
The University of Turku generally recommends retaining research data for five years (15 years in medicine).
Note that the University of Turku does not have its own data archive, so other solutions must be found for the long-term storage of research data.
When disposing of research data, note that simply deleting files may not be sufficient. Deleting a file and emptying the computer's recycle bin does not permanently destroy it, and deleted data can be recovered even after the hard drive has been reformatted. Methods for permanent destruction include overwriting the data with dedicated software, demagnetising (degaussing) the hard drive, and mechanically destroying the storage media so that it can no longer be read.
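Overwriting, one of the methods above, can be illustrated with a short Python sketch that overwrites a single file with random bytes before deleting it; the function name `overwrite_and_delete` is hypothetical. This only illustrates the idea: on SSDs, USB sticks, journaling or copy-on-write file systems, and cloud-synchronised folders, old copies of the data may remain elsewhere on the device or service. For sensitive data, use dedicated erasure software or physical destruction of the media, and remember that backup copies must be destroyed too.

```python
import os
from pathlib import Path


def overwrite_and_delete(path, passes=1):
    """Overwrite a file's contents with random bytes, then delete it.

    Caveat: NOT a guaranteed secure erase on SSDs, flash memory,
    journaling or copy-on-write file systems, or synced cloud folders,
    where earlier copies of the data blocks may survive.
    """
    path = Path(path)
    size = path.stat().st_size
    with path.open("r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)  # write at most 1 MiB at a time
                f.write(os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the overwrite to the device
    path.unlink()
```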
More information: