UTUGuides: Research Data Management Guide for Students: During Research

Processing Research Data during Research

Storing and processing your research data in a controlled manner during your research is important. With adequate content description, returning to your research even after a long break is easier, and you will not have to spend a lot of time finding the materials and building an overall picture of the research.

Choose a secure storage location to prevent the accidental destruction of your research data.
Choose a file format that allows the data to be used in different programs, even after a long period of time.
Naming the files logically and building logical folder structures will help you find the correct data.
Version control reduces the risk of accidentally losing or overwriting data.
Content description (metadata) helps you understand what type of data has been used in the research.
If you process personal data in your research, you are bound by the Finnish Data Protection Act.

Secure Storing of Research Data

When you start working on your data, remember to keep the original data file separate from the version you are analysing.
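One way to keep the original separate is to copy it into a working folder and write-protect the original. The following is a minimal Python sketch, not an official university tool; the function name make_working_copy and the folder layout are assumptions made for illustration.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(original: Path, working_dir: Path) -> Path:
    """Copy the original data file into working_dir and write-protect the original."""
    working_dir.mkdir(parents=True, exist_ok=True)
    working_copy = working_dir / original.name
    if working_copy.exists():
        # Refuse to overwrite an existing working copy by accident.
        raise FileExistsError(f"{working_copy} already exists")
    # copy2 copies the content and also preserves timestamps.
    shutil.copy2(original, working_copy)
    # Remove write permission from the original so it is harder to edit by mistake.
    original.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return working_copy
```

All analysis is then done on the returned working copy. Write protection only guards against accidental edits; it does not replace back-ups.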

The university can offer different online storage systems for students, such as Seafile and network folders.

If you have collected your research data with paper forms, you should scan the papers to create a digital version, so you can store all your research data in the same place.

Characteristics of good data storage:

Automatic back-up – The risk of losing data increases without an automatic back-up.
Independence of physical location, i.e. the data can be accessed remotely.
Access control to limit access to the files with, for example, a password.

More information about the storage services offered by the university is available on the IT Services intranet page Where can I save my data, or by emailing data@utu.fi.

NOTE: Never entrust your work to a single storage device, because no storage solution is 100% reliable! A computer’s hard disk and USB flash drives are not long-lasting or safe places to store data. Data stored on devices like these is prone to loss, corruption, or accidentally ending up in the wrong hands. This is why we recommend storing data in a service offered by the University of Turku, where the university’s IT Services take care of data security and back-ups. For data protection reasons, research data that includes personal data should never be stored anywhere other than the university’s online storage.

Content Description, or Metadata, of Research Data

The most common reason why the collected data cannot be used during or after research is that the important information related to the research has already been forgotten. To avoid this, you should keep a research diary and make notes of the changes you have made in your research data as you go along.

Accurate documentation ensures the data can be used in the future. At its simplest, you can write a description of your data in a plain text file (created, for example, in Notepad), known as a README file, and save it alongside the rest of your research data. The file can include information about:

The collector or creator of the data and affiliated organisation
Where the data is stored
How the data has been prepared for analysis
How the data has been edited
What methods have been used to analyse the data
What equipment and programs have been used at different stages of the research
What publications have been created based on the data
File formats and standards
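
The list above can be turned into a reusable README template. The following minimal Python sketch writes a plain text file with one empty heading per item; the file name README.txt, the function name write_readme_template, and the exact heading wording are assumptions, not a required standard.

```python
from pathlib import Path

# Headings taken from the list above; the wording is lightly adapted.
README_FIELDS = [
    "Collector or creator of the data and affiliated organisation",
    "Where the data is stored",
    "How the data has been prepared for analysis",
    "How the data has been edited",
    "Methods used to analyse the data",
    "Equipment and programs used at different stages of the research",
    "Publications based on the data",
    "File formats and standards",
]


def write_readme_template(folder: Path, title: str) -> Path:
    """Create README.txt in folder with one empty section per documentation field."""
    readme = folder / "README.txt"
    if readme.exists():
        # Never overwrite documentation that has already been written.
        raise FileExistsError(f"{readme} already exists")
    lines = [title, "=" * len(title), ""]
    for field in README_FIELDS:
        lines += [f"{field}:", "", ""]
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme
```

Fill in the sections as the research proceeds rather than at the end, while the details are still fresh.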

Choosing a File Format

If you are working on your thesis on more than one device or program, you should make sure the file has been saved in a format that can be opened with different programs. For example, if you have written your thesis using Apple Pages and saved the document in one of the program-specific file formats, you may not be able to open the document on the university’s Windows computers. In this case, you can write the raw text in, for example, .txt or .rtf format and save it in .docx or .pages format only when text formatting is necessary.

Choosing a file format will affect the usability of the data in the long run. To keep your research data usable for a long time, you should save at least one copy of the file in an open format that is widely used and supported by many programs, or that is entirely independent of programs. Using an open format increases the probability that the file will remain readable and usable long into the future.
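
As an illustration for tabular data, the following minimal Python sketch saves a copy of a data matrix as a UTF-8 encoded CSV file, a plain-text format that most statistics and spreadsheet programs can read. The function name save_as_csv is hypothetical, and the rows are assumed to be dictionaries that share the same keys.

```python
import csv
from pathlib import Path


def save_as_csv(rows: list[dict], target: Path) -> None:
    """Write rows (dictionaries with identical keys) to a UTF-8 encoded CSV file."""
    if not rows:
        raise ValueError("No rows to save")
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings itself.
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

A CSV copy does not keep program-specific features such as formulas or value labels, so record those in your content description.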

In the Data Management Guidelines of the Finnish Social Science Data Archive you can find a more detailed description of the most common file formats used for saving text, image, audio, video, or data matrix files.

File Names, File Structures, and Version Control

Aim to create unambiguous, logical, and descriptive names for your files. The more files you have connected to your thesis, the more important it is to name them properly.

Do not give the same name to two different files. Keep the abbreviations in the file names understandable. Elements in the file names can be separated by underscores ( _ ), while words within an element can be separated by hyphens ( - ) or capital letters. Using any other special characters ( & , * % # ; ( ) ! @ $ ^ ~ ‘ { } [ ] ? < > ) is not recommended. Dates should be written in the standardised YYYY-MM-DD format. For example: Example-data_2020-09-11.
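
The naming rules above can also be checked automatically. The following is a minimal Python sketch under the assumption that each element may contain only letters, digits, and hyphens; the function name build_file_name is hypothetical.

```python
import re
from datetime import date

# Assumed rule: an element may contain only letters, digits, and hyphens.
_ELEMENT = re.compile(r"[A-Za-z0-9-]+")


def build_file_name(elements: list[str], day: date, extension: str) -> str:
    """Join elements and an ISO date with underscores, rejecting other special characters."""
    for element in elements:
        if not _ELEMENT.fullmatch(element):
            raise ValueError(f"Unsuitable file name element: {element!r}")
    # date.isoformat() gives the YYYY-MM-DD form.
    stem = "_".join(elements + [day.isoformat()])
    return f"{stem}.{extension.lstrip('.')}"


print(build_file_name(["Example-data"], date(2020, 9, 11), "csv"))
# prints Example-data_2020-09-11.csv
```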

A good way to make file names descriptive is to include metadata in them, such as the research subject’s background information: date (of data collection), gender, and age.

File naming practices and explanations for abbreviations should be written down somewhere to make it easier to remember their meanings later.

Create a folder structure that fits your research project. Consider what type of data you collect or use. Does your research project have subprojects that have to be stored in separate folder structures? It is good to consider how you will organise the different data file formats used in your research, including original data, edited data used in the analysis, data generated by the analysis, descriptive data / metadata, etc.

Also, consider how specific or general the hierarchy of your folder structure will be. On the one hand, an overly specific and thorough folder structure may force you to open too many subfolders before you find the file you are looking for. On the other hand, in an overly general structure, finding a file in a folder filled with countless other files can be like looking for a needle in a haystack.
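
A folder skeleton can be created once at the start of the project. The following minimal Python sketch creates one subfolder per type of data mentioned above; the folder names and the function name create_project_folders are assumptions to be adapted to your own project.

```python
from pathlib import Path

# Hypothetical folder names based on the data types mentioned above.
SUBFOLDERS = [
    "01_original-data",
    "02_edited-data",
    "03_analysis-output",
    "04_metadata",
]


def create_project_folders(root: Path, subfolders: list[str] = SUBFOLDERS) -> list[Path]:
    """Create the subfolders under root (if missing) and return their paths."""
    created = []
    for name in subfolders:
        folder = root / name
        # exist_ok=True makes it safe to run the function more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

The numeric prefixes only keep the folders in a fixed order in file browsers; a flat structure like this one suits a small thesis project, while subprojects may need their own copies of it.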

Examples for different folder structures: CESSDA ERIC Data Management Expert Guide.

Version control is a crucial part of data management. When processing data, different versions of the data are generated, but sometimes you may need to go back to earlier versions. Version control can be automatic (recommended) or manual.

NOTE! Keep the original data separate from the data you process. Also, make sure your anti-virus program is active and your software is up to date.

In automatic version control, the system takes care of creating and organising the different versions of the files.

► There are tools available for more advanced version control, such as GitLab (also see the university's instructions on using GitLab), or GitHub.

In manual version control, the user creates and manages the different file versions themselves, which makes consistent file naming especially important.

► Suitable for a small amount of data managed by the person who produced it.
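
Manual version control can be made less error-prone with a small helper. The following minimal Python sketch copies the current file into a versions folder under the next free version number; the _vNN naming pattern and the function name save_new_version are assumptions, not a university convention.

```python
import re
import shutil
from pathlib import Path


def save_new_version(current: Path, versions_dir: Path) -> Path:
    """Copy current into versions_dir as <stem>_vNN<suffix>, one higher than the latest."""
    versions_dir.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(
        rf"{re.escape(current.stem)}_v(\d+){re.escape(current.suffix)}"
    )
    # Collect the version numbers that already exist for this file.
    numbers = []
    for path in versions_dir.iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    target = versions_dir / f"{current.stem}_v{next_number:02d}{current.suffix}"
    shutil.copy2(current, target)
    return target
```

Calling the function before each round of edits leaves a numbered snapshot to return to. It does not record what changed, so note the changes in your research diary or README file.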

Are You Processing Personal Data?

Storing data that includes personal data of living individuals requires special care. This kind of data should only be stored on a secure storage service like the ones provided by the university. It should not be stored in any of the commercial cloud services (iCloud, Google Drive, Dropbox, etc.), nor should it be stored on a USB flash drive or on an unencrypted hard drive of a personal computer.

Please note that the General Data Protection Regulation (GDPR) of the European Union and the Data Protection Act state that unjustified processing of data that includes personal data is not allowed. The processing of personal data must always have a legal basis. This basis could be, for example, the public interest, such as scientific research. The processing of personal data should always follow the principle of data minimisation, which means limiting the data collection and processing to only what is relevant and necessary to fulfil the purpose of the research. You can follow this principle by pseudonymising personal data as soon as the data in question is no longer needed to conduct research or to confirm research findings.
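
As an illustration of pseudonymisation, the following minimal Python sketch replaces an identifying field with a random code and returns a separate key table that links the codes back to the individuals. The function name pseudonymise and the code format P-xxxxxxxx are assumptions. The key table must be stored separately and securely, and pseudonymised data still counts as personal data as long as the key exists.

```python
import secrets


def pseudonymise(rows: list[dict], id_field: str) -> tuple[list[dict], dict[str, str]]:
    """Replace id_field in every row with a random code; return new rows and the key table."""
    key_table: dict[str, str] = {}
    used_codes: set[str] = set()
    result = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in key_table:
            # Draw random codes until an unused one is found.
            code = "P-" + secrets.token_hex(4)
            while code in used_codes:
                code = "P-" + secrets.token_hex(4)
            used_codes.add(code)
            key_table[identifier] = code
        new_row = dict(row)  # leave the input row unchanged
        new_row[id_field] = key_table[identifier]
        result.append(new_row)
    return result, key_table
```

Only direct identifiers are replaced here. Other fields, such as a rare occupation combined with age, may still identify a person and need separate consideration.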

When you process personal data, pay attention to how you name your files and folders. File and folder names, as well as the metadata of the files, can reveal the identity of your research subjects to outsiders. Do not use the names or social security numbers of individuals when naming files or folders. A simple folder structure also makes access management easier. You can separate the files that include personal data from the rest of the data, which makes it easier to restrict access to them.

More information: University's Intranet page about data protection.
More information on cloud services: Cloud Guide.

This page references content from Helsinki University Library's Basic data management guidelines: Introduction to data management (CC BY licence).