Research data, whether in the form of scientific notes or measurement results, is a valuable raw material. And if it contains individuals’ personal data – when working in the social sciences, for example – then it is also subject to legal data protection requirements. Empirical data takes significant amounts of time and money to produce, so it also requires additional thoughtful measures to ensure data security and long-term retention. Such data sets are unique. If lost, they take truly extensive work to replace – and the new data will never be identical to the original.
There is a specific set of compliance rules for research and education, and universities, institutions and funding bodies such as the DFG (Deutsche Forschungsgemeinschaft) are placing increasing emphasis on them. Even aside from these rules, data volumes are rising, and so are the options for data analysis. Meanwhile, the requirements for long-term, reliable data protection and archiving are becoming stricter. A sophisticated, resilient plan for research data management should now be part of any new project. What role will long-term data archiving play in it? What are the implications of specific compliance requirements? And above all, how can reliable data management in research and education be implemented, and how can even large volumes of research data be securely archived for the long term?
Compliance and long-term preservation
The term “compliance” describes the need to meet various rules and regulations, some based on laws and others on sources such as industry- or company-specific requirements. For researchers, this means adhering to legal requirements in areas like data protection and copyright law. When it comes to empirical research, ethical considerations may also play a role. Researchers are also bound to the rules of good scientific practice. Among many other aspects, these describe how to carefully archive research data and ensure its accessibility. This is designed to guarantee that the data can be reused in the future.
For example, the DFG guidelines for working with research data specify that even the initial draft of a research project should include data security and archiving considerations. Researchers should ensure that research data arising from a project is also available to other researchers for further use. The goal is to ensure the long-term security of the data, which, as a rule, is archived for at least ten years.
Secure, careful long-term preservation of data is therefore a key aspect of research data management. A sophisticated archiving strategy will ensure that valuable data remains available and usable for a long time to come.
In order to account for the required storage hardware and software resources during the planning process, it is essential to consider the amount and types of data that will be generated. Another key question is to what extent the rights of persons involved in the research come into play. In short, you have to know what you are actually storing before you can plan an archiving strategy that stores data reliably, protects it, and keeps it available for the long term. Only then can you determine what storage hardware and software you need.
Reliable, powerful storage software for compliance in research
Dedicated storage software needs to quickly and reliably save research data to the selected storage system. It also needs to guarantee fast, seamless access to archived data – even years after that data is first saved. In order to keep the focus on capabilities and costs, the software should support a wide range of storage hardware, independent of any single vendor or technology.
PoINT Storage Manager and PoINT Archival Gateway are two storage software solutions that meet the requirements for long-term preservation of research data. They are capable of handling vast amounts of data and can help to meet specific compliance requirements. Whether PoINT Storage Manager or PoINT Archival Gateway will suit you better depends on the type and volume of data being archived, as well as your archiving requirements.
The PoINT Storage Manager transfers data to a suitable storage tier depending on its age and other predefined rules, saving it to the ideal medium within your storage infrastructure. This type of “information lifecycle management” takes the file’s life cycle into account and therefore aligns with the principle of the research data life cycle. As well as supporting efficient use of your storage infrastructure, it also securely archives critical data.
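To make the idea of rule-based tiering more concrete, the following sketch shows what an age-based policy could look like in principle. It is a simplified illustration in Python, not PoINT’s actual configuration or API; the tier names and the 90-day and 365-day thresholds are assumptions chosen purely for the example.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Illustrative tiering thresholds (assumptions, not product defaults):
# files untouched for 90 days leave the performance tier,
# files untouched for a year move to the archive tier.
CAPACITY_TIER_AFTER = timedelta(days=90)
ARCHIVE_TIER_AFTER = timedelta(days=365)

def choose_tier(path: Path, now: datetime | None = None) -> str:
    """Pick a target tier for a file based on its last modification time."""
    now = now or datetime.now(timezone.utc)
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    age = now - mtime
    if age >= ARCHIVE_TIER_AFTER:
        return "archive"      # e.g. tape or object storage
    if age >= CAPACITY_TIER_AFTER:
        return "capacity"     # e.g. lower-cost disk
    return "performance"      # fast primary storage

if __name__ == "__main__":
    # "research_data" is a placeholder directory for this example.
    for file in Path("research_data").rglob("*"):
        if file.is_file():
            print(f"{file}: {choose_tier(file)}")
```

In a real deployment, rules like these are defined in the storage software itself rather than in a script; the sketch only illustrates how file age can drive the choice of storage tier.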
The PoINT Archival Gateway offers a high-performance, scalable object storage solution with an S3 interface. This makes it possible to save data volumes in the petabyte range, with high transfer rates to tape media. It helps users to securely and compliantly archive valuable research data for long periods of time.
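Because the gateway exposes a standard S3 interface, archive workflows can use any S3-compatible client. The sketch below uses Python and boto3 against a hypothetical endpoint; the endpoint URL, bucket name, object keys and credentials are placeholders and would come from your own deployment.

```python
import boto3

# Placeholder connection details for an S3-compatible archive endpoint
# (hypothetical values; substitute your own deployment's settings).
s3 = boto3.client(
    "s3",
    endpoint_url="https://archive.example.org",   # S3-compatible gateway
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

BUCKET = "research-archive"  # assumed bucket name

# Upload a measurement file to the archive bucket.
s3.upload_file(
    Filename="measurements/run_042.csv",
    Bucket=BUCKET,
    Key="project-x/raw/run_042.csv",
)

# Confirm the archived object is retrievable; the same call works
# years later, as long as the object remains in the archive.
head = s3.head_object(Bucket=BUCKET, Key="project-x/raw/run_042.csv")
print("Archived object size:", head["ContentLength"], "bytes")
```

Using the S3 protocol in this way keeps archiving scripts independent of the storage hardware behind the gateway, which matters when data has to remain accessible across hardware generations.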
Both solutions also provide options for quick, seamless access to archived data, allowing users to easily retrieve files even years after they were first created. This makes it possible to reuse the data in the future.
Long-term archiving and reusability
In order to meet long-term archiving and reusability requirements, it is essential to consider and plan for outages and data migration. Data should be stored in more than one location, and individual storage units typically need to be replaced every five to seven years. First, this means that you need to incorporate a complete backup strategy into your planning. Second, data needs to be stored using standardized file formats that are not tied to a specific hardware or software solution.
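One simple way to reason about storing data in more than one location is fixity checking: record a checksum when data is archived and verify each copy against it after every migration or at regular intervals. The sketch below illustrates this generic technique in Python; the file paths are placeholders, and the approach is not specific to any particular product.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_replicas(primary: Path, replicas: list[Path]) -> bool:
    """Check that every replica matches the checksum of the primary copy."""
    reference = sha256_of(primary)
    ok = True
    for copy in replicas:
        if sha256_of(copy) != reference:
            print(f"MISMATCH: {copy} differs from {primary}")
            ok = False
    return ok

if __name__ == "__main__":
    # Placeholder paths for two storage locations (assumptions for the example).
    verify_replicas(
        Path("/archive/site-a/run_042.csv"),
        [Path("/archive/site-b/run_042.csv")],
    )
```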
The software from PoINT helps researchers reliably and compliantly archive research data for the long term. It operates independently of any hardware vendor and uses standard file formats. This allows data migration to run as smoothly as possible, not just protecting data for the required length of time, but also making it available for future use.