Research Data Management at SUNY Geneseo

Data management includes the processes of collecting, organizing, describing, sharing, and preserving data. This guide will help you plan for and prepare a data management plan and learn about the data lifecycle and research data management.

Data Storage Overview

Storage is a key component of the Data Lifecycle and touches every stage. Researchers need to plan how data sets will be stored during active working phases (including backups) and for long-term retention, taking into account retention policies set by the funder or institution, data security, and related costs.

Sharing and publishing the data brings another set of criteria to consider. Sharing data makes it possible for researchers to validate research results and to reuse data for teaching and further research, and it can increase the impact of that research (Piwowar 2007). Sharing is also required by an increasing number of funders and publishers, who seek to maximize the impact of research, ensure that results are reproducible, and ensure that sufficient information is included in the scholarly record.

This page gives an overview of:

  • Tips for storage & preservation
  • Storage & repository options available to the Geneseo community
  • Guidance for selecting the most appropriate storage & repository for your needs

Piwowar, Heather A., Roger S. Day, and Douglas D. Fridsma. 2007. Sharing detailed research data is associated with increased citation rate. PLoS ONE 2(3):e308. https://dx.doi.org/doi:10.1371/journal.pone.0000308.

Content adapted from RDMS Cornell University which is licensed under a CC BY License.

Preservation Considerations

"Old Files" by xkcd is licensed under a CC-BY-NC license.

Here are some questions to consider as you make plans to preserve your data.

  • What is your plan to preserve the integrity of your data over time? Is it required for your grant?
  • What departmental, institutional, disciplinary, or programmatic policies on data retention exist and how will they impact your preservation plans?
  • What is the length of time the data is required to be retained and why? (e.g. buying a piece of equipment with a grant may require retention of all output for 3 or 5 years. You may want to have a plan for deciding what output is retained after the initial time period and whether or not you will keep the same structures.)
  • Consider what data are needed to validate the research, what data directly support publications based on the research, and what data have the greatest potential for reuse.
  • What hardware or campus or commercial services need to be used to assure data preservation?
  • What are the associated costs for these activities or services?
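
One common way to address the data integrity question above is to record a checksum for every file and recheck it periodically (often called a fixity check). The following is a minimal sketch, not a prescribed Geneseo workflow: the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or no longer match."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A typical use would be to call build_manifest when the data set is finalized, save the result alongside the data, and run changed_files against each stored copy at regular intervals. An empty list suggests the copy is intact.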

Storage Options at SUNY Geneseo

Tips for storage

A recommended practice is to keep at least three copies of your data: 1) "here" - a local copy on your laptop or desktop, where the files were created or collected; 2) "near" - an external copy on a different media type than the original; and 3) "far" - an external copy in a geographically different location, such as a cloud storage service.
This is also called the Rule of Three: THREE copies, on at least TWO different media types, with ONE copy in an entirely different location.
(That is, not in the same building or, depending on your situation and needs, not in the same part of the country. This third copy is invaluable in case of environmental risks, such as damage due to fire or water.)
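
The Rule of Three can be checked mechanically against an inventory of your copies. The sketch below is illustrative only: the Copy record, its field names, and the sample media and location labels are assumptions, not part of any Geneseo tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str        # e.g. "laptop" (hypothetical label)
    media_type: str   # e.g. "ssd", "hdd", "cloud"
    location: str     # e.g. "office", "offsite"


def rule_of_three_gaps(copies: list[Copy]) -> list[str]:
    """Return the parts of the Rule of Three that the inventory misses."""
    gaps = []
    if len(copies) < 3:
        gaps.append("fewer than three copies")
    if len({c.media_type for c in copies}) < 2:
        gaps.append("fewer than two media types")
    if len({c.location for c in copies}) < 2:
        gaps.append("no copy in a different location")
    return gaps


inventory = [
    Copy("laptop", "ssd", "office"),
    Copy("external drive", "hdd", "office"),
    Copy("cloud storage", "cloud", "offsite"),
]
print(rule_of_three_gaps(inventory))  # prints []
```

An empty list means the inventory meets all three conditions. The check treats locations as simple labels, so deciding what counts as "entirely different" (another building, another region) remains a judgment for the researcher.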

Remember that not all media are appropriate for long-term storage. Mechanical hard disk drives (HDDs) have an average lifespan of just 4-6 years. Memory sticks are convenient but are easily lost or stolen. Solid-state drives (SSDs) can lose charge, and eventually data, if left without power for long periods.

Read and understand your cloud storage terms of service. In what situations can they close your account? For how long can you restore deleted content? How many versions of your data can you restore? Is it based on number of versions, or how long since you last accessed them?

When selecting your storage tool(s), consider how much data you have (size and number of files), who you need to share it with, your budget, how long you will need that type of storage, whether you have special replication, performance, or security needs, and whether you need to hold data that requires restricted access or is subject to HIPAA or export control regulations.

 

Cloud by xkcd is licensed under a CC-BY-NC License

Content adapted from RDMS Cornell University which is licensed under a Creative Commons Attribution 4.0 International License.

Repository Options with SUNY Geneseo and SUNY

KnightScholar Repository is the institutional repository for SUNY Geneseo. KnightScholar is a discipline-agnostic platform that can store and provide open access to data files. KnightScholar Repository is supported by the College Library. For more information, contact the KnightScholar Services Team at knightscholar@geneseo.edu.

 

Dryad is a discipline-agnostic platform and not-for-profit data community managed by the California Digital Library. SUNY is currently in a pilot agreement with Dryad. Using their ORCID iD, SUNY researchers can deposit their data into Dryad in perpetuity at no cost. Published data is automatically assigned a DOI.

Search for Data Repositories

  • Registry of Research Data Repositories (Re3data): Re3data is a global registry of research data repositories that covers research data repositories from different academic disciplines. It includes repositories that enable permanent storage of and access to data sets to researchers, funding bodies, publishers, and scholarly institutions.
  • List of NIH-Approved Repositories: List of subject-specific and generalist repositories the NIH approved as being suitable places to share data.  
  • OpenDOAR: OpenDOAR is the quality-assured, global Directory of Open Access Repositories. It lists repositories that provide free, open access to academic outputs and resources.
  • Open Access List of Data Repositories: This is a list of open data repositories, separated by subject. These subjects are: Archaeology, Astronomy, Biology, Chemistry, Computer Science, Energy, Engineering, Environmental Sciences, Geology, Geosciences and Geospatial Data, Linguistics, Marine Sciences, Medicine, Multidisciplinary Repositories, Physics, and Social Sciences. 
  • FAIRsharing: A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies. It has a searchable list of data repositories. 
  • DataONE: DataONE is a community-driven program providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data.