Library Website : Research Data Management at SUNY Geneseo: Documentation

Data Documentation Overview

Proper documentation of your data allows it to be understood, interpreted, found, and cited by any user. It also helps you, as a researcher, organize, track, and make sense of your data throughout the research process. Using a metadata standard helps ensure that your data is FAIR (Findable, Accessible, Interoperable, and Reusable), and creating a readme.txt file helps ensure that all of the data is documented and organized.

Metadata is a standardized way of describing a work, including information on when and how the data was created, who created the data, and specific characteristics of the data which will better enable a researcher to find the data in the future.

In terms of data management, metadata may describe a data set in the following ways: how the data were collected; when they were collected; what assumptions were made in the methodology; the geographic scope; if there are multiple files, how they relate to one another; the definitions of individual variables and, if applicable, what the possible answers were (e.g., to survey questions); the calibration of any equipment used in data collection; the version of software used for analysis; etc. Very often, a data set that has no metadata is incomprehensible.

(Credit: UNC)

Types of Metadata

Metadata is often divided into categories, based on the function that it plays within an information ecosystem. The most common types of metadata are:

Descriptive	Metadata that helps provide intellectual access to an object by representing what the item is and what the item is about. Descriptive metadata is the type that is most frequently displayed to users, and enables searching and browsing of resources.
Technical	For digital objects, metadata that describes the technical aspects of the files that make up the digital objects. It may also include information about what is needed to access or interact with the object.
Preservation	Metadata about the maintenance of the physical or digital object over time, as well as preservation actions that have been taken on the object. For digital objects, preservation metadata often overlaps with technical metadata.
Structural	Metadata that reflects how different parts of an object relate to each other. For example, structural metadata would describe how multiple image files make up one digitized book.
Administrative	A broad category of metadata used to administer and manage an object. Administrative metadata may or may not be displayed to end users. For digitized materials, this often includes information about when and how the item was digitized.
Use	Metadata that provides information about who is allowed to access an item and what restrictions are placed on their use of the item. Related is rights metadata, which describes any copyright or license information relevant to the item.

FAIR Principles

The FAIR Principles were published in 2016 in Scientific Data as "The FAIR Guiding Principles for scientific data management and stewardship." FAIR stands for Findable, Accessible, Interoperable, and Reusable.

To learn more about the FAIR Principles, visit the GO FAIR website and the guide for Preparing FAIR data for reuse and reproducibility (Research Data Management Service Group at Cornell University).

*The term (Meta)data in the figure below and the accompanying transcription refers to both metadata and data.

Chart by Cambridge Crystallographic Data Centre (CCDC)

Findable: (Meta)data are assigned a globally unique and persistent identifier. Data are described with rich metadata. Metadata clearly and explicitly include the identifier of the data they describe. (Meta)data are registered or indexed in a searchable resource.

Accessible: (Meta)data are retrievable by their identifier using a standardized protocol. The protocol is open, free, and universal. The protocol allows for authentication and authorization, as needed. Metadata are accessible, even when the data are no longer available.

Interoperable: (Meta)data use a formal, accessible, shared, and broadly applicable language. (Meta)data use vocabularies that follow FAIR principles. (Meta)data include qualified references to other (meta)data.

Reusable: (Meta)data are richly described with a plurality of accurate and relevant attributes. (Meta)data are released with a clear and accessible data usage license. (Meta)data are associated with a detailed provenance. (Meta)data meet domain-relevant community standards.

Standards and Schema

In order for it to be useful, metadata needs to be standardized.

Standardization covers language, spelling, and the format of dates and numerals. Without standardization, comparing data sets can be a challenge.
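As one illustration of standardizing dates, the sketch below converts dates written in several styles to a single ISO 8601 form (YYYY-MM-DD). It is a minimal example, not a prescribed method: the list of input formats and the sample dates are assumptions, and ambiguous values such as 03/04/2021 are read here as month/day/year.

```python
from datetime import datetime

# Hypothetical input formats; replace with the formats found in your own data.
# Month names are parsed using the default (English) locale.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y"]


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or raise ValueError if no format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {text!r}")


# Illustrative values only: the same day written four different ways.
raw_dates = ["03/14/2021", "14 March 2021", "March 14, 2021", "2021-03-14"]
normalized = [to_iso_date(d) for d in raw_dates]
print(normalized)
```

Raising an error on unrecognized input, rather than guessing, keeps inconsistent entries visible so they can be corrected by hand.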

A metadata schema outlines the overall structure of the metadata. It describes how the metadata is set up, and usually addresses standards for common components of metadata such as dates, names, and places. Discipline-specific schemas are used to address special elements specific to or needed by a given discipline.

Cornell University offers helpful, more detailed information about metadata: Metadata and Describing Data, Research Data Management Service Group.

Finding and Selecting a Metadata Standard

The repository where you plan to deposit your data may require a particular metadata standard.
Some metadata standards/schemas are generic, while others are domain-specific. Generic standards, such as Dublin Core, tend to be easy to use and widely adopted, but often need to be expanded to cover more specific information. Domain-specific schemas have a much richer vocabulary and structure, but tend to be highly specialized and only understandable by researchers in that area.
Consider the user. Select a schema that will make sense for the users who are most likely to access and use your data, as well as those managing and preserving your data.
If you find a metadata standard to suit your needs, use it. If you find one that is close to your needs, but not quite, you can modify it to suit your needs as long as it includes the minimum elements of title, description, format, identifier, rights holder, rights, and contact information.
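The minimum elements listed above can be checked mechanically before deposit. The following is a rough sketch, assuming the metadata is held as a simple Python dictionary; the key names and the sample record are hypothetical and do not come from any particular standard.

```python
# Minimum elements named in the guidance above, as hypothetical dictionary keys.
REQUIRED_ELEMENTS = [
    "title",
    "description",
    "format",
    "identifier",
    "rights_holder",
    "rights",
    "contact",
]


def missing_elements(record):
    """Return the required elements that are absent or blank in the record."""
    missing = []
    for name in REQUIRED_ELEMENTS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative record only; "rights" is blank and "contact" is absent.
record = {
    "title": "Example stream temperature data set",
    "description": "Hourly water temperature readings from one site.",
    "format": "text/csv",
    "identifier": "example-identifier-001",
    "rights_holder": "Example Researcher",
    "rights": "",
}

print(missing_elements(record))
```

A check like this only confirms that each element is present; whether the values are meaningful still needs human review.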

Common General & Discipline-Specific Metadata Standards

General Metadata Standards

Dublin Core: domain agnostic, basic and widely used
MODS (Metadata Object Description Schema): descriptive metadata standard that is richer than Dublin Core, and can be used on its own or as a complement to other metadata standards
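To give a sense of what a simple Dublin Core description looks like, the sketch below builds a small XML record with Python's standard library. The element names (title, creator, date, and so on) and the namespace come from the Dublin Core Metadata Element Set; the wrapping "metadata" element, the function name, and all sample values are assumptions made for illustration, and a repository may expect a different wrapper.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Serialize a dict of Dublin Core element names and values to XML text."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
xml_text = build_dc_record(
    {
        "title": "Example stream temperature data set",
        "creator": "Example Researcher",
        "date": "2021-03-14",
        "format": "text/csv",
        "rights": "CC BY 4.0",
    }
)
print(xml_text)
```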

Discipline-Specific Metadata Standards

Cataloging Cultural Objects: data content standard for describing, documenting, and cataloging cultural works, including paintings, sculptures, prints, manuscripts, photographs, and other visual media.
Darwin Core: used to describe biological specimens, including their occurrence in nature as documented by observations, samples, and related information. Based on Dublin Core, this schema is used in natural history specimen collections and species observation databases.
DDI (Data Documentation Initiative): common standard for social, behavioral and economic sciences, including survey data. Expressed in XML, this metadata schema supports the entire research data life cycle.
EML (Ecological Metadata Language): specific for ecology disciplines. EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data.
FGDC (Federal Geographic Data Committee): specific for geospatial data.
NASA's Standard: NASA has a variety of data format and metadata standards, as well as "heritage" standards that were in use by NASA Earth Science Data Systems (ESDS) prior to the start of the legacy ESDS Standards Process Group (SPG).
OLAC (Open Language Archives Community) Metadata: developed by the Open Language Archives Community for the Open Archives Initiative. It is based on Dublin Core.
TEI (Text Encoding Initiative): a standard for the representation of texts in digital form.
VRA Core (Visual Resources Association): a data standard for the description of works of visual culture as well as the images that document them.

Find Disciplinary Metadata Standards:

Disciplinary Metadata search (Digital Curation Center)
Metadata Standards Catalog (Research Data Alliance)

README Files

A readme.txt file provides information about a data file and is meant to help anyone using or interacting with the data interpret the data as intended.

A readme.txt file is a plain text file containing descriptive information; such files are commonly distributed with software, games, and code. It is a supplementary document in which the creator explains the contents to the user. When working with data, it is useful to create a readme.txt file and include it with your data. This helps future users understand the data, including any specialized terms.

There are no standards for writing a readme.txt file, but it is recommended to include: title; principal investigator(s); dates/locations of data collection; keywords; language; funding; descriptions of each folder, file, format, data collection method, and instrument; definitions; people involved; and recommended citation.
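As a starting point, the recommended contents can be turned into a fill-in skeleton. The sketch below is one possible layout, not a standard: the field labels paraphrase the list above, and the function name, placeholder text, and sample values are hypothetical.

```python
# Field labels paraphrasing the recommended readme.txt contents.
README_FIELDS = [
    "Title",
    "Principal investigator(s)",
    "Dates of data collection",
    "Locations of data collection",
    "Keywords",
    "Language",
    "Funding",
    "Folder and file descriptions",
    "File formats",
    "Data collection methods and instruments",
    "Definitions",
    "People involved",
    "Recommended citation",
]

PLACEHOLDER = "[to be completed]"


def render_readme(values):
    """Return readme text with one 'Field: value' line per recommended field."""
    lines = [f"{field}: {values.get(field, PLACEHOLDER)}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Illustrative values only; fields not supplied get the placeholder.
readme_text = render_readme(
    {
        "Title": "Example stream temperature data set",
        "Language": "English",
    }
)
print(readme_text)
```

The returned text can then be saved as readme.txt alongside the data files, for example with pathlib.Path("readme.txt").write_text(readme_text, encoding="utf-8").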

Cornell University's guide on creating readme.txt files is easy to follow and includes numerous resources. It also includes a free template which can be downloaded and used to create your own readme.txt file.
