Proper documentation of your data allows it to be understood, interpreted, found, and cited by any user. As a researcher, it also helps you organize, track, and make sense of your data throughout the research process. Using a metadata standard helps ensure that the data is FAIR, and creating a readme.txt file helps ensure that all the data is documented and organized.
Metadata is a standardized way of describing a work, including information on when and how the data was created, who created the data, and specific characteristics of the data which will better enable a researcher to find the data in the future.
In terms of data management, metadata may describe a data set in the following ways: how the data were collected; when they were collected; what assumptions were made in the methodology; their geographic scope; if there are multiple files, how they relate to one another; the definitions of individual variables and, if applicable, what the possible answers were (e.g., to survey questions); the calibration of any equipment used in data collection; the version of software used for analysis; etc. Very often, a data set that has no metadata is incomprehensible.
(Credit: UNC)
Metadata is often divided into categories, based on the function that it plays within an information ecosystem. The most common types of metadata are:
Descriptive | Metadata that helps provide intellectual access to an object by representing what the item is and what the item is about. Descriptive metadata is the type that is most frequently displayed to users, and enables searching and browsing of resources. |
Technical | For digital objects, metadata that describes the technical aspects of the files that make up the digital objects. It may also include information about what is needed to access or interact with the object. |
Preservation | Metadata about the maintenance of the physical or digital object over time, as well as preservation actions that have been taken on the object. For digital objects, preservation metadata often overlaps with technical metadata. |
Structural | Metadata that reflects how different parts of an object relate to each other. For example, structural metadata would describe how multiple image files make up one digitized book. |
Administrative | A broad category of metadata used to administer and manage an object. Administrative metadata may or may not be displayed to end users. For digitized materials, this often includes information about when and how the item was digitized. |
Use | Metadata that provides information about who is allowed to access an item and what restrictions are placed on their use of the item. Related is rights metadata, which describes any copyright or license information relevant to the item. |
The FAIR Principles were created in 2016 and published in Scientific Data as "FAIR Guiding Principles for scientific data management and stewardship." FAIR stands for Findable, Accessible, Interoperable, and Reusable.
To learn more about the FAIR Principles, visit the GO FAIR website and the guide for Preparing FAIR data for reuse and reproducibility (Research Data Management Service Group at Cornell University).
*The use of (Meta)data in the figure below and the accompanying transcription refers to both metadata and data
Findable: (Meta)data are assigned a globally unique and persistent identifier. Data are described with rich metadata. Metadata clearly and explicitly include the identifier of the data they describe. (Meta)data are registered or indexed in a searchable resource.
Accessible: (Meta)data are retrievable by their identifier using a standardized protocol. The protocol is open, free, and universal. The protocol allows for authentication and authorization, as needed. Metadata are accessible, even when the data are no longer available.
Interoperable: (Meta)data use a formal, accessible, shared, and broadly applicable language. (Meta)data use vocabularies that follow FAIR principles. (Meta)data include qualified references to other (meta)data.
Reusable: (Meta)data are richly described with a plurality of accurate and relevant attributes. (Meta)data are released with a clear and accessible data usage license. (Meta)data are associated with a detailed provenance. (Meta)data meet domain-relevant community standards.
Standardization covers language, spelling, and the formats used for dates and numerals. Without standardization, comparing data sets can be a challenge.
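To illustrate why a standard date format matters, here is a minimal Python sketch that converts dates written in several styles into a single ISO 8601 form (YYYY-MM-DD). The list of accepted formats and the sample values are assumptions made up for this example, and slash-separated dates are assumed to be month-first; your own data may need different rules.

```python
from datetime import datetime

# Hypothetical list of date styles that might appear in raw records.
# Assumption: slash-separated dates are month/day/year.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y"]


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


# Made-up sample values, all meant to be the same day except the last.
raw_dates = [
    "03/15/2021",
    "15 March 2021",
    "March 15, 2021",
    "2021-03-15",
    "sometime in spring",
]
standardized = [to_iso_date(d) for d in raw_dates]
print(standardized)
```

Values that cannot be parsed come back as None rather than being guessed, so they can be reviewed by hand.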
A metadata schema outlines the overall structure of the metadata. It describes how the metadata is set up, and usually addresses standards for common components of metadata such as dates, names, and places. Discipline-specific schemas address special elements specific to or needed by a given discipline.
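As a rough illustration of what a schema does, the Python sketch below checks a metadata record against a small set of rules. The field names, the rules, and the sample record are all hypothetical and do not come from any published metadata standard; a real project would follow the schema used in its discipline.

```python
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_nonempty_text(value):
    return isinstance(value, str) and value.strip() != ""


def is_iso_date(value):
    return isinstance(value, str) and ISO_DATE.match(value) is not None


# Hypothetical schema: field name -> (required?, check function).
EXAMPLE_SCHEMA = {
    "title": (True, is_nonempty_text),
    "creator": (True, is_nonempty_text),
    "date_collected": (True, is_iso_date),
    "place": (False, is_nonempty_text),
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for field, (required, check) in schema.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not check(record[field]):
            problems.append(f"invalid value for {field}: {record[field]!r}")
    return problems


# Made-up record with a date that does not follow the schema's format.
record = {
    "title": "Stream temperature survey",
    "creator": "Doe, Jane",
    "date_collected": "3/15/21",
}
print(validate(record, EXAMPLE_SCHEMA))
```

Here the optional "place" field may be left out, but the date is flagged because it is not written as YYYY-MM-DD.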
Cornell University offers helpful, more detailed information about metadata: Metadata and Describing Data, Research Data Management Service Group.
A readme.txt file provides information about a data file and is meant to help anyone using or interacting with the data interpret the data as intended.
A readme.txt file is a plain text file of descriptive information, a convention commonly used for software, games, and code. It is a supplementary document that exists so the creator can explain the contents to the user. When working with data, it can be useful to create and include a readme.txt file with your data. This helps future users understand the data, any terms used, and more.
There are no standards for writing a readme.txt file, but it is recommended to include: title; principal investigator(s); dates/locations of data collection; keywords; language; funding; descriptions of each folder, file, format, data collection method, and instrument; definitions; people involved; and recommended citation.
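The recommended items above can be turned into a simple starting template. The following Python sketch is one possible way to do this; the exact field labels, the "[to be completed]" placeholder, and the function names are assumptions chosen for this example, not an official format.

```python
# Field labels adapted from the recommended items listed above.
README_FIELDS = [
    "Title",
    "Principal investigator(s)",
    "Dates of data collection",
    "Locations of data collection",
    "Keywords",
    "Language",
    "Funding",
    "Folder and file descriptions",
    "File formats",
    "Data collection methods and instruments",
    "Definitions",
    "People involved",
    "Recommended citation",
]


def build_readme(values):
    """Return readme text with one 'Field: value' line per recommended item."""
    lines = []
    for field in README_FIELDS:
        value = str(values.get(field, "")).strip()
        lines.append(f"{field}: {value if value else '[to be completed]'}")
    return "\n".join(lines) + "\n"


def write_readme(values, path="readme.txt"):
    """Save the readme text as a plain text file (UTF-8)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_readme(values))


# Made-up values; anything left out is marked for later completion.
example = build_readme({"Title": "Example data set", "Language": "English"})
print(example)
```

Calling write_readme with your own values saves the result as readme.txt next to your data files, where the remaining placeholders can be filled in by hand.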
Cornell University's guide on creating readme.txt files is easy to follow and includes numerous resources. It also includes a free template which can be downloaded and used to create your own readme.txt file.