
Data Services

A Guide to Library Data Services

What is Metadata?

Metadata is often defined as "data about data" and is also known as data documentation. It describes and documents research data: the who, what, when, where, how, and why of the data. Metadata makes data easier to find in an archive or repository and easier to understand for anyone who wants to use it.

There are three types of metadata: descriptive, structural, and administrative.

Descriptive Metadata supports discovery and identification.
Examples: title, creator, unique identifiers, subjects, keywords

Structural Metadata describes how a resource is organized.
Examples: manifest of files in a dataset, table of contents, schema of database tables

Administrative Metadata helps in managing the resource by describing technical aspects, rights management, and preservation information.
Technical Examples: file type, version information, how/when created
Rights Management Examples: licensing, use restrictions, privacy concerns
Preservation Examples: ownership, history of use, authenticity
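
As a rough illustration of how the three types can sit side by side in one record, the sketch below groups the example elements above into a nested Python dictionary. The dataset, field names, and values are all hypothetical and do not follow any particular metadata standard; the helper `missing_sections` is likewise invented for this example.

```python
import json

# Hypothetical record for an imaginary dataset; every value is illustrative.
record = {
    "descriptive": {
        "title": "Stream Temperature Survey",
        "creator": "Example Research Group",
        "identifier": "example-id-0001",
        "keywords": ["hydrology", "temperature"],
    },
    "structural": {
        "files": ["readings.csv", "sites.csv"],
        "relations": {"readings.csv": "site_id refers to sites.csv"},
    },
    "administrative": {
        "technical": {"file_type": "text/csv", "version": "1.0"},
        "rights": {"license": "CC BY 4.0"},
        "preservation": {"owner": "Example Research Group"},
    },
}


def missing_sections(rec):
    """Return the metadata types that are absent or empty in a record."""
    required = ("descriptive", "structural", "administrative")
    return [name for name in required if not rec.get(name)]


print(json.dumps(record, indent=2))
print(missing_sections(record))
```

A repository would normally prescribe its own element names, so a real record should follow that schema rather than this layout.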

Best Practices for Creating Metadata

  • Consult with a Data Librarian
  • Think from the perspective of the user of the resource
  • Keep metadata as simple as possible while still meeting repository needs
  • Adopt/extend existing standards as practical and possible
  • Avoid extraneous punctuation and abbreviations
  • Keep a data dictionary

Metadata Elements

Metadata records should include:

  • Overall purpose of research

  • People involved on the research team

  • Structure of the data files and how they relate to each other

  • Research design information

  • Data collection processes and methods

  • Data processing processes and methods

  • Spatial or temporal coverage of the datasets

  • Variables used in the dataset and how they relate to each other

  • Types of instruments used to collect each variable and their calibration

  • Description of any codes used for missing values

  • Explanations of any derived values

  • Explanations of errors within the data files

  • Documentation of any outliers
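
Several of these elements (variables, instruments, missing-value codes, derived values) are commonly captured in a data dictionary. The following is a minimal sketch, assuming a hypothetical dataset with two variables and a missing-value code of -999; the variable names, the code, and the `decode_missing` helper are illustrative only.

```python
# Hypothetical data dictionary: variable names, units, and codes are illustrative.
data_dictionary = {
    "temp_c": {
        "description": "Water temperature",
        "unit": "degrees Celsius",
        "instrument": "handheld thermometer",
        "missing_codes": [-999],
    },
    "temp_f": {
        "description": "Water temperature, derived from temp_c",
        "unit": "degrees Fahrenheit",
        "derived_from": "temp_c * 9 / 5 + 32",
        "missing_codes": [-999],
    },
}


def decode_missing(rows, dictionary):
    """Replace documented missing-value codes with None."""
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            codes = dictionary.get(name, {}).get("missing_codes", [])
            new_row[name] = None if value in codes else value
        cleaned.append(new_row)
    return cleaned


rows = [{"temp_c": 12.5, "temp_f": 54.5}, {"temp_c": -999, "temp_f": -999}]
print(decode_missing(rows, data_dictionary))
```

Recording the codes in one place means a later user can tell a missing reading from a real measurement without guessing.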

Standards and Schemas

Controlled Vocabularies

A vocabulary is the set of concepts and terms used in a domain.

A controlled vocabulary is a closed, prescribed list of terms in which each term has one agreed meaning.

A taxonomy is a controlled vocabulary that is arranged in a hierarchy. Terms are not usually defined, and relationships between terms, other than their place in the hierarchy, are not defined.

A thesaurus is a taxonomy that contains additional information about the use of terms. Example: MeSH (Medical Subject Headings)

An ontology is a controlled vocabulary that defines the terms in a given domain or knowledge base and the relationships between those terms. Example: SNOMED
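
The difference between a taxonomy and an ontology can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terms, the `broader` mapping, the relationship names, and both helper functions are hypothetical and are not taken from MeSH, SNOMED, or any other published vocabulary.

```python
# Hypothetical taxonomy: each term points to its broader (parent) term.
broader = {
    "Hypertension": "Vascular Diseases",
    "Vascular Diseases": "Cardiovascular Diseases",
    "Cardiovascular Diseases": None,
}

# Hypothetical ontology statements as (subject, relationship, object) triples.
triples = [
    ("Hypertension", "is_a", "Vascular Diseases"),
    ("Hypertension", "risk_factor_for", "Heart Disease"),
]


def ancestors(term, parents):
    """Walk up the hierarchy from a term to the top of the taxonomy."""
    path = []
    current = parents.get(term)
    while current is not None:
        path.append(current)
        current = parents.get(current)
    return path


def related(term, statements):
    """List every typed relationship in which the term is the subject."""
    return [(rel, obj) for subj, rel, obj in statements if subj == term]


print(ancestors("Hypertension", broader))
print(related("Hypertension", triples))
```

The taxonomy can only answer "what is broader than this term?", while the triples can carry any named relationship between terms.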

Polysemy occurs when a single term refers to more than one concept.
Example: Lead [hypertension leads to heart disease, an EKG lead, lead poisoning]

Synonyms are different terms that mean the same thing.
Example: Common cold [cold, upper respiratory infection, URI, pharyngitis, viral syndrome, bronchitis, rhinitis]
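
One way a controlled vocabulary handles both problems is to map synonyms to a single preferred term and to flag polysemous terms for disambiguation. The sketch below assumes two small hypothetical lookup tables built from the examples above; the table names and the `normalize` function are invented for illustration.

```python
# Hypothetical synonym table: entry terms mapped to one preferred term.
preferred_term = {
    "cold": "common cold",
    "upper respiratory infection": "common cold",
    "uri": "common cold",
}

# Hypothetical polysemous term with its candidate concepts.
ambiguous = {
    "lead": ["EKG lead", "lead (metal)"],
}


def normalize(term):
    """Return (preferred term, candidate concepts) for a free-text term."""
    key = term.strip().lower()
    if key in ambiguous:
        # More than one concept matches, so a person or rule must choose.
        return None, ambiguous[key]
    if key in preferred_term:
        return preferred_term[key], []
    # Unknown terms pass through unchanged (lowercased and trimmed).
    return key, []


print(normalize(" URI "))
print(normalize("Lead"))
```

In this sketch an ambiguous term returns no preferred term at all, which forces the caller to resolve the meaning rather than silently picking one.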

Use Cases for Controlled Vocabularies:

  1. Information capture
  2. Communication: transferring information
  3. Knowledge organization: classification of diseases
  4. Information retrieval
  5. Decision support: implementing decision support rules

Terminologies and Vocabularies