Document your Corpus


Metadata, or information that describes the files in your corpus, should accompany every corpus so that people external to your project understand why and how you created the corpus.

No matter how well you think you know the material you have collected, it is important to record all details and characteristics of the corpus carefully and to store them together with the data themselves.

 

 

What is metadata

"Metadata is a love note to the future" by Cea.

 

 

Metadata is "data that provides information about other data": it describes the data but does not include their content, such as the text of a message or the image itself. There are many distinct types of metadata, including:

  • Descriptive metadata — the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.
  • Structural metadata — metadata about containers of data that indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships and other characteristics of digital materials.
  • Administrative metadata — the information to help manage a resource, like type, permissions, and when and how it was created.
  • Reference metadata — the information about the contents and quality of statistical data.
  • Statistical metadata, also called process data, may describe processes that collect, process or produce statistical data.
  • Legal metadata — provides information about the creator, copyright holder, and public licensing, if provided.

Metadata is not strictly bound to one of these categories, as it can describe a piece of data in many other ways. (From Wikipedia)
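
To make the categories above more concrete, here is a minimal sketch of a metadata record for a small text corpus, saved in the same folder as the corpus files so that the description travels with the data. Everything in it is an assumption made for illustration: the field names, the placeholder values, the `metadata.json` file name, and the helper functions `build_corpus_metadata` and `save_metadata` are hypothetical and do not follow any particular metadata standard.

```python
import json
from pathlib import Path


def build_corpus_metadata():
    """Return an example metadata record grouped by metadata type.

    All field names and values are illustrative placeholders.
    """
    return {
        # Descriptive: used for discovery and identification.
        "descriptive": {
            "title": "Example Corpus",
            "abstract": "A placeholder description of what the corpus contains.",
            "author": "Jane Doe",
            "keywords": ["example", "corpus"],
        },
        # Structural: how the parts of the corpus fit together.
        "structural": {
            "files": ["doc_001.txt", "doc_002.txt"],
            "order": "files are listed in reading order",
        },
        # Administrative: type, and when and how the corpus was created.
        "administrative": {
            "file_type": "text/plain",
            "created": "2024-01-01",
            "collection_method": "placeholder: how the texts were gathered",
        },
        # Legal: creator, copyright holder, and licensing.
        "legal": {
            "copyright_holder": "Jane Doe",
            "license": "CC BY 4.0",
        },
    }


def save_metadata(corpus_dir, metadata, filename="metadata.json"):
    """Write the metadata record into the corpus directory itself."""
    path = Path(corpus_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return path
```

Calling `save_metadata("my_corpus", build_corpus_metadata())` would write `my_corpus/metadata.json` next to the corpus files. JSON is only one possible choice here; a plain README or a spreadsheet could hold the same information, as long as it is stored together with the data.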