Introduction
Metadata is a set of information that describes a document or a data source; in short, it is data about other data. In the document "Understanding Metadata", the authors write, "Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource" (National Information Standards Organization, 2004). As these definitions indicate, metadata makes finding information easier: it allows a party interested in the information to discover what the information is about before acquiring it. Metadata can describe relationships that exist between different documents. Metadata may denote the format and structure that a document may take. Lastly, metadata can indicate who owns or has owned the information, who has rights to access it, and who has rights to change it.
Metadata Standards
There are hundreds of metadata standards in use by various communities or knowledge domains. This evaluation considers several standards that are widely used by the Library and Information Sciences communities (Becker & Riley, 2010):
- Dublin Core
- TEI (Text Encoding Initiative)
- METS (Metadata Encoding and Transmission Standard)
- MODS (Metadata Object Description Schema)
- EAD (Encoded Archival Description)
- RDF (Resource Description Framework)
- MARC 21 (Machine Readable Cataloging Standard)
The Dublin Core Metadata Initiative constructed a shared vocabulary of terms with which to describe documents. The initial set of the Dublin Core consists of fifteen simple terms. The simple terms characterize certain properties that a document has, such as creator, title, description, subject, date, identifier and format. These fifteen simple terms are named the Dublin Core Metadata Element Set. The simple set is not bound to any machine-readable markup language such as HTML, XML or TeX. Since the terms are not tied to a particular implementation, they are interoperable with a variety of technologies, and communities can extend them by adding them to domain-specific vocabularies (CDP Metadata Working Group, 2006). The simple set has been expanded by the qualified terms, which extend the original 15 terms to 55 terms (“DCMI Metadata Terms,” n.d.). The extensions are based on the original 15 but give a term a narrower definition (Weibel, 1999). For example, the simple term date indicates a point in time very generally, while the qualified terms dateCopyrighted and dateSubmitted narrow the use of date and make it more specific.
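The relationship between qualified and simple terms can be sketched in a few lines of Python. This is only an illustration: the `REFINES` table covers just four refinements, the `dumb_down` function name is hypothetical, and the record values are invented sample data rather than anything from the collection.

```python
# A small, hand-picked subset of qualified terms and the simple
# element each one refines. This table is illustrative, not complete.
REFINES = {
    "dateCopyrighted": "date",
    "dateSubmitted": "date",
    "alternative": "title",
    "abstract": "description",
}


def dumb_down(record):
    """Collapse qualified terms onto the simple element they refine."""
    simple = {}
    for term, values in record.items():
        element = REFINES.get(term, term)  # simple terms map to themselves
        simple.setdefault(element, []).extend(values)
    return simple


# Invented sample record that uses two refinements of "date".
qualified_record = {
    "title": ["Example Newsletter, Issue 1"],
    "dateCopyrighted": ["1975"],
    "dateSubmitted": ["2013-11-10"],
}

simple_record = dumb_down(qualified_record)
print(simple_record)
```

A consumer that only understands the fifteen simple terms can still use the collapsed record, at the cost of no longer knowing which date is which.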
TEI stands for the Text Encoding Initiative. The Initiative’s purpose is to establish a set of guidelines for the “encoding of machine-readable texts in the humanities and social sciences” (“TEI: Frequently Asked Questions,” n.d.). Texts, as encoded digital representations, may originate from a diversity of sources, such as books, articles, transcribed recorded interviews or even engraved tablets. TEI is a markup language built from elements. Elements in TEI delineate structures within the text, such as paragraphs and columns, and carry metadata, or information, about the text. TEI is a specification that tells the user which elements can describe a document, how the elements are ordered, and which character encoding may be used (“iv. About These Guidelines - TEI P5,” n.d.). TEI has a massive corpus of elements, but only a handful of them are mandatory. By keeping the number of mandatory elements small, TEI conforms to a modular design principle (ibid.). Elements are grouped into modules. Each module can be thought of as a class in a classification system, and a class may contain many subclasses. Since only a few elements are mandatory, many of the classes and subclasses in TEI are optional. If a class is optional, it is not required to be present in any instance of TEI. With this modular design, TEI allows classes and subclasses of elements to be used for very specific purposes, such as marking up a newspaper, while allowing the exclusion of unrelated classes, such as those for marking up transcripts. In this way, TEI can be tailored to a project's specific purpose (Waltz, 2012).
METS is a metadata standard that expresses the structure of a digital object. It is useful for the management of objects in a digital library and for the transfer of digital objects from one digital repository to another. METS assumes that multiple items form a composite object, and it establishes the relations of those items to one another. METS can be extended by incorporating other metadata standards, creating more complete descriptions by including descriptive or administrative metadata schemas (McDonough, 2006). Another feature of METS is the ability to include a binary format, such as a JPG image file, encoded in Base64 directly in the structure of the metadata (ibid.). Encoding in Base64 allows for transmission of all data streams associated with the compound object without having to reference data external to METS (via HTTP or another registered URI-encoded network address). Lastly, a METS document may include a pointer to certain behaviors or methods that can be applied to its enveloped data and metadata to assist in its processing (ibid.).
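To make the Base64 embedding concrete, here is a minimal Python sketch that wraps a few bytes in a METS `file` element and recovers them again. The `embed_file` and `extract_file` helpers, the `FILE001` identifier and the placeholder bytes are all assumptions made for illustration; a real METS document would place the `file` element inside a complete `fileSec` and would normally be validated against the METS schema.

```python
import base64
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def embed_file(file_id, mimetype, payload):
    """Build a mets:file element carrying the payload as Base64 text."""
    file_el = ET.Element(f"{{{METS_NS}}}file", {"ID": file_id, "MIMETYPE": mimetype})
    content = ET.SubElement(file_el, f"{{{METS_NS}}}FContent")
    bin_data = ET.SubElement(content, f"{{{METS_NS}}}binData")
    bin_data.text = base64.b64encode(payload).decode("ascii")
    return file_el


def extract_file(file_el):
    """Decode the Base64 payload of a mets:file element back to bytes."""
    bin_data = file_el.find(f"{{{METS_NS}}}FContent/{{{METS_NS}}}binData")
    return base64.b64decode(bin_data.text)


# Placeholder bytes standing in for a real JPG image.
image_bytes = b"\xff\xd8\xff\xe0 stand-in for JPEG bytes"

file_el = embed_file("FILE001", "image/jpeg", image_bytes)
xml_text = ET.tostring(file_el, encoding="unicode")
round_tripped = extract_file(ET.fromstring(xml_text))
```

Because the binary content travels as plain text inside the XML, the receiving repository needs nothing beyond the METS file itself to rebuild the image, though Base64 makes the payload roughly a third larger.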
MODS is the Metadata Object Description Schema. MODS allows for the description of a single item. It is a reduced set of metadata taken from the MARC 21 standard and marked up in XML, though MODS and MARC 21 fields do not map one-to-one (“MODS: Uses and Features,” n.d.). MODS is more descriptive than Dublin Core but describes less than the entire MARC 21 standard. MODS is intended to be used along with METS as an extension to METS, allowing the description of both compound objects and singular items. MODS allows for such access points as Title, Genre, Subject, Classification and Identifier (“Outline of elements and attributes in MODS version 3.4,” n.d.).
EAD is the Encoded Archival Description. It is a metadata standard created specifically for archives, where it is used as a digital finding aid (“About EAD,” n.d.). It allows for a hierarchical description of collections, series, subseries, containers, and items. Though it can describe to a fine granularity, it is mostly concerned with the collection as a whole rather than with each individual record. EAD is more concerned with providing a structure to the content of an archive than with marking up the content itself or administering a collection (“Design Principles for Enhancements to EAD December 2002,” n.d.).
RDF (Resource Description Framework) is probably the most complex of the metadata standards. As the name suggests, RDF allows for the description of resources. The descriptions are expressed in triples, a syntactic structure that conveys semantic meaning by combining a subject, a predicate and an object into a statement. RDF can be expressed in a variety of formats, though the most widely used is XML. The predicates of RDF are usually drawn from formal vocabularies or ontologies, such as Dublin Core or OAI-ORE (Open Archives Initiative Object Reuse and Exchange).
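A triple is easy to picture as a three-part tuple. The sketch below models two statements about one document and writes them out in the line-based N-Triples syntax, which is chosen here only because it is compact; it is not the XML serialization mentioned above. The `example.org` subject URI, the literal values and the `to_ntriples` helper are hypothetical, and the escaping handles only backslashes and double quotes.

```python
# Dublin Core element namespace, used here to form the predicates.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical subject URI for a single newsletter issue.
doc = "http://example.org/newsletter/1"

# Each statement is a (subject, predicate, object) tuple.
triples = [
    (doc, DC + "title", "Example Newsletter, Issue 1"),
    (doc, DC + "creator", "Example Author"),
]


def to_ntriples(statements):
    """Serialize triples with literal objects as N-Triples lines."""
    lines = []
    for subject, predicate, obj in statements:
        # Minimal escaping: backslashes first, then double quotes.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


print(to_ntriples(triples))
```

Both statements share a subject, so together they already form a small graph: one node for the document with two labeled edges leading to literal values.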
MARC 21 is the Machine Readable Cataloging standard, a format for bibliographic entries that computers are able to process. The standardization of MARC began in the 1960s, and MARC 21 is the latest revision. The format of MARC 21 is very different from the other standards listed above. Each record has a number of fields that have subfields divided by special characters, or signposts (“Understanding MARC Bibliographic: Parts 1 to 6,” n.d.). With the signposts, bibliographic records are able to represent subjects, author names, titles, physical descriptions, editions, publishers, etc.
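The signpost idea can be sketched with a small parser for one field written in the common display convention, where `$` marks the start of each subfield. The `parse_field` function and the sample title line are illustrative assumptions; the sketch works on a human-readable rendering of a field, not on the binary MARC transmission format.

```python
def parse_field(line, delimiter="$"):
    """Split a displayed MARC field into tag, indicators and subfields.

    Assumes the layout "TTT II $a...$b...": a three-character tag,
    a space, two indicator characters, then delimited subfields.
    """
    tag = line[:3]
    indicators = line[4:6]
    subfields = []
    # Everything before the first delimiter is padding, so skip it.
    for chunk in line[6:].split(delimiter)[1:]:
        if not chunk:
            continue  # ignore empty pieces such as a doubled delimiter
        code, value = chunk[0], chunk[1:].strip()
        subfields.append((code, value))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


# Invented title statement: field 245 with subfields a, b and c.
field = parse_field("245 10 $aExample newsletter :$bissue one /$cExample Author.")
print(field)
```

The single-letter codes are what let software tell the title proper from the rest of the title and from the statement of responsibility, even though all three sit in the same field.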
Criteria
The evaluation of each standard was performed by selecting a set of criteria and assessing the conformance of the standard to each criterion. Each assessment was recorded as a boolean value, that is, a yes or a no response. The boolean value is somewhat limiting because it does not provide any nuance or any clarification of the reasoning behind the evaluation. However, it does provide a simple and quick way to make a legitimate judgement.
Metadata is often divided into types by function. Descriptive metadata provides descriptive summaries of the document or data source. Administrative metadata provides information that allows for the management of a resource. Structural metadata describes how to compose an object out of multiple digital resources.
Beyond the functional types of metadata, there are other features that are important for the long-term viability of a standard. Extensibility is a feature that allows a standard to adapt to future needs. If a standard is extensible, then it can change to meet new demands made of it. Being an open standard is a mark of community input into, and acceptance of, a standard. In contrast, a proprietary standard is subject to private interests and may not meet community needs. Any standard to be considered must be actively maintained by a community, otherwise it will become obsolete through neglect. Users of a standard should be able to verify that they have implemented it correctly. A schema definition allows a parsing program to verify the conformance of any metadata record to the standard. Any standard for transmission and storage must be encoded digitally. The most widely adopted encoding standard is XML (Extensible Markup Language), but it is not the only one in use today. JSON, or JavaScript Object Notation, is an established open encoding standard used by web developers. If a standard is independent of the manner in which it is serialized, then it can be adapted to future encoding standards more easily.
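Serialization independence can be illustrated by writing the same record out as both JSON and XML and confirming that each form reads back to the identical data. This is a sketch only: the record values are invented, and the `record` wrapper element and the helper function names are assumptions rather than part of any standard.

```python
import json
import xml.etree.ElementTree as ET

# An abstract record: term names mapped to values, with no syntax implied.
record = {"title": "Example Newsletter, Issue 1", "format": "application/pdf"}


def to_json(data):
    """Serialize the record as a JSON object with sorted keys."""
    return json.dumps(data, sort_keys=True)


def to_xml(data):
    """Serialize the record as one XML element per term."""
    root = ET.Element("record")  # hypothetical wrapper element
    for term in sorted(data):
        ET.SubElement(root, term).text = data[term]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the XML form back into a plain dictionary."""
    return {child.tag: child.text for child in ET.fromstring(text)}


# Both encodings carry the same information as the abstract record.
same = json.loads(to_json(record)) == from_xml(to_xml(record)) == record
print(same)
```

The vocabulary lives in the dictionary keys, not in the angle brackets or braces, which is why a standard defined at that level can move to a new encoding without being redefined.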
The attributes of the standard should allow for certain types of content, specifically for this implementation, though these characterizations may be broadly applied to any reproduction of physical textual content. Provenance tracking attributes allow for a description of who has had control over the production and reproduction of the content. They denote changes to the text document as well as changes in ownership or rights holder, and movement of physical and virtual location. If a standard allows for the full text of the document to be preserved within the metadata, then it provides full text encoding of the document. It may be important to keep the text alongside the metadata, given that markup, or encoding, of text is a type of metadata as well (such as font type used, bolding or italicizing of text, paragraphs, pages, etc.). Technical metadata about images records the data encoding, size, dimensions and format type of digital images. Identifier encoding allows for a range of unique strings of text to be assigned to an object as a way of providing a distinct name for identification and resolution of the object.
Criteria Evaluation Matrix
| Criteria | DC | TEI | METS | MODS | EAD | RDF | MARC 21 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Descriptive metadata | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Administrative metadata | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Structural metadata | No | Yes | Yes | No | Yes | Yes | No |
| Extensible | Yes | Yes | Yes | No | No | Yes | No |
| Open Standard | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Actively Maintained | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Schema Definition | Yes | Yes | Yes | Yes | Yes | No | No |
| Provenance Tracking | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Text encoding | No | Yes | No | No | No | No | No |
| Technical metadata about images | Yes | Yes | Yes | No | No | Yes | Yes |
| Page layout | No | Yes | No | No | Yes | No | No |
| Identifier encoding | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| XML Only Encoding | No | Yes | Yes | Yes | Yes | No | No |
Metadata Recommendations and Guidelines
As seen from the evaluation matrix, TEI is the only standard to meet all requirements. The project will use TEI as a mechanism to encode each newsletter in its entirety. However, TEI alone is not sufficient to provide convenient access from other tools used by the digital library community. OAI-PMH is a harvesting mechanism that digital libraries use to discover content on remote installations. By convention, OAI-PMH lists include Dublin Core elements. Fortunately, TEI is broad enough to allow Dublin Core to be generated from it. Thus, any digital library implementation should have all descriptive and administrative metadata encoded in TEI and derive Dublin Core from it. Lastly, although TEI does allow external links to be embedded in its markup so as to recombine digital assets into a compound object (via structural metadata), it is more convenient to represent compound objects and structural metadata in METS. METS is also more broadly supported for this purpose than TEI.
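Deriving Dublin Core from TEI amounts to reading a few well-known places in the TEI header. The Python sketch below shows one way this could look. The sample TEI document, its values, the `PATHS` crosswalk and the `tei_to_dc` function are all invented for illustration; a production crosswalk would cover more elements and would have to handle repeated or missing values.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document with a minimal TEI header.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example Newsletter, Issue 1</title>
        <author>Example Author</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example Publisher</publisher>
        <date when="1975-01-01">January 1975</date>
      </publicationStmt>
      <sourceDesc><p>Scanned from a print original.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Full text of the newsletter goes here.</p></body></text>
</TEI>"""

FILE_DESC = "tei:teiHeader/tei:fileDesc/"

# Illustrative crosswalk: Dublin Core term -> path in the TEI header.
PATHS = {
    "title": FILE_DESC + "tei:titleStmt/tei:title",
    "creator": FILE_DESC + "tei:titleStmt/tei:author",
    "publisher": FILE_DESC + "tei:publicationStmt/tei:publisher",
}


def tei_to_dc(tei_xml):
    """Extract a simple Dublin Core record from a TEI document string."""
    root = ET.fromstring(tei_xml)
    dc = {}
    for term, path in PATHS.items():
        node = root.find(path, TEI_NS)
        if node is not None and node.text:
            dc[term] = node.text.strip()
    date = root.find(FILE_DESC + "tei:publicationStmt/tei:date", TEI_NS)
    if date is not None:
        # Prefer the normalized "when" attribute over the display text.
        dc["date"] = date.get("when") or (date.text or "").strip()
    return dc


dc_record = tei_to_dc(SAMPLE_TEI)
print(dc_record)
```

Keeping TEI as the single source and generating Dublin Core on demand means the two descriptions cannot drift apart, which is the main appeal of the approach recommended above.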
In conclusion, the documents of CFRA Between the Walls will consist of images and PDF files which, when taken together with TEI, DC and METS, will allow for the accurate reproduction of the original newsletters. DC and TEI will provide mechanisms for full text search, basic search and advanced searching techniques. METS allows for the transfer of the digital objects should another digital library request copies or should the digital library implementation be replaced.
References
8. Customising TEI, ODD, Roma: 4. Extending TEI. (n.d.). TEI by Example. website. Retrieved November 10, 2013, from http://www.teibyexample.org/modules/TBED08v00.htm?target=extending
About EAD (EAD Official Site, Library of Congress). (n.d.). Retrieved November 17, 2013, from http://www.loc.gov/ead/eadabout.html
About These Guidelines - TEI P5: Guidelines for Electronic Text Encoding and Interchange. (n.d.). TEI <Text Encoding Initiative>. Retrieved November 18, 2013, from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/AB.html
Becker, D., & Riley, J. (2010). Seeing Standards: A Visualization of the Metadata Universe. Places & Spaces: Mapping Science. Retrieved November 10, 2013, from http://scimaps.org/maps/map/seeing_standards_a_v_130/
CDP Metadata Working Group. (2006, September). CDP Dublin Core Metadata Best Practices, Version 2.1.1. Western States Digital Standards Group. Retrieved from http://www.mndigital.org/digitizing/standards/metadata.pdf
DCMI Metadata Terms. (n.d.). Retrieved November 17, 2013, from http://dublincore.org/documents/dcmi-terms/
Design Principles for Enhancements to EAD December 2002. (n.d.). EAD Official Site, Library of Congress. Retrieved November 17, 2013, from http://www.loc.gov/ead/eaddesgn.html
McDonough, J. P. (2006). METS: standardized encoding for digital library objects. International Journal on Digital Libraries, 6(2), 148–158. doi:10.1007/s00799-005-0132-1
MODS: Uses and Features (Metadata Object Description Schema: MODS). (n.d.). Retrieved November 24, 2013, from http://www.loc.gov/standards/mods/mods-overview.html
National Information Standards Organization. (2004). Understanding metadata. National Information Standards, 20.
Outline of elements and attributes in MODS version 3.4: Metadata Object Description Schema: MODS (Library of Congress). (n.d.). Retrieved November 24, 2013, from http://www.loc.gov/standards/mods/mods-outline-3-5.html
TEI: About. (n.d.). TEI <Text Encoding Initiative>. Retrieved July 9, 2012, from http://www.tei-c.org/About/
TEI: Frequently Asked Questions. (n.d.). TEI <Text Encoding Initiative>. Retrieved July 9, 2012, from http://www.tei-c.org/About/faq.xml
Understanding MARC Bibliographic: Parts 1 to 6. (n.d.). Retrieved November 30, 2013, from http://www.loc.gov/marc/umb/um01to06.html
Paragraph on TEI derived from Waltz, R. P. (2012, September 20). Text Encoding Initiative Powerpoint Presentation. Retrieved from http://figshare.com/articles/Text_Encoding_Initiative_Powerpoint_Presentation/95957
Weibel, S. (1999). The State of the Dublin Core Metadata Initiative April 1999. D-Lib Magazine, 5(4). doi:10.1045/april99-weibel