LOCKSS: Metadata Database

From CLOCKSS Trusted Digital Repository Documents

The LOCKSS technology extracts bibliographic and other metadata from the preserved content and indexes it in a relational database. This database is not itself preserved; it is merely a cache of metadata extracted from the preserved content to facilitate access and management. The indexed metadata can be used to locate and retrieve information about articles, chapters, and other components of the preserved content. This document describes how this information is represented in the database tables.

Information Architecture of Stored Metadata

The metadata database stores metadata about preserved content at the level of bibliographic units. For periodicals such as journals, the bibliographic units are journal articles. For books, the bibliographic units are either individual book chapters or the entire book if chapters are not delivered individually. Bibliographic units for other types of content can be identified and represented at the same level. Each bibliographic unit is described by information such as its authors, title (e.g. of the article or chapter), keywords, and abstract. Identifying information about the bibliographic unit is also stored, including its Digital Object Identifier (DOI), ISSN, and ISBN.
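
As a purely illustrative sketch, the descriptive and identifying information listed above can be pictured as one record per bibliographic unit. The class name, field names, and sample values (including the DOI and ISSN) are hypothetical placeholders; in the actual database this information is spread across several tables rather than held in a single record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BibliographicUnit:
    """Hypothetical record for one bibliographic unit (e.g. an article or chapter)."""

    # Descriptive information
    title: str
    authors: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    abstract: Optional[str] = None
    # Identifying information
    doi: Optional[str] = None
    issn: Optional[str] = None
    isbn: Optional[str] = None

    def identifiers(self) -> Dict[str, str]:
        """Return only the identifiers that are present for this unit."""
        ids = {"doi": self.doi, "issn": self.issn, "isbn": self.isbn}
        return {name: value for name, value in ids.items() if value is not None}


# Placeholder values for illustration only.
article = BibliographicUnit(
    title="An Example Article",
    authors=["Smith, J.", "Lee, K."],
    keywords=["preservation"],
    doi="10.1234/example.5678",
    issn="1234-5679",
)
print(article.identifiers())
```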

Information about relationships among bibliographic units is also stored in the metadata database. For example, a journal article is usually part of an issue, within which it is located at certain page numbers or identified by an article number. The issue is published as a certain issue number within a certain volume of a journal from a publisher. Similarly, a book chapter or article is part of a book or monograph, at a certain chapter number, and the book or monograph may in turn be part of a book or monographic series from a publisher. The metadata database stores relationship information that enables bibliographic units to be located with respect to these relationships: for example, all articles of an issue, all issues of a volume, all volumes of a journal, and all journals by a publisher can be identified. This enables reporting on, rendering, and browsing content in a way that reflects these relationships.
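
The containment relationships described above can be sketched conceptually as a tree in which each item points to its container, so that finding "all articles of an issue" or "all articles by a publisher" becomes a walk down the tree. This is a hypothetical in-memory model with made-up identifiers, not the database schema itself, which expresses these relationships through parent MD_ITEM records and supporting tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical node in a containment hierarchy (publisher, journal, volume, issue, article)."""

    node_id: str
    kind: str
    parent_id: Optional[str] = None


def children_index(nodes: List[Node]) -> Dict[str, List[Node]]:
    """Map each parent id to the nodes it directly contains."""
    index: Dict[str, List[Node]] = {}
    for node in nodes:
        if node.parent_id is not None:
            index.setdefault(node.parent_id, []).append(node)
    return index


def descendants_of_kind(root_id: str, kind: str, nodes: List[Node]) -> List[str]:
    """Return the sorted ids of every node of `kind` contained, at any depth, in root_id."""
    index = children_index(nodes)
    found: List[str] = []
    stack = [root_id]
    while stack:  # iterative depth-first walk down the containment tree
        current = stack.pop()
        for child in index.get(current, []):
            if child.kind == kind:
                found.append(child.node_id)
            stack.append(child.node_id)
    return sorted(found)


nodes = [
    Node("pub1", "publisher"),
    Node("j1", "journal", "pub1"),
    Node("v1", "volume", "j1"),
    Node("i1", "issue", "v1"),
    Node("i2", "issue", "v1"),
    Node("a1", "article", "i1"),
    Node("a2", "article", "i1"),
    Node("a3", "article", "i2"),
]
print(descendants_of_kind("i1", "article", nodes))    # all articles of an issue
print(descendants_of_kind("v1", "issue", nodes))      # all issues of a volume
print(descendants_of_kind("pub1", "article", nodes))  # all articles by a publisher
```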

Finally, the metadata database stores information about the membership of each bibliographic unit within a specific Archival Information Package (AIP), which in LOCKSS is an Archival Unit (AU). This includes the identifier of the AIP (the AUID) and the URI of the unit, either on the Provider's website in the case of harvested content, or relative to a delivered Submission Information Package (SIP) in the case of file transfer content. This information is sufficient to locate the bibliographic unit within the LOCKSS repository.

Schema Representation

The schema used to represent bibliographic units, their bibliographic relationships, and their preservation information is encoded as tables in a relational model.

Representation of Bibliographic Units

A bibliographic unit is primarily represented by an MD_ITEM record that is decorated by a number of supporting tables holding different types of bibliographic information. These tables include:

  • MD_ITEM_NAME -- the name of the bibliographic unit (e.g. article or book chapter title)
  • DOI -- the DOI of the bibliographic unit (e.g. article or book chapter DOI)
  • AUTHOR -- the authors of the unit (e.g. article or book chapter authors)
  • KEYWORD -- the keywords for the unit (e.g. article or book chapter keywords)
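
A minimal sketch of how an MD_ITEM record and its decorating tables might be laid out, using Python's built-in sqlite3 module. Only the table names come from this document; the column names (such as md_item_seq and author_idx), types, and sample values are assumptions made for illustration and are not the actual LOCKSS definitions.

```python
import sqlite3

# Hypothetical, simplified schema: table names follow the document, but the
# column names and types are assumptions made for illustration only.
SCHEMA = """
CREATE TABLE md_item (md_item_seq INTEGER PRIMARY KEY);
CREATE TABLE md_item_name (
    md_item_seq INTEGER REFERENCES md_item(md_item_seq),
    name TEXT NOT NULL
);
CREATE TABLE doi (
    md_item_seq INTEGER REFERENCES md_item(md_item_seq),
    doi TEXT NOT NULL
);
CREATE TABLE author (
    md_item_seq INTEGER REFERENCES md_item(md_item_seq),
    author_name TEXT NOT NULL,
    author_idx INTEGER NOT NULL  -- preserves the order of authors
);
CREATE TABLE keyword (
    md_item_seq INTEGER REFERENCES md_item(md_item_seq),
    keyword TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One article (placeholder values) with its decorating rows.
conn.execute("INSERT INTO md_item (md_item_seq) VALUES (1)")
conn.execute("INSERT INTO md_item_name VALUES (1, 'An Example Article')")
conn.execute("INSERT INTO doi VALUES (1, '10.1234/example.5678')")
conn.executemany("INSERT INTO author VALUES (1, ?, ?)", [("Smith, J.", 0), ("Lee, K.", 1)])
conn.executemany("INSERT INTO keyword VALUES (1, ?)", [("preservation",), ("metadata",)])

# Reassemble the unit's title, DOI, and ordered author list.
title, doi = conn.execute(
    "SELECT n.name, d.doi FROM md_item m "
    "JOIN md_item_name n ON n.md_item_seq = m.md_item_seq "
    "JOIN doi d ON d.md_item_seq = m.md_item_seq "
    "WHERE m.md_item_seq = 1"
).fetchone()
authors = [row[0] for row in conn.execute(
    "SELECT author_name FROM author WHERE md_item_seq = 1 ORDER BY author_idx"
)]
print(title, doi, authors)
```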

Relationships Among Bibliographic Units

Relationships among bibliographic units are represented by supporting tables that decorate the MD_ITEM with intermediate levels of containment information:

  • BIB_ITEM -- the intermediate bibliographic information (e.g. volume, issue, start page)
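
A sketch of how intermediate containment information in BIB_ITEM could be queried, for example to list all articles of one issue in page order. The column names and sample data are hypothetical, and a real query would also restrict the results to a single journal through the parent MD_ITEM.

```python
import sqlite3

# Hypothetical, simplified BIB_ITEM table; the column names are assumptions.
# Page numbers are kept as text because they are not always numeric.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE md_item (md_item_seq INTEGER PRIMARY KEY);
CREATE TABLE bib_item (
    md_item_seq INTEGER REFERENCES md_item(md_item_seq),
    volume TEXT,
    issue TEXT,
    start_page TEXT
);
""")

rows = [
    (1, "12", "3", "201"),
    (2, "12", "3", "45"),
    (3, "12", "4", "1"),
]
for seq, volume, issue, start_page in rows:
    conn.execute("INSERT INTO md_item VALUES (?)", (seq,))
    conn.execute("INSERT INTO bib_item VALUES (?, ?, ?, ?)", (seq, volume, issue, start_page))

# All articles of volume 12, issue 3, in page order
# (the CAST assumes numeric page strings, as in this example).
issue_articles = [row[0] for row in conn.execute(
    "SELECT md_item_seq FROM bib_item "
    "WHERE volume = ? AND issue = ? "
    "ORDER BY CAST(start_page AS INTEGER)",
    ("12", "3"),
)]
print(issue_articles)
```

Ordering by the cast value rather than the raw text avoids "201" sorting before "45".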

Relationships among bibliographic units are also represented by a parent MD_ITEM that represents the containing publication. This parent MD_ITEM is further decorated by supporting tables with publication information. These tables include:

  • MD_ITEM_NAME -- the name of the publication (e.g. journal or book title)
  • ISSN -- the print or online ISSN of a periodical
  • ISBN -- the print or online ISBN of a book
  • PUBLICATION -- the publication record
  • PUBLISHER -- the publisher for a given publication
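
The parent/child link can be sketched by giving each MD_ITEM an assumed parent_seq column pointing at its containing publication. The example finds every article in the journal with a given ISSN together with the journal's publisher; all column names, the ISSN, and the other sample values are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema: the table names follow the document, but
# the columns (including parent_seq) are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE md_item (
    md_item_seq INTEGER PRIMARY KEY,
    parent_seq INTEGER REFERENCES md_item(md_item_seq)  -- NULL for a top-level item
);
CREATE TABLE md_item_name (md_item_seq INTEGER, name TEXT);
CREATE TABLE publisher (publisher_seq INTEGER PRIMARY KEY, publisher_name TEXT);
CREATE TABLE publication (
    publication_seq INTEGER PRIMARY KEY,
    md_item_seq INTEGER REFERENCES md_item(md_item_seq),
    publisher_seq INTEGER REFERENCES publisher(publisher_seq)
);
CREATE TABLE issn (md_item_seq INTEGER, issn TEXT, issn_type TEXT);
""")

# A journal (parent MD_ITEM) with two articles (child MD_ITEMs).
conn.execute("INSERT INTO publisher VALUES (1, 'Example Publisher')")
conn.execute("INSERT INTO md_item VALUES (10, NULL)")
conn.execute("INSERT INTO md_item_name VALUES (10, 'Journal of Examples')")
conn.execute("INSERT INTO publication VALUES (100, 10, 1)")
conn.execute("INSERT INTO issn VALUES (10, '1234-5679', 'print')")
for seq, title in [(11, "First Article"), (12, "Second Article")]:
    conn.execute("INSERT INTO md_item VALUES (?, 10)", (seq,))
    conn.execute("INSERT INTO md_item_name VALUES (?, ?)", (seq, title))

# All article titles in the journal with a given ISSN, with its publisher.
results = conn.execute("""
    SELECT pub.publisher_name, child_name.name
    FROM issn
    JOIN publication p ON p.md_item_seq = issn.md_item_seq
    JOIN publisher pub ON pub.publisher_seq = p.publisher_seq
    JOIN md_item child ON child.parent_seq = issn.md_item_seq
    JOIN md_item_name child_name ON child_name.md_item_seq = child.md_item_seq
    WHERE issn.issn = ?
    ORDER BY child.md_item_seq
""", ("1234-5679",)).fetchall()
print(results)
```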

An additional level of MD_ITEM is also used to represent a book or monographic series. This grandparent MD_ITEM is further decorated by supporting tables with publication information. These tables include:

  • ISSN -- the print or online ISSN of a book series
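
Because the series, the book, and the chapter are all MD_ITEM records, the containment chain can be followed upward with a recursive query. This sketch assumes a hypothetical parent_seq column and an invented item_type column, and uses a SQLite recursive common table expression to find the series ISSN (a placeholder value) for a chapter.

```python
import sqlite3

# Hypothetical, simplified schema: each MD_ITEM points to its container through
# an assumed parent_seq column; the series ISSN is attached to the top-level item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE md_item (
    md_item_seq INTEGER PRIMARY KEY,
    parent_seq INTEGER REFERENCES md_item(md_item_seq),
    item_type TEXT
);
CREATE TABLE issn (md_item_seq INTEGER, issn TEXT);
""")
conn.executemany("INSERT INTO md_item VALUES (?, ?, ?)", [
    (1, None, "book_series"),  # grandparent MD_ITEM
    (2, 1, "book"),            # parent MD_ITEM
    (3, 2, "book_chapter"),    # the bibliographic unit itself
])
conn.execute("INSERT INTO issn VALUES (1, '2345-6789')")

# Walk up from the chapter through every ancestor with a recursive CTE,
# then pick up any ISSN attached to each level.
ancestors = conn.execute("""
    WITH RECURSIVE up(md_item_seq, parent_seq, item_type, depth) AS (
        SELECT md_item_seq, parent_seq, item_type, 0
        FROM md_item WHERE md_item_seq = ?
        UNION ALL
        SELECT m.md_item_seq, m.parent_seq, m.item_type, up.depth + 1
        FROM md_item m JOIN up ON m.md_item_seq = up.parent_seq
    )
    SELECT up.item_type, issn.issn
    FROM up LEFT JOIN issn ON issn.md_item_seq = up.md_item_seq
    ORDER BY up.depth
""", (3,)).fetchall()
print(ancestors)
```

The recursion stops naturally at the series, whose parent_seq is NULL and therefore matches no further row.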

Membership of a Bibliographic Unit in an AIP (AU)

Membership of bibliographic units in an AIP (AU) is represented by supporting tables that decorate the MD_ITEM. These tables include:

  • AU_MD -- the membership record of a bibliographic unit in an AU
  • AU -- the AU information including AUID and plugin
  • PLUGIN -- the plugin information including the plugin ID and publishing platform
  • PLATFORM -- the publishing platform
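
A sketch of how the membership tables could be joined to go from one bibliographic unit to its AU, the plugin that manages that AU, and the plugin's publishing platform. Table names follow the list above; the column names, the assumed au_md_seq link on MD_ITEM, and the plugin ID, AU key, and platform values are hypothetical placeholders rather than real LOCKSS identifiers.

```python
import sqlite3

# Hypothetical, simplified schema for AU membership; table names follow the
# document, column names are assumptions. The AU row is assumed to hold the
# AU key, which together with the plugin ID identifies the AU.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE platform (platform_seq INTEGER PRIMARY KEY, platform_name TEXT);
CREATE TABLE plugin (
    plugin_seq INTEGER PRIMARY KEY,
    plugin_id TEXT,
    platform_seq INTEGER REFERENCES platform(platform_seq)
);
CREATE TABLE au (
    au_seq INTEGER PRIMARY KEY,
    plugin_seq INTEGER REFERENCES plugin(plugin_seq),
    au_key TEXT
);
CREATE TABLE au_md (au_md_seq INTEGER PRIMARY KEY, au_seq INTEGER REFERENCES au(au_seq));
CREATE TABLE md_item (
    md_item_seq INTEGER PRIMARY KEY,
    au_md_seq INTEGER REFERENCES au_md(au_md_seq)
);
""")
conn.execute("INSERT INTO platform VALUES (1, 'ExamplePlatform')")
conn.execute("INSERT INTO plugin VALUES (1, 'org.example.ExamplePlugin', 1)")
conn.execute("INSERT INTO au VALUES (1, 1, 'example_au_key')")
conn.execute("INSERT INTO au_md VALUES (1, 1)")
conn.execute("INSERT INTO md_item VALUES (42, 1)")

# From one bibliographic unit, follow its membership record to the AU,
# the plugin that manages that AU, and the plugin's publishing platform.
membership = conn.execute("""
    SELECT pl.plugin_id, a.au_key, pf.platform_name
    FROM md_item m
    JOIN au_md am ON am.au_md_seq = m.au_md_seq
    JOIN au a ON a.au_seq = am.au_seq
    JOIN plugin pl ON pl.plugin_seq = a.plugin_seq
    JOIN platform pf ON pf.platform_seq = pl.platform_seq
    WHERE m.md_item_seq = ?
""", (42,)).fetchone()
print(membership)
```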

Membership is also represented by tables that decorate the MD_ITEM of the bibliographic unit:

  • URL -- the original URL of the bibliographic unit at the Provider (e.g. the article or book chapter URL)

The plugin, AU key, and original URL of a bibliographic unit are sufficient to locate the unit within its AIP in the LOCKSS repository and to identify the SIP from which it came.
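
The lookup described above can be sketched as a single function that starts from a DOI and returns the plugin ID, AU key, and original URL of the matching unit. As in the other sketches, the column names, the sample values, and the function locate_by_doi itself are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical, simplified schema; column names and sample values are
# assumptions made for illustration, not the actual LOCKSS definitions.
SCHEMA = """
CREATE TABLE plugin (plugin_seq INTEGER PRIMARY KEY, plugin_id TEXT);
CREATE TABLE au (au_seq INTEGER PRIMARY KEY, plugin_seq INTEGER, au_key TEXT);
CREATE TABLE au_md (au_md_seq INTEGER PRIMARY KEY, au_seq INTEGER);
CREATE TABLE md_item (md_item_seq INTEGER PRIMARY KEY, au_md_seq INTEGER);
CREATE TABLE doi (md_item_seq INTEGER, doi TEXT);
CREATE TABLE url (md_item_seq INTEGER, url TEXT);
"""


def locate_by_doi(conn: sqlite3.Connection, doi: str) -> Optional[Tuple[str, str, str]]:
    """Return (plugin ID, AU key, original URL) for the unit with this DOI, or None.

    If the unit has several URLs, only the first row found is returned.
    """
    return conn.execute("""
        SELECT pl.plugin_id, a.au_key, u.url
        FROM doi d
        JOIN md_item m ON m.md_item_seq = d.md_item_seq
        JOIN url u ON u.md_item_seq = m.md_item_seq
        JOIN au_md am ON am.au_md_seq = m.au_md_seq
        JOIN au a ON a.au_seq = am.au_seq
        JOIN plugin pl ON pl.plugin_seq = a.plugin_seq
        WHERE d.doi = ?
    """, (doi,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO plugin VALUES (1, 'org.example.ExamplePlugin')")
conn.execute("INSERT INTO au VALUES (1, 1, 'example_au_key')")
conn.execute("INSERT INTO au_md VALUES (1, 1)")
conn.execute("INSERT INTO md_item VALUES (7, 1)")
conn.execute("INSERT INTO doi VALUES (7, '10.1234/example.5678')")
conn.execute("INSERT INTO url VALUES (7, 'https://www.example.org/articles/5678')")

print(locate_by_doi(conn, "10.1234/example.5678"))
print(locate_by_doi(conn, "10.1234/missing"))
```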

Graphical Representation of Schema

A graphical representation is automatically generated from the metadata database schema.

LOCKSS Metadata Database Schema Diagram

Change Process

Changes to this document require:

  • Review by LOCKSS Technical Staff
  • Approval by LOCKSS Technical Lead

Relevant Documents

  1. LOCKSS: Extracting Bibliographic Metadata
  2. Definition of SIP