4.2.5.1 Tools or methods to identify the file types of all objects

From CLOCKSS Trusted Digital Repository Documents

4.2.5.1 - The repository shall have tools or methods to identify the file type of all submitted Data Objects.

The CLOCKSS archive implements two independent techniques for identifying the file type of content objects:

  • Using the Web infrastructure.
  • Using file identification tools.

Using the Web Infrastructure

As described in Definition of AIP, web browsers use the HTTP headers, together with "magic numbers" and other information embedded in the content, to identify the file type of content delivered to them for rendering. Definition of AIP also documents how this information is collected when each version of a content object within a CLOCKSS AU is created, and how it is stored in persistent association with the content to which it refers.
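The two signals named above, the HTTP Content-Type header and a "magic number" at the start of the content, can be illustrated with a minimal Python sketch. This is not the LOCKSS implementation: the function names and the small signature table are assumptions chosen for illustration, and a real identifier would consult a far larger signature registry.

```python
# Illustrative only: a tiny signature table, not the LOCKSS implementation.
MAGIC_NUMBERS = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def type_from_header(content_type):
    """Return the bare media type from a Content-Type header value, or None."""
    if not content_type:
        return None
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type or None


def type_from_magic(data):
    """Return a media type guessed from the leading bytes, or None."""
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type
    return None


def identify(content_type, data):
    """Report both identifications and whether they agree."""
    declared = type_from_header(content_type)
    sniffed = type_from_magic(data)
    return {
        "declared": declared,
        "sniffed": sniffed,
        "agree": declared is not None and declared == sniffed,
    }
```

For example, identify("text/html", b"%PDF-1.7") reports a declared type of text/html, a sniffed type of application/pdf, and agree set to False. Keeping both values, rather than trusting only one, is what allows a mislabelled object to be noticed later.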

Using File Identification Tools

As described in LOCKSS: Format Migration, the LOCKSS software used by the CLOCKSS archive integrates the File Information Tool Set (FITS), which includes JHOVE, DROID, and other tools. Various outputs from FITS are available through the LOCKSS daemon's GUI for every URL in an AU.
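Because several tools may examine the same object, their answers can agree or conflict. The Python sketch below shows one simple way to summarize per-tool results. It is a hypothetical illustration, not the FITS API or its output format: the function name, the input dictionary, and the tool labels in the usage note are all assumptions.

```python
from collections import Counter


def summarize_reports(reports):
    """Summarize per-tool media types into a consensus and a conflict flag.

    `reports` maps a tool label to the media type that tool reported, or
    None when the tool could not identify the object. (Hypothetical input
    shape, chosen for illustration.)
    """
    # Count only the tools that actually returned an identification.
    votes = Counter(mt for mt in reports.values() if mt is not None)
    if not votes:
        return {"status": "unknown", "media_type": None, "votes": {}}
    # The most frequently reported type is taken as the working answer.
    media_type, _ = votes.most_common(1)[0]
    status = "agreed" if len(votes) == 1 else "conflict"
    return {"status": status, "media_type": media_type, "votes": dict(votes)}
```

Called with {"jhove": "application/pdf", "droid": "application/pdf", "other": None}, the function returns a status of "agreed" with media type application/pdf. If a third tool reported text/plain instead, the status would be "conflict" and the vote counts would show which types were reported.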

References

  1. Definition of AIP
  2. LOCKSS: Format Migration
  3. File Information Tool Set https://code.google.com/p/fits/ accessed 2013.8.7