CLOCKSS: Logging and Records
The CLOCKSS system uses three types of record:
- Logs: detailed records of the operations of the LOCKSS daemon, written to log files on the host machine at an extensively customizable level of detail. The purpose of these logs is to enable diagnosis of problems that arise. Logs are retained on the machine that generated them, in /var/log.
- Alerts: messages sent off-machine by the LOCKSS daemon when significant events occur. The purpose of Alerts is to draw attention to potential problems that may need diagnosis. Alerts are sent via e-mail to the <clockss-alerts@clockss.org> mail alias, and added to the log files on the host machine via the syslog mechanism.
- Records: statistical summaries and business records of the operation of the system as a whole, not of individual boxes. They are provided to the CLOCKSS board and CLOCKSS member organizations, and electronic copies are stored in a system run by the Executive Director.
Retention Policy
Although the LOCKSS daemon can generate extremely detailed logs, doing so routinely is counter-productive because it buries the signal in the noise. The goal of the logging and record policy, in the absence of a specific problem to diagnose, is to:
- Generate Logs adequate for simple diagnosis, and retain them long enough to enable it.
- Generate Alerts on any condition that the daemon determines is anomalous, and on other significant events, with sufficient detail to draw the system administrator's attention to problems requiring diagnosis, and retain them indefinitely.
- Generate the Records needed for business and governance, and for monitoring the CLOCKSS network's overall performance, and retain them indefinitely.
The log retention policies for each CLOCKSS box are specified in /etc/logrotate.conf and the files in /etc/logrotate.d/. On each CLOCKSS box:
- System logs are retained for a month.
- At least the most recent 20MB of LOCKSS daemon log data is retained.
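The 20MB rule can be read as: walk the daemon log files from newest to oldest and keep them until their combined size reaches the threshold. The following Python sketch illustrates that selection only. The retention itself is enforced by logrotate, and the tuple layout, the function name, and the reading of "20MB" as 20 MiB are all assumptions made for illustration.

```python
from typing import List, Tuple

MIN_RETAINED_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB


def files_to_retain(files: List[Tuple[str, float, int]],
                    min_bytes: int = MIN_RETAINED_BYTES) -> List[str]:
    """Return the newest files whose combined size first reaches min_bytes.

    Each entry is a hypothetical (name, modification_time, size_in_bytes) tuple.
    """
    kept: List[str] = []
    total = 0
    # Walk from newest to oldest so the most recent data is always kept.
    for name, _mtime, size in sorted(files, key=lambda f: f[1], reverse=True):
        if total >= min_bytes:
            break  # threshold already covered; older files may be dropped
        kept.append(name)
        total += size
    return kept
```

Because whole files are kept, the retained total can exceed the threshold, which matches the "at least" wording of the policy.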
Ingest Alerts
At the end of each crawl of a SIP that meets certain criteria, an Alert is generated recording the final status of the crawl and the number of HTTP 200 results obtained. This number is equivalent to the number of new URLs found plus the number of existing URLs found to have modified content. An example of such an Alert:
Date: Sat 19 Feb 2011 04:17:24 PST
From: LOCKSS box ingest2.clockss.org <clockss-alert@lockss.org>
Subject: [lockss-alert] LOCKSS box info: CrawlEnd

LOCKSS box 'ingest2.clockss.org' raised an alert at Sat Feb 19 04:12:24 PST 2011
Name: CrawlEnd
Severity: info
AU: Nature Reviews Genetics Volume 11
Explanation: Crawl ended successfully: 2276 new files

Crawl ended successfully, 2276 new files, 4 warnings.
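Someone reviewing many such e-mails might want to pull the counts out of the text. The sketch below is a minimal Python example that relies only on the wording visible in the sample Alert above. The function name and the returned dictionary keys are hypothetical, and the real Alert format may vary in ways this pattern does not cover.

```python
import re
from typing import Dict, Optional


def parse_crawl_end(alert_text: str) -> Optional[Dict[str, object]]:
    """Extract box name, new-file count and warning count from a CrawlEnd Alert.

    Returns None if the text does not look like a CrawlEnd Alert.
    The patterns are inferred from a single sample message (an assumption).
    """
    name = re.search(r"Name:\s*(\S+)", alert_text)
    if name is None or name.group(1) != "CrawlEnd":
        return None
    box = re.search(r"LOCKSS box '([^']+)'", alert_text)
    new_files = re.search(r"(\d+) new files", alert_text)
    warnings = re.search(r"(\d+) warnings?", alert_text)
    return {
        "box": box.group(1) if box else None,
        "new_files": int(new_files.group(1)) if new_files else None,
        "warnings": int(warnings.group(1)) if warnings else 0,
    }
```

The patterns search the whole message, so they work whether the Alert arrives with its original line breaks or flattened onto one line.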
Preservation Alerts
An Alert is generated at the end of each poll that detects an integrity problem:
- If there was a non-zero number of URLs for which:
  - A repair was needed because the content failed to match the consensus.
  - Repair content was fetched.
  - The repair content matched the consensus.
- If there was a non-zero number of URL versions newly flagged as suspect because their content failed to match the locally stored hash.
An example of such an Alert:
Date: Sat Jul 20 2013 04:17:24 PST
From: LOCKSS box ingest2.clockss.org <clockss-alert@lockss.org>
Subject: [lockss-alert] LOCKSS box info: PollEnd

LOCKSS box 'ingest2.clockss.org' raised an alert at Sat Jul 20 04:12:24 PST 2013
Name: PollEnd
Severity: info
AU: Nature Reviews Genetics Volume 11
Explanation: Poll ended successfully: 99.89% agreement

Poll ended successfully, 2866 URLs, 99.89% agreement, 3 suspect files found.
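The two conditions listed above can be expressed as a simple predicate. The Python sketch below assumes the three repair sub-conditions are meant to hold together for the same URL, which is one reading of the list. The `UrlOutcome` record and `poll_raises_alert` function are hypothetical and do not reflect the daemon's own data model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UrlOutcome:
    """Hypothetical per-URL summary of one poll (not the daemon's data model)."""
    repair_needed: bool = False    # content failed to match the consensus
    repair_fetched: bool = False   # repair content was fetched
    repair_matched: bool = False   # repair content matched the consensus
    newly_suspect_versions: int = 0  # versions failing the locally stored hash


def poll_raises_alert(outcomes: Iterable[UrlOutcome]) -> bool:
    """Return True if either alert condition holds for at least one URL."""
    outcomes = list(outcomes)
    # Condition 1: URLs that needed a repair, got one, and now agree.
    repaired = sum(
        1 for o in outcomes
        if o.repair_needed and o.repair_fetched and o.repair_matched
    )
    # Condition 2: URL versions newly flagged as suspect.
    suspect = sum(o.newly_suspect_versions for o in outcomes)
    return repaired > 0 or suspect > 0
```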
Dissemination Alerts
The CLOCKSS archive is a dark archive; access to the content is permitted only at the direction of the CLOCKSS board. Thus, as described in CLOCKSS: Box Operations, the content access mechanisms of the LOCKSS daemon are disabled, and packet filters are used to further prevent access. Nevertheless, Alerts are generated on any access to the content in order that they may be treated as Security Alerts.
Administrative and Security Alerts
Alerts are generated on the following administrative actions:
- Changes to the configuration files.
- Changes to the access control permissions.
- Adding or de-activating an AU.
- Enabling or disabling the content servers.
- Adding or removing a user account, or changing a password.
External Communications
Engagement
Engagement with harvest content publishers before ingestion is described in CLOCKSS: Ingest Pipeline.
Engagement with file transfer content publishers before ingestion is described in CLOCKSS: Ingest Pipeline.
In all cases, interactions with the publisher take place through the RT ticketing system, so they are recorded permanently.
External Reports
The technology for generating reports is being revised. The earlier technology generated a report on each box from the LOCKSS: Metadata Database and then merged the results, which became too inefficient as the number of articles on each box grew. The new technology is a centralized database with a row for each article and a column for each of the production and ingest boxes. Each cell contains the ingest timestamp of that article on that box. The cells are filled by a regular polling process that asks each box for the articles ingested since the last time it was asked.
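The centralized table and its incremental polling can be sketched in a few lines of Python. This is an illustration of the idea only, not the CLOCKSS implementation: the in-memory dictionaries stand in for the database, and `fetch_since` is a hypothetical callable standing in for whatever interface each box exposes.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# article id -> {box name -> ingest timestamp}: one row per article,
# one column per box, as described above.
IngestTable = Dict[str, Dict[str, float]]

# Hypothetical box interface: (box, since) -> [(article_id, ingest_timestamp)]
FetchSince = Callable[[str, float], Iterable[Tuple[str, float]]]


def poll_boxes(table: IngestTable,
               last_asked: Dict[str, float],
               boxes: List[str],
               fetch_since: FetchSince,
               now: float) -> None:
    """Ask each box for articles ingested since it was last asked."""
    for box in boxes:
        since = last_asked.get(box, 0.0)  # a box never asked before returns everything
        for article_id, timestamp in fetch_since(box, since):
            table.setdefault(article_id, {})[box] = timestamp
        last_asked[box] = now  # remember when this box was last polled
```

Each poll transfers only the articles that are new since the previous poll, so its cost tracks recent ingest activity rather than the total number of articles held on a box.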
The following reports are generated for external consumption:
- Monthly reports of the state of preservation of all serials committed to preservation in the CLOCKSS archive are delivered to the CLOCKSS board and the Keepers Registry, and posted on the Web.
- KBART reports, used to update link resolver knowledge bases, are generated monthly and posted on the Web.
- The CLOCKSS Executive Director is sent an e-mail report of the article counts in the CLOCKSS archive weekly. These reports are preserved in Stanford's backup system.
- The CLOCKSS archive charges publishers a small fee for each current article ingested, billed quarterly. Thus a quarterly report is generated showing for each publisher the number of their articles ingested in that quarter for each publication year. The report is submitted to the CLOCKSS Executive Director for onward transmission to the publishers. Significant discrepancies between this and the publisher's own article counts will result (and have resulted) in investigation and corrective action. To aid in this process more detailed reports, down to the article level, can be generated on request.
The CLOCKSS Metadata Lead is responsible for the production and dissemination of these reports.
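The quarterly billing report described above amounts to counting ingested articles per publisher and publication year within one quarter. A minimal Python sketch of that aggregation follows. The `IngestedArticle` record is hypothetical, and treating quarters as calendar quarters is an assumption, since the document does not define the billing periods.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple, Tuple


class IngestedArticle(NamedTuple):
    """Hypothetical record of one ingested article."""
    publisher: str
    publication_year: int
    ingest_date: date


def quarter_of(d: date) -> Tuple[int, int]:
    """Return (year, quarter), assuming calendar quarters (Q1 = Jan to Mar)."""
    return d.year, (d.month - 1) // 3 + 1


def quarterly_counts(articles: Iterable[IngestedArticle],
                     year: int, quarter: int) -> Counter:
    """Count articles ingested in the given quarter per (publisher, publication year)."""
    counts: Counter = Counter()
    for article in articles:
        if quarter_of(article.ingest_date) == (year, quarter):
            counts[(article.publisher, article.publication_year)] += 1
    return counts
```

Keeping the publication year in the key allows current articles to be separated from back content when the counts are compared with a publisher's own figures.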
Monitoring
Log Monitoring
- Ingest boxes: The CLOCKSS Content Lead is responsible for monitoring logs on the ingest boxes.
- Production boxes: The LOCKSS Technical Lead is responsible for monitoring logs on production boxes when needed.
- Web servers: The CLOCKSS Network Administrator is responsible for monitoring web server logs.
Alert Monitoring
The LOCKSS Technical Lead is responsible for monitoring the Alerts generated by CLOCKSS boxes.
Nagios
The state of the CLOCKSS infrastructure, including the CLOCKSS boxes and the ingest machines, is monitored by Nagios as described in CLOCKSS: Box Operations.
The CLOCKSS Network Administrator is responsible for monitoring via Nagios.
Network Diagnostics
The LOCKSS team's internal monitoring and evaluation processes identified some areas in which the efficiency of the polling process could be improved in the context of the Global LOCKSS Network (GLN). The Andrew W. Mellon Foundation funded work to implement and evaluate improvements in these areas. This is expected to be complete by March 2014. Although these improvements will be deployed to the CLOCKSS network, the areas of inefficiency are not relevant there because the CLOCKSS network has many fewer boxes than the GLN. Thus the improvements are not expected to make a substantial difference to the performance of the CLOCKSS network.
The Mellon-funded work included development of improved instrumentation and analysis software, which polls the administrative Web UI of each LOCKSS box in a network to collect vast amounts of data about the operations of each box. For examples of the use of this software, see LOCKSS: Polling and Repair Protocol.
The CLOCKSS Network Administrator is responsible for collecting and analyzing this data.
Change Process
Changes to this document require:
- Review by:
  - LOCKSS Engineering Staff
  - CLOCKSS Network Administrator
- Approval by LOCKSS Technical Lead