(Changes after auditor visit: approved by Tom Lipkis)
(restoring post-2016 edits)
Latest revision as of 22:04, 14 August 2019
CLOCKSS: Logging and Records
The CLOCKSS system uses three types of record:
- Logs: detailed logs of the operations of the LOCKSS daemon, at an extensively customizable level of detail, written to log files on the host machine. The purpose of these logs is to enable diagnosis of problems that arise. Logs are retained on the machine that generated them in /var/log.
- Alerts: messages sent off-machine by the LOCKSS daemon when significant events occur. The purpose of Alerts is to draw attention to potential problems that may need diagnosis. Alerts are sent via e-mail to the <clockss-alerts> mail alias, and added to the log files on the host machine via the syslog mechanism.
- Records: statistical summaries and business records of the operation of the system as a whole, not of individual boxes. They are provided to the CLOCKSS board and CLOCKSS member organizations, and electronic copies are being stored in a system run by the Executive Director.
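To make the alert path concrete, here is a minimal Python sketch that assembles an alert e-mail with the standard library's email package, using the field layout visible in the example alerts on this page (Name, Severity, AU, Explanation). The function name `build_alert_email` and the example.org addresses are hypothetical placeholders; the LOCKSS daemon formats and sends its own alerts, and this is not its code.

```python
from email.headerregistry import Address
from email.message import EmailMessage


def build_alert_email(box, name, severity, explanation, raised_at, au=None,
                      sender="clockss-alert@example.org",
                      recipient="clockss-alerts@example.org"):
    """Build (but do not send) an alert message laid out like the examples here.

    The addresses stand in for the <clockss-alerts> alias; every name in
    this sketch is hypothetical.
    """
    msg = EmailMessage()
    msg["From"] = Address(display_name=f"LOCKSS box {box}", addr_spec=sender)
    msg["To"] = recipient
    msg["Subject"] = f"LOCKSS box {severity}: {name}"
    lines = [
        f"LOCKSS box '{box}' raised an alert at {raised_at}",
        f"Name: {name}",
        f"Severity: {severity}",
    ]
    if au is not None:
        lines.append(f"AU: {au}")
    lines.append(f"Explanation: {explanation}")
    msg.set_content("\n".join(lines))
    return msg
```

Sending would then be a matter of handing the message to an SMTP relay, for example with `smtplib.SMTP(host).send_message(msg)`.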
Retention Policy
Although the LOCKSS daemon can generate extremely detailed logs, doing so routinely is counter-productive. It buries the signal in the noise. The goal of the logging and record policy, in the absence of a specific problem to diagnose, is to:
- Generate Logs adequate to, and retain them long enough to, enable simple diagnosis.
- Generate Alerts on any condition that the daemon determines is anomalous, and on other significant events, with sufficient detail to draw the system administrator's attention to problems requiring diagnosis, and to retain them indefinitely.
- Generate the Records needed for business and governance, and for monitoring of the CLOCKSS network's overall performance, and to retain them indefinitely.
Specific log retention policies for each CLOCKSS box are specified in /etc/logrotate.conf and the files in /etc/logrotate.d/. On each CLOCKSS Box:
- System logs are retained for a month.
- At least the most recent 20MB of LOCKSS daemon log data is retained.
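On the boxes this retention is implemented by logrotate, as configured in the files named above. The Python sketch below only illustrates the "at least the most recent 20MB" rule as a selection over rotated files: walk from newest to oldest and keep files until the kept total reaches the threshold. Treating 20MB as 20 MiB, and representing files as (name, modification time, size) tuples, are assumptions of the sketch.

```python
MIB = 1024 * 1024


def files_to_keep(log_files, min_bytes=20 * MIB):
    """Return the newest log files whose combined size first reaches min_bytes.

    log_files: iterable of (name, mtime, size_in_bytes) tuples.
    Files are taken newest first; once the kept total reaches min_bytes,
    the remaining (older) files are candidates for deletion. If all files
    together are smaller than min_bytes, every file is kept.
    """
    kept, total = [], 0
    for name, _mtime, size in sorted(log_files, key=lambda f: f[1], reverse=True):
        if total >= min_bytes:
            break
        kept.append(name)
        total += size
    return kept
```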
Ingest Alerts
At the end of each crawl of a SIP that meets certain criteria, an Alert is generated recording the final status of the crawl and the number of HTTP 200 results obtained. This number equals the number of new URLs found plus the number of existing URLs whose content was found to have changed. An example of such an alert:
Date: Sat 19 Feb 2011 04:17:24 PST
From: LOCKSS box ingest2.clockss.org <clockss-alert@xxx.xxx>
Subject: [lockss-alert] LOCKSS box info: CrawlEnd

LOCKSS box 'ingest2.clockss.org' raised an alert at Sat Feb 19 04:12:24 PST 2011

Name: CrawlEnd
Severity: info
AU: Nature Reviews Genetics Volume 11
Explanation: Crawl ended successfully: 2276 new files

Crawl ended successfully, 2276 new files, 4 warnings.
Here is an example failed crawl alert from an ingest box:
From: LOCKSS box ingest1.clockss.org <clockss-alert@xxx.xxx>
To: clockss-alert@xxx.xxx
Date: Thu 20 Mar 2014 21:50:18 PDT
Subject: [clockss-alert] LOCKSS box warning: CrawlFailed

LOCKSS box 'ingest1.clockss.org' raised an alert at Thu Mar 20 21:45:18 PDT 2014

Name: CrawlFailed
Severity: warning
AU: Journal of Pharmacology and Experimental Therapeutics Volume 346
Explanation: Crawl finished with error: Can't fetch permission page: 0 files fetched, 0 warnings, 1 error
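When triaging Ingest Alerts it can help to pull the labelled fields out of the message body. A minimal Python sketch, assuming each field (Name, Severity, AU, AUID, Explanation) starts its own line as in the examples above; `parse_alert` and `new_file_count` are hypothetical helpers, not part of the LOCKSS software.

```python
import re

# One labelled field per line, e.g. "Severity: warning" (layout assumed from the examples).
FIELD_RE = re.compile(r"^(Name|Severity|AU|AUID|Explanation):\s*(.*)$")
NEW_FILES_RE = re.compile(r"(\d+) new files")


def parse_alert(body):
    """Return a dict of the labelled fields found in one alert body."""
    fields = {}
    for line in body.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def new_file_count(explanation):
    """Return N from '... N new files ...', or None if the text has no such count."""
    match = NEW_FILES_RE.search(explanation)
    return int(match.group(1)) if match else None
```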
Preservation Alerts
An Alert is generated at the end of each poll that detects an integrity problem:
- If there was at least one URL for which all of the following were true:
  - A repair was needed because the content failed to match the consensus.
  - Repair content was fetched.
  - The repair content failed to match the consensus.
- If there was at least one URL version newly flagged as suspect because its content failed to match the locally stored hash.
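The two trigger conditions can be expressed as a simple predicate over per-URL poll outcomes. This Python sketch uses a hypothetical `UrlPollResult` record, not the LOCKSS daemon's internal data model, and reports which URLs satisfied each condition.

```python
from dataclasses import dataclass


@dataclass
class UrlPollResult:
    """Hypothetical per-URL outcome of one poll (not the daemon's data model)."""
    url: str
    repair_needed: bool = False   # content failed to match the consensus
    repair_fetched: bool = False  # repair content was fetched
    repair_matched: bool = True   # fetched repair matched the consensus
    newly_suspect: bool = False   # version newly flagged: local hash mismatch


def should_alert(results):
    """Return (alert?, URLs whose repair failed, URLs newly flagged as suspect)."""
    failed_repairs = [
        r.url for r in results
        if r.repair_needed and r.repair_fetched and not r.repair_matched
    ]
    suspect = [r.url for r in results if r.newly_suspect]
    return bool(failed_repairs or suspect), failed_repairs, suspect
```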
Here is an example alert caused by deliberately injecting a repair that fails to match the consensus, during testing in the STF test environment:
From: LOCKSS box quark <xxx@xxx.xxx>
To: clockss-alert@xxx.xxx
Date: Thu 20 Mar 2014 22:50:33 PDT
Subject: LOCKSS box warning: PersistentDisagreement

LOCKSS box 'quark' raised an alert at Thu Mar 20 22:50:33 PDT 2014

Name: PersistentDisagreement
Severity: warning
AU: Simulated Content: simContent
AUID: org|lockss|plugin|simulated|SimulatedPlugin&root~simContent
Explanation: Poll did not achieve consensus on all files
21 URLs tallied, 95.23% agreement
1 repair received, 0 not received.
1 repair didn't resolve disagreement: http://www.example.com/003file.txt
Dissemination Alerts
The CLOCKSS archive is a dark archive; access to the content is permitted only at the direction of the CLOCKSS board. Thus, as described in CLOCKSS: Box Operations, the content access mechanisms of the LOCKSS daemon are disabled, and packet filters are used to further prevent access. Nevertheless, Alerts are generated on any access to the content so that such accesses can be treated as Security Alerts.
Here is a sample access alert from an ingest box. These accesses are expected as they come from production boxes crawling the ingest box; the alerts were turned on briefly as a test but would normally be disabled.
From: LOCKSS box ingest1.clockss.org <clockss-alert@xxx.xxx>
To: clockss-alert@xxx.xxx
Date: Sun 05 Jan 2014 08:42:42 PST
Subject: LOCKSS box info: ContentAccess (multiple)

LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:31 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/rj/style/group.css : 200 from cache in 398ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:33 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n3/abs/nbt.1829.html : 200 from cache in 243ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:33 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n3/abs/nbt.1829.html : 200 from cache in 1ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:33 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n3/abs/nbt.1829.html : 200 from cache in 1ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:33 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n3/abs/nbt.1829.html : 200 from cache in 0ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:34 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n9/covers/index.html : 200 from cache in 157ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:34 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n9/covers/index.html : 200 from cache in 0ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:35 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n9/covers/index.html : 200 from cache in 0ms
==========================================================================
LOCKSS box 'ingest1.clockss.org' raised an alert at Sun Jan 05 08:12:36 PST 2014
Name: ContentAccess
Severity: info
Explanation: Proxy access: http://www.nature.com/nbt/journal/v29/n9/covers/index.html : 200 from cache in 0ms
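A batched digest like the one above, marked "(multiple)" in the subject, can be summarized by splitting it on the separator lines of "=" characters and tallying the URLs accessed. A minimal Python sketch under that assumption; the regular expression matches only the "Proxy access: URL : status" wording seen in this sample, and `count_accesses` is a hypothetical helper.

```python
import re
from collections import Counter

# A separator line consists only of '=' characters.
SEPARATOR_RE = re.compile(r"^=+$", re.MULTILINE)
# Matches the "Proxy access: <url> : <status>" wording of the sample digest.
ACCESS_RE = re.compile(r"Proxy access: (\S+) : \d{3}\b")


def count_accesses(digest):
    """Count ContentAccess entries per URL in a batched alert digest."""
    counts = Counter()
    for entry in SEPARATOR_RE.split(digest):
        match = ACCESS_RE.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts
```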
Administrative and Security Alerts
Alerts are generated on the following administrative actions:
- Changes to the configuration files.
- Changes to the access control permissions.
- Adding or de-activating an AU.
- Enabling or disabling the content servers.
- Adding or removing a user account, or changing a password.
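The daemon raises these alerts itself. As an independent cross-check, an administrator could compare content hashes of configuration files taken at two points in time; the Python sketch below shows the idea. The functions `snapshot` and `changed_files` are hypothetical and do not describe how the LOCKSS daemon detects changes.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each existing regular file in `paths` to the SHA-256 hex digest of its contents."""
    return {
        str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
        for p in paths
        if Path(p).is_file()
    }


def changed_files(before, after):
    """Return, sorted, the paths added, removed, or modified between two snapshots."""
    return sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))
```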
External Communications
Engagement
Engagement with harvest content publishers before ingestion is described in CLOCKSS: Ingest Pipeline.
Engagement with file transfer content publishers before ingestion is described in CLOCKSS: Ingest Pipeline.
In all cases, interactions with the publisher take place through the RT ticketing system, so they are recorded permanently.
External Reports
The technology for generating reports is being revised. The current technology is becoming too inefficient as the number of articles on each box grows, because it generates a report on each box from that box's LOCKSS: Metadata Database and then merges the reports. The new technology is a composite database with data synchronized from the metadata databases of one or more preservation boxes. In addition to tracking individual article metadata, the consolidated database will track a per-machine ingest timestamp for each article on each box and will support de-duplication based on explicitly defined rules.
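The composite database is not specified further here. As a minimal sketch of the idea, the following Python uses an in-memory SQLite table with one row per (article, box) carrying that box's ingest timestamp, and de-duplicates by taking the earliest ingest time per article. The table and column names, and the use of a DOI-like string as the de-duplication key, are assumptions standing in for the "explicitly defined rules".

```python
import sqlite3

SCHEMA = """
CREATE TABLE article_ingest (
    article_key TEXT NOT NULL,  -- de-duplication key, e.g. a DOI (assumed rule)
    box         TEXT NOT NULL,  -- preservation box the row was synchronized from
    ingested_at TEXT NOT NULL,  -- per-machine ingest timestamp, ISO 8601 UTC
    PRIMARY KEY (article_key, box)
);
"""


def open_db():
    """Create an empty in-memory consolidated database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def sync_box(conn, box, rows):
    """Merge (article_key, ingested_at) rows from one box; returns conn for chaining."""
    conn.executemany(
        "INSERT OR REPLACE INTO article_ingest VALUES (?, ?, ?)",
        [(key, box, ts) for key, ts in rows],
    )
    return conn


def first_ingest(conn):
    """One row per article: the earliest ingest timestamp seen on any box."""
    return conn.execute(
        "SELECT article_key, MIN(ingested_at) FROM article_ingest "
        "GROUP BY article_key ORDER BY article_key"
    ).fetchall()
```

Re-synchronizing a box replaces its earlier rows rather than duplicating them, because (article_key, box) is the primary key.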
The following reports are generated for external consumption:
- Monthly reports of the state of preservation of all serials committed to preservation in the CLOCKSS archive are delivered to the CLOCKSS board and the Keepers Registry, and are posted on the Web.
- KBART reports are generated monthly and posted on the Web. For the Global LOCKSS Network, these reports are used to update link resolver knowledge bases so that libraries can provide their readers access to the content of their LOCKSS box. Because the CLOCKSS archive is a dark archive, these reports cannot be used to update link resolvers. However, several analysis tools use KBART as an input format, so the KBART reports for CLOCKSS are made public.
- Each week, the CLOCKSS Executive Director is sent an e-mail report of the article counts in the CLOCKSS archive. These reports are preserved in Stanford's backup system.
- The CLOCKSS Archive charges publishers a small fee for each current article ingested, billed quarterly. Thus a quarterly report is generated showing, for each publisher, the number of their articles ingested in that quarter for each publication year. The report is submitted to the CLOCKSS Executive Director for onward transmission to the publishers. Significant discrepancies between this report and the publisher's own article counts will result (and have resulted) in investigation and corrective action. To aid in this process, more detailed reports, down to the article level, can be generated on request.
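KBART files are tab-separated text with a header row naming the fields, which is what makes them convenient input for analysis tools. A minimal Python sketch of such a consumer, counting title rows per `publisher_name` (a field defined in the KBART recommended practice); the function name is hypothetical, and rows lacking a publisher are grouped under a placeholder label.

```python
import csv
import io
from collections import Counter


def titles_per_publisher(kbart_tsv):
    """Count title rows per publisher_name in KBART tab-separated text."""
    # KBART does not use quoting, so disable it to keep quote characters in titles intact.
    reader = csv.DictReader(io.StringIO(kbart_tsv), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return Counter(row.get("publisher_name") or "(unknown)" for row in reader)
```

For a file on disk, the same reader can be built over `open(path, newline="", encoding="utf-8")` instead of `io.StringIO`.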
The CLOCKSS Metadata Lead is responsible for the production and dissemination of these reports.
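The per-publisher, per-publication-year counts in the billing report reduce to a grouped count over ingest records that fall in one calendar quarter. A minimal Python sketch, assuming each ingested article is available as a (publisher, publication year, ingest date) tuple; it does not model which articles count as "current" for billing, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date


def quarter_of(d):
    """Calendar quarter (1-4) containing date d."""
    return (d.month - 1) // 3 + 1


def ingest_counts(articles, year, quarter):
    """Count articles ingested in one calendar quarter, keyed by (publisher, publication_year).

    articles: iterable of (publisher, publication_year, ingest_date) tuples.
    """
    return Counter(
        (publisher, pub_year)
        for publisher, pub_year, ingested in articles
        if ingested.year == year and quarter_of(ingested) == quarter
    )
```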
Monitoring
Log Monitoring
- Ingest boxes: The CLOCKSS Technical Lead is responsible for monitoring logs on the ingest boxes.
- Production boxes: The CLOCKSS Technical Lead is responsible for monitoring logs on production boxes when needed.
- Web servers: The CLOCKSS Network Administrator is responsible for monitoring web server logs.
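For the web server logs, a first-pass check is often just the mix of HTTP status classes. A minimal Python sketch, assuming the servers write the Common Log Format (or the Combined format, which extends it); the function name is hypothetical, and lines that do not parse are counted separately rather than dropped.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\d+|-)')


def status_classes(lines):
    """Tally responses by class ('2xx', '4xx', ...), plus an 'unparsed' count."""
    tally = Counter()
    for line in lines:
        match = CLF_RE.match(line)
        tally[match.group(1)[0] + "xx" if match else "unparsed"] += 1
    return tally
```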
Alert Monitoring
The CLOCKSS Technical Lead is responsible for monitoring the Alerts generated by CLOCKSS boxes.
Nagios
The state of the CLOCKSS infrastructure, including the CLOCKSS boxes and the ingest machines, is monitored by Nagios as described in CLOCKSS: Box Operations.
The CLOCKSS Network Administrator is responsible for monitoring via Nagios.
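The actual Nagios checks are described in CLOCKSS: Box Operations. Purely to illustrate the Nagios plugin convention (exit status 0/1/2/3 meaning OK/WARNING/CRITICAL/UNKNOWN, plus one line of status output), here is a hypothetical disk-space check in Python; the thresholds and function names are assumptions.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a (Nagios state, one-line status) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path="/"):
    """Print one status line and return the Nagios exit code for `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    if usage.total == 0:
        print("DISK UNKNOWN - zero-size filesystem")
        return UNKNOWN
    state, message = classify(100.0 * usage.used / usage.total)
    print(message)
    return state
```

Installed as a Nagios command, the script would end with `sys.exit(check_disk(path))` so that Nagios sees the exit code.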
Network Diagnostics
The LOCKSS team's internal monitoring and evaluation processes identified some areas in which the efficiency of the polling process could be improved in the context of the Global LOCKSS Network (GLN). The Andrew W. Mellon Foundation funded work to implement and evaluate improvements in these areas. This work is expected to be complete by March 2015. These improvements will be deployed to the CLOCKSS network, but because the CLOCKSS network has many fewer boxes than the GLN, the areas of inefficiency are not relevant to it. Thus the improvements are not expected to make a substantial difference to the performance of the CLOCKSS network.
The Mellon-funded work included development of improved instrumentation and analysis software, which polls the administrative Web UI of each LOCKSS box in a network to collect vast amounts of data about the operations of each box. For examples of the use of this software, see LOCKSS: Polling and Repair Protocol.
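The collection step amounts to visiting each box's administrative Web UI in turn and recording what comes back, without letting one unreachable box abort the run. A minimal Python sketch with the standard library; the status path, the host names, and the absence of authentication are assumptions (a real admin UI requires credentials), and this is not the Mellon-funded software itself.

```python
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_status(url: str, timeout: float = 10.0) -> str:
    """Fetch one status page over HTTP and return its body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect(hosts: Iterable[str], path: str,
            fetch: Callable[[str], str] = fetch_status) -> Dict[str, str]:
    """Poll each box's (hypothetical) status page; record errors instead of aborting."""
    results = {}
    for host in hosts:
        url = f"http://{host}{path}"
        try:
            results[host] = fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            results[host] = f"ERROR: {exc}"
    return results
```

Passing `fetch` as a parameter keeps the polling loop testable without a live network.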
The CLOCKSS Network Administrator is responsible for collecting and analyzing this data.
Change Process
Changes to this document require:
- Review by:
  - LOCKSS Engineering Staff
  - CLOCKSS Technical Lead
- Approval by CLOCKSS Network Administrator