From CLOCKSS Trusted Digital Repository Documents
(Changes after auditor visit: approved by David Rosenthal)

Revision as of 03:30, 10 April 2014

LOCKSS: Polling and Repair Protocol

Overview

LOCKSS boxes run the LOCKSS polling and repair protocol as described in our ACM Transactions on Computer Systems paper. The paper describes the polling mechanism as applied to a single file; the LOCKSS daemon applies it to an entire Archival Unit (AU) of content. Each LOCKSS daemon chooses at random the next AU on which to perform integrity checks using the protocol, and then acts as the poller, calling a poll on that AU by:

  • Selecting a random sample of the other CLOCKSS boxes (the voters).
  • Inviting the voters to participate in a poll on the AU, and sending each of them a freshly-generated random nonce Np.
  • Each voter then votes by:
    • Generating a fresh random nonce Nv.
    • Creating a vote containing, for every URL in the voter's instance of the AU:
      • The URL
      • The hash of the concatenation of Np, Nv and the content of the URL.
    • Sending the vote to the poller. Note that the vote contains a hash for each URL in the voter's instance of the AU, but that hash is not the hash of the content alone. The nonces ensure that the hash in the vote is different for every vote in every poll, so the voter cannot simply remember the hash it initially created; it must re-hash every URL each time it votes.
  • The poller tallies the votes by:
    • For each URL in the poller's instance of the AU:
      • For each voter:
        • Computing the hash of the concatenation of Np, that voter's Nv and the content of the URL in the poller's instance of the AU.
        • Comparing the result with the hash value for that URL in that voter's vote.
    • Note that the nonces ensure that the poller must re-hash every URL in the AU; it cannot simply remember the hash it initially created.
  • In tallying the votes, the poller may detect that:
    • A URL it has does not match the consensus of the voters, or
    • A URL that the consensus of the voters says should be present in the AU is missing from the poller's AU, or
    • A URL it has does not match the checksum generated when it was stored.
  • If so, it repairs the problem by:
    • Requesting a new copy from one of the voters that agreed with the consensus,
    • Then verifying that the new copy does agree with the consensus.
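The voting and tallying steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LOCKSS daemon's code. An AU instance is modelled as a hypothetical Python dict from URL to content bytes. SHA-256 stands in for whatever hash function the daemon actually uses. The 20-byte nonce length and the outcome labels are illustrative choices. The poller here loops over voters and then over its own URLs, which covers the same URL/voter pairs as the order described above.

```python
import hashlib
import secrets

# Hypothetical model of an AU instance: URL -> stored content.
AUInstance = dict[str, bytes]

NONCE_BYTES = 20  # illustrative nonce length (assumption)


def nonced_hash(np_: bytes, nv: bytes, content: bytes) -> bytes:
    """Hash of the concatenation Np || Nv || content (SHA-256 is an assumption)."""
    return hashlib.sha256(np_ + nv + content).digest()


def make_vote(np_: bytes, au: AUInstance) -> tuple[bytes, dict[str, bytes]]:
    """Voter side: choose a fresh Nv, then hash every URL with both nonces."""
    nv = secrets.token_bytes(NONCE_BYTES)
    return nv, {url: nonced_hash(np_, nv, content) for url, content in au.items()}


def tally_vote(np_: bytes, poller_au: AUInstance, nv: bytes,
               vote: dict[str, bytes]) -> dict[str, str]:
    """Poller side: re-hash each of its own URLs with this voter's Nv and compare."""
    outcome = {}
    for url, content in poller_au.items():
        if url not in vote:
            outcome[url] = "absent at voter"
        elif nonced_hash(np_, nv, content) == vote[url]:
            outcome[url] = "agree"
        else:
            outcome[url] = "disagree"
    # URLs the voter has but the poller lacks are candidates for repair.
    for url in vote:
        if url not in poller_au:
            outcome[url] = "missing at poller"
    return outcome


np_ = secrets.token_bytes(NONCE_BYTES)  # poller's fresh nonce for this poll
poller = {"http://example.org/a": b"alpha", "http://example.org/b": b"beta"}
voter = {"http://example.org/a": b"alpha", "http://example.org/b": b"BETA",
         "http://example.org/c": b"gamma"}
nv, vote = make_vote(np_, voter)
print(tally_vote(np_, poller, nv, vote))
# {'http://example.org/a': 'agree', 'http://example.org/b': 'disagree', 'http://example.org/c': 'missing at poller'}
```

Because Nv is fresh for every vote, calling make_vote twice on unchanged content yields different hashes. This is why neither voter nor poller can reuse a remembered hash.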

In this way, at unpredictable but fairly regular intervals, every poll on an AU checks the union of the sets of URLs in that AU on the box calling the poll (the poller) and on the boxes voting in it (the voters). The check establishes that each URL on the poller agrees with the voters' consensus as to that URL's content; if it does not, it is repaired from one of the boxes in the consensus. Under our current Mellon grant we are investigating the potential benefits of an enhancement to the mechanism under which every poll on an AU would also check that every URL in that AU on each voter agrees with the same URL on the poller.
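The consensus check over the union of URLs, and the verification of a repair, can be sketched as below. This is a hedged illustration only. The per-voter outcome labels, the simple-majority rule and the function names are assumptions of this sketch, not the thresholds the LOCKSS daemon uses. Because each vote is hashed with a different Nv, the poller cannot compare voters' hashes with each other directly. The sketch therefore only knows which voters disagreed with the poller, and relies on the verification step to confirm that a fetched copy matches a majority of the votes.

```python
import hashlib

# Hypothetical per-voter tally: voter id -> {url: outcome}, where outcome is one of
# "agree", "disagree", "absent at voter" (poller has it, voter lacks it) or
# "missing at poller" (voter has it, poller lacks it).
Tally = dict[str, dict[str, str]]


def urls_to_repair(tallies: Tally) -> dict[str, list[str]]:
    """Return {url: voters to ask for a repair} for each URL in the union of all
    URLs seen in the poll on which a simple majority of voters is against the
    poller (assumed rule)."""
    n_voters = len(tallies)
    union = set().union(*(t.keys() for t in tallies.values()))
    repairs = {}
    for url in sorted(union):
        disagreeing = [v for v, t in tallies.items() if t.get(url) == "disagree"]
        holding = [v for v, t in tallies.items() if t.get(url) == "missing at poller"]
        if 2 * len(disagreeing) > n_voters:
            repairs[url] = disagreeing  # poller's copy is outvoted
        elif 2 * len(holding) > n_voters:
            repairs[url] = holding      # poller lacks a URL the majority has
    return repairs


def repair_verified(np_: bytes, new_content: bytes,
                    url_votes: dict[str, tuple[bytes, bytes]], n_voters: int) -> bool:
    """url_votes: voter id -> (that voter's Nv, its hash for this URL).
    True if the repaired copy matches the votes of a simple majority of voters.
    Assumes votes were hashed as SHA-256(Np || Nv || content)."""
    matches = sum(
        hashlib.sha256(np_ + nv + new_content).digest() == h
        for nv, h in url_votes.values()
    )
    return 2 * matches > n_voters


tallies = {
    "v1": {"u/a": "agree", "u/b": "disagree", "u/c": "missing at poller"},
    "v2": {"u/a": "agree", "u/b": "disagree", "u/c": "missing at poller"},
    "v3": {"u/a": "disagree", "u/b": "agree"},
}
print(urls_to_repair(tallies))  # {'u/b': ['v1', 'v2'], 'u/c': ['v1', 'v2']}
```

If a fetched copy fails repair_verified, the poller could try another of the listed voters. That retry policy is also an assumption here.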

Configuration of CLOCKSS Network

As described in CLOCKSS: Box Operations, the CLOCKSS boxes are configured to form a Private LOCKSS Network (PLN) including the following configuration options:

  • Because the CLOCKSS PLN is a closed network secured by SSL certificate checks at both ends of all connections, the defenses against sybil attacks, which involve the adversary creating new peer identities, are not necessary and are not implemented.
  • The efficiency enhancements described below are being gradually and cautiously deployed to the CLOCKSS PLN.
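To illustrate what "SSL certificate checks at both ends of all connections" means in practice, here is a minimal sketch using Python's standard ssl module. It is not the LOCKSS daemon's configuration, and the file names are hypothetical placeholders. Each side requires and verifies the other's certificate. When a network-specific CA file is supplied, it is loaded in place of the system default CAs, so a newly created identity without a certificate issued for the network cannot complete a connection.

```python
import ssl
from typing import Optional


def make_peer_context(server_side: bool,
                      cafile: Optional[str] = None,
                      certfile: Optional[str] = None,
                      keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that demands and verifies the peer's certificate.

    cafile: the network's own CA bundle (hypothetical path). When given, it is
    loaded in place of any system default CAs.
    certfile/keyfile: this box's own certificate and key (hypothetical paths).
    """
    purpose = ssl.Purpose.CLIENT_AUTH if server_side else ssl.Purpose.SERVER_AUTH
    ctx = ssl.create_default_context(purpose, cafile=cafile)
    # A server-side context does not ask for a client certificate by default;
    # requiring one gives certificate checks at both ends of the connection.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile is not None:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


# Example (hypothetical file names):
# server_ctx = make_peer_context(True, "pln-ca.pem", "box.pem", "box.key")
# client_ctx = make_peer_context(False, "pln-ca.pem", "box.pem", "box.key")
```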

Currently, a poll is called on each AU instance on average approximately once every 100 days. Since there are currently 12 boxes in the CLOCKSS network, some instance of a given AU is checked on average approximately every 8 days (100 / 12 ≈ 8.3).
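The 8-day figure is the per-instance polling interval divided by the number of instances. The small sketch below makes the arithmetic explicit. It assumes, as the figure implies, one instance of the AU on each of the 12 boxes. The function name is hypothetical.

```python
def mean_days_between_checks(days_per_instance: float, n_instances: int) -> float:
    """Average gap between polls on *some* instance of an AU.

    Each instance is polled on average once every days_per_instance days, so
    poll rates add up to n_instances / days_per_instance polls per day in total.
    """
    return days_per_instance / n_instances


print(round(mean_days_between_checks(100, 12), 1))  # 8.3
```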

Enhancements

The LOCKSS team's internal monitoring and evaluation processes identified some areas in which the efficiency of the polling process could be improved in the context of the Global LOCKSS Network (GLN). The Andrew W. Mellon Foundation funded work to implement and evaluate improvements in these areas, which is expected to be complete by March 2014. Although these improvements will be deployed to the CLOCKSS network, there are far fewer boxes in the CLOCKSS network than in the GLN, so the areas of inefficiency are less relevant to it. The improvements are therefore not expected to make a substantial difference to the performance of the CLOCKSS network.

The Mellon-funded work included development of improved instrumentation and analysis software, which polls the administrative Web UI of each LOCKSS box in a network to collect large amounts of data about the operations of each box. These tools were used on the CLOCKSS network for an initial 59-day period, collecting over 18M data items. The data collected have yet to be fully analyzed, but initial analysis shows that the polling process among CLOCKSS boxes continues to operate satisfactorily. Some examples of the graphs generated follow.

[Figure: hist_pr_auid_count27.png]
This graph shows the number of AU instances in CLOCKSS boxes that have reached agreement with N other CLOCKSS boxes. It illustrates the progress AUs make after ingest as the LOCKSS polling and repair protocol identifies matching AU instances at other boxes. Few AU instances in the sample have reached agreement with only a few boxes, and the majority of AU instances have reached agreement with AU instances at the majority of the other CLOCKSS boxes.
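The quantity plotted can be computed from per-instance agreement records. The sketch below assumes a hypothetical input format: a dict keyed by (box, AU id) whose value is the set of other boxes with which that AU instance has reached agreement. The actual format collected by the instrumentation tools is not described here, and the box and AU names are made up.

```python
from collections import Counter


def agreement_histogram(agreements: dict[tuple[str, str], set[str]]) -> dict[int, int]:
    """Map N -> number of AU instances that have reached agreement with N other boxes."""
    counts = Counter(
        len(peers - {box})  # ignore the box itself if it was recorded
        for (box, _auid), peers in agreements.items()
    )
    return dict(sorted(counts.items()))


agreements = {
    ("box1", "au1"): {"box2", "box3"},
    ("box2", "au1"): {"box1", "box3"},
    ("box3", "au1"): {"box1"},
    ("box1", "au2"): set(),
}
print(agreement_histogram(agreements))  # {0: 1, 1: 1, 2: 2}
```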
[Figure: Sample Graph 2.png]
This graph shows the extent of agreement in the more than 40,000 successfully completed polls in the sample. The overwhelming majority of the polls showed complete agreement. Less than complete agreement was most likely caused by polls among AU instances that were still collecting content and therefore held different subsets of the URLs in the AU.

Change Process

Changes to this document require:

  • Review by LOCKSS Engineering Staff
  • Approval by LOCKSS Chief Scientist

Relevant Documents

  1. CLOCKSS: Box Operations
  2. Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S.H. Rosenthal, Mary Baker, and Yanto Muliadi. “LOCKSS: A Peer-to-Peer Digital Preservation System”, ACM Transactions on Computer Systems vol. 23, no. 1, February 2005, pp. 2-50. http://dx.doi.org/10.1145/1047915.1047917 accessed 2013.8.7