LOCKSS: Property Server Operations
The Property Server makes LOCKSS properties available to LOCKSS and CLOCKSS boxes via the HTTP and HTTPS protocols. Although any web server would suffice, the Property Server for LOCKSS and CLOCKSS uses the free and open-source Apache HTTP Server under the Ubuntu Server LTS operating system.
Apache Virtual Host
Currently, the Property Servers for all PLNs, including CLOCKSS, use the same Apache Virtual Host definition. Each PLN, including CLOCKSS, has separate access control definitions for the LOCKSS daemon property files and plugins served by the Property Server. CLOCKSS has three different sets of LOCKSS daemon properties, one for each class of machine it runs:
- CLOCKSS Ingest
- CLOCKSS Production
- CLOCKSS Triggered
A typical access control definition within the Property Server looks like:
<Directory "/home/www/props/html/clockss">
    Order deny,allow
    Deny from all
    Include /etc/apache2/access.d/props/lockss
    Include /etc/apache2/access.d/props/clockss
</Directory>
The properties for a PLN such as CLOCKSS are accessible only to machines on the LOCKSS subnet 18.104.22.168/24 at Stanford and to authorized boxes in each network, such as CLOCKSS Ingest and Production boxes. A designated LOCKSS engineer must explicitly add a machine's IP address to the ACL before the machine can access the appropriate property files.
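The effect of this kind of deny-by-default rule can be illustrated with a minimal Python sketch. It assumes the included access files contain allow rules for individual addresses and subnets; the networks below are documentation-range placeholders, not the real LOCKSS subnet or any real box, and the function name is hypothetical.

```python
import ipaddress

# Hypothetical allow list. Documentation-range addresses stand in for the
# LOCKSS subnet and for one explicitly authorized box.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # stand-in for the LOCKSS subnet
    ipaddress.ip_network("198.51.100.17/32"),  # stand-in for one authorized box
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True only if client_ip falls inside an allowed network.

    Anything not explicitly listed is denied, which mirrors the
    "Deny from all" default followed by explicit allow rules.
    """
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```

A box whose address has not been added to the list is refused, which is why a new machine cannot fetch property files until an engineer updates the ACL.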
Property Change Process
Use of version control
LOCKSS property files are static XML files, and changes to them are made through RCS, an early revision control system still maintained by the GNU Project. To make a change, an engineer checks out and locks the property file, edits it, then checks it in with a brief description of the changes, which releases the lock. If necessary, RCS can be used to revert a file to an earlier version.
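The check-out, edit, check-in cycle maps onto the standard RCS commands co and ci. The sketch below only builds those command lines; the file name in the usage note and the helper names are hypothetical, and it is not a description of any script the LOCKSS team actually uses.

```python
import subprocess


def rcs_edit_commands(path, message):
    """Return the two RCS commands that bracket an edit of `path`."""
    checkout = ["co", "-l", path]                 # check out and lock the file
    checkin = ["ci", "-u", "-m" + message, path]  # check in, leave an unlocked working copy
    return checkout, checkin


def rcs_show_revision_command(path, revision):
    """Return the command that prints an earlier revision to standard output."""
    return ["co", "-p", "-r" + revision, path]


def run(command):
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)
```

For example, rcs_edit_commands("lockss.xml", "Add new ingest box") yields the commands to run before and after editing the file by hand.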
Property Server Access Control
Only authorized LOCKSS team members are allowed to log in to the Property Server. Additionally, a LOCKSS team member's user account must be part of a privileged user group to modify the CLOCKSS property files.
The CLOCKSS property files refer the LOCKSS daemon running on CLOCKSS boxes to a URL where the daemon can retrieve CLOCKSS-specific LOCKSS daemon plugins as signed .jar files, generated as described in LOCKSS: Software Development Process. The CLOCKSS Property Server also serves these plugins, in a way that allows them to be preserved by the LOCKSS daemon. The Apache definition is as follows:
<Directory "/home/www/props/html/clockss/plugins">
    Options Indexes MultiViews
    IndexOptions IgnoreCase SuppressHTMLPreamble
    IndexIgnore .. held FOOTER*
    HeaderName HEADER.html
    ReadmeName FOOTER.html
</Directory>
Cloud Mirror and Fail-over
Amazon EC2 Instance
The LOCKSS Property Server and HTTP server are mirrored nightly to an Amazon Elastic Compute Cloud (EC2) instance. If the CLOCKSS Property Server or HTTP server needs downtime for maintenance or experiences a problem (due to hardware failure or a network or power outage), the mirror can take its place and continue to provide core services.
Our EC2 instance is currently running in Amazon's US-EAST-1 region, located in northern Virginia, and should be insulated from a catastrophic event on the West Coast. Amazon Web Services (AWS) makes it easy to replicate the mirror to other regions within the United States or internationally (Ireland, Singapore, Sydney, Tokyo, Sao Paulo) if necessary.
Although changes could be made to the mirror, it is treated by LOCKSS and CLOCKSS processes as read-only; changes made to the mirror are lost on the next nightly mirror sync.
Access control for the EC2 instance
The CLOCKSS Amazon EC2 instances are managed through one Amazon AWS account. The credentials are known only to the CLOCKSS systems administrators at Stanford. AWS Identity and Access Management (IAM) could be used to fine-tune access to AWS resources, but this is not necessary for our use case.
Access to the EC2 instance is controlled through EC2 Security Groups and Key Pairs. A Security Group defines a class of external firewall rules that can be applied to any EC2 instance under the AWS account. A Key Pair is an SSH public-private key pair. It can be generated by Amazon and downloaded, or a public key can be uploaded. Private keys are never stored at Amazon. The Amazon Machine Image (AMI) installs the Key Pair assigned to an instance as the initial SSH key pair when the instance is first brought up.
Root access is disabled on the official Ubuntu LTS AMIs we use. Instead, the initial username is ubuntu.
The nightly mirror process
Each night the following process updates the mirror:
- All files to be mirrored are copied to a staging area using rsync.
- The MySQL server dumps all databases to the staging area.
- The contents of the staging area are copied to a staging area on the Amazon EC2 instance using rsync.
- The Amazon EC2 instance updates itself from the staging area by:
  - Copying the files to their proper location and modifying files, if necessary.
  - Re-loading its MySQL server from the dumps.
  - Re-starting the services.
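The nightly steps above can be sketched as an ordered list of commands. This is an illustration under stated assumptions, not the actual mirror script: every path, the host name mirror.example.org, and the remote update script are hypothetical placeholders, and the final command stands in for the three sub-steps the EC2 instance performs on itself.

```python
import subprocess

# All hosts and paths below are hypothetical placeholders.
SOURCES = ["/home/www/props/"]
LOCAL_STAGING = "/var/mirror-staging"
MIRROR_HOST = "mirror.example.org"
REMOTE_STAGING = "/var/mirror-staging"
REMOTE_UPDATE_SCRIPT = "/usr/local/sbin/update-from-staging"


def nightly_mirror_commands():
    """Return the mirror steps in the order they must run."""
    return [
        # 1. Copy the files to be mirrored into the local staging area.
        ["rsync", "-a", "--delete", "--relative", *SOURCES,
         LOCAL_STAGING + "/files/"],
        # 2. Dump all MySQL databases into the staging area.
        ["mysqldump", "--all-databases",
         "--result-file=" + LOCAL_STAGING + "/all-databases.sql"],
        # 3. Copy the staging area to the staging area on the mirror.
        ["rsync", "-a", "--delete", LOCAL_STAGING + "/",
         MIRROR_HOST + ":" + REMOTE_STAGING + "/"],
        # 4. Ask the mirror to update itself from its staging area
        #    (copy files into place, reload MySQL, restart services).
        ["ssh", MIRROR_HOST, REMOTE_UPDATE_SCRIPT],
    ]


def run_nightly_mirror(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for command in nightly_mirror_commands():
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Running the steps strictly in order, and stopping on the first failure, keeps a partial dump or an incomplete copy from being loaded on the mirror.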
Fail-over to the Amazon EC2 mirror is triggered during scheduled and unscheduled events causing interruption to core LOCKSS services, as called for by LOCKSS management. These include events such as:
- Scheduled hardware upgrades or replacement
- Scheduled software upgrades
- Hardware failures
- Software failures (kernel panics, segmentation faults, ...)
- Infrastructure interruption (cooling, networking, power, ...)
- Human error
- Natural disaster
Although the process of failing over to the mirror can be performed quickly, it is not time-critical, because all the information that CLOCKSS boxes obtain from the property server involved in the fail-over is cached on each box. The boxes continue to operate during a property server outage using the content they most recently obtained from it. The cache does not persist across daemon restarts, so a box whose daemon restarts during the fail-over process will wait until the fail-over succeeds before restarting.
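The caching behavior described above can be modeled with a small in-memory cache. This is a conceptual sketch only, not the LOCKSS daemon's code (the class and its fetch callable are hypothetical): a failed fetch falls back to the last good copy, and a freshly started process has nothing to fall back on.

```python
class PropertyCache:
    """Keep the last successfully fetched properties in memory."""

    def __init__(self, fetch):
        self._fetch = fetch     # callable returning the property text, or raising OSError
        self._last_good = None  # held in memory only, so lost on restart

    def load(self):
        """Return fresh properties, or the cached copy if the server is down."""
        try:
            self._last_good = self._fetch()
        except OSError:
            if self._last_good is None:
                # Nothing cached, as after a restart: the caller has to
                # wait and retry until the property server is reachable.
                raise
        return self._last_good
```

A long-running instance keeps returning its cached copy through an outage, while a new instance created during the outage raises until the server (or its mirror) answers.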
The fail-over process
The LOCKSS Property Server mirror is synced nightly, so the fail-over process only requires updating the relevant DNS records, for example those for CLOCKSS, to point to the mirror. This is done manually: the designated LOCKSS team member logs in to the CLOCKSS domain name registrar and updates the DNS records. The time-to-live (TTL) for core records is set to 30 minutes, so it will take at most 30 minutes for a change to propagate fully through the Internet. For this reason, access during the TTL window of an unplanned fail-over may be intermittent, but this will not cause problems for the LOCKSS boxes.
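The 30-minute bound is simple arithmetic on the TTL, and whether a given resolver already sees the mirror can be checked with a lookup. The sketch below assumes resolvers honor the TTL; the function names and the addresses in the usage note are hypothetical.

```python
import socket
from datetime import datetime, timedelta

TTL = timedelta(minutes=30)  # TTL for core records, as stated above


def propagation_deadline(record_updated_at, ttl=TTL):
    """Latest time at which a resolver may still serve the old record."""
    return record_updated_at + ttl


def current_addresses(hostname):
    """Return the IPv4 addresses the local resolver currently gives for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def resolves_to_mirror(addresses, mirror_address):
    """Return True if the mirror's address is among the resolved addresses."""
    return mirror_address in addresses
```

For a record updated at 14:05, propagation_deadline gives 14:35; until then, resolves_to_mirror(current_addresses(host), mirror_ip) may return either result depending on which cached answer the resolver holds.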
Access control in the fail-over process
Only designated LOCKSS team members have login access to the Amazon EC2 instance; there should be no need for any such login during normal or fail-over operations.
Changes to this document require:
- Review by LOCKSS Network Administrator
- Approval by LOCKSS Technical Manager