While reservations provide the highest level of data protection for shared storage, in some cases reservations are not available and must be disabled within LifeKeeper. With reservations disabled, the storage no longer acts as an arbitrator when multiple systems attempt to access it, whether intentionally or unintentionally.
In that case, consider other methods of fencing the storage through cluster membership. These methods are needed to handle system hangs, busy systems, and any other situation in which a server can appear not to be alive.
The key to a reliable configuration without reservations is to "know" that when a failover occurs, the "other" server has been powered off or power cycled. Four fencing options help accomplish this and allow LifeKeeper to provide a very reliable configuration even without SCSI reservations:
- STONITH (Shoot The Other Node In The Head) using a highly reliable interconnect, e.g., a serial connection between the server and the STONITH device. STONITH is the technique of physically disabling or powering off a server when it is no longer considered part of the cluster. LifeKeeper can power off servers during a failover event, thereby ensuring safe access to the shared data. This option provides reliability similar to reservations but is limited to two nodes physically located together.
- Quorum/Witness – Quorum/witness servers are used to confirm membership in the cluster, especially when the cluster servers are at different locations. Although this option can handle split-brain, it is not recommended on its own because it does not handle system hangs.
- Watchdog – Watchdog monitors the health of a server. If a problem is detected, the affected server is rebooted or powered down. This option can recover from a server hang; however, it does not handle split-brain, so it is also not recommended on its own.
- CONFIRM_SO – This option requires that automatic failover be turned off. While very reliable (depending on the knowledge of the administrator), it provides lower availability.
While none of these alternative fencing methods is likely to be adequate on its own, a very reliable configuration can be obtained by using them in combination.
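To make the reasoning behind combining methods concrete, the following minimal sketch encodes, as a lookup table, the failure scenarios that the quorum/witness and watchdog options handle according to the descriptions above, and reports which required scenarios a given combination leaves uncovered. The option keys, scenario labels, and the `uncovered` helper are hypothetical illustrations, not a LifeKeeper API. STONITH and CONFIRM_SO are deliberately left out because their trade-offs (two co-located nodes, manual failover) are not simple coverage questions.

```python
from itertools import combinations

# Hypothetical model: scenarios each option handles, as stated in the option
# descriptions. Quorum/witness handles split-brain but not hangs; watchdog
# handles hangs but not split-brain. STONITH and CONFIRM_SO are not modeled.
COVERAGE = {
    "quorum_witness": {"split_brain"},
    "watchdog": {"system_hang"},
}

# Scenarios a configuration without reservations needs to handle.
REQUIRED = {"split_brain", "system_hang"}


def uncovered(options):
    """Return the required scenarios that none of the given options handles."""
    handled = set()
    for name in options:
        handled |= COVERAGE[name]
    return REQUIRED - handled


if __name__ == "__main__":
    # Check every single option and every pair of options.
    for size in (1, 2):
        for combo in combinations(sorted(COVERAGE), size):
            gaps = uncovered(combo)
            status = "covered" if not gaps else "missing " + ", ".join(sorted(gaps))
            print(" + ".join(combo), "->", status)
```

Running it shows that each option alone leaves one scenario uncovered, while the pair covers both, which mirrors the recommendation to combine methods rather than rely on any one of them.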
Non-Shared Storage
If you plan to use LifeKeeper in a non-shared storage environment, the risk of data corruption that exists with shared storage does not apply, so reservations are not necessary. However, partial or full resynchronization and merging of data may be required. To optimize reliability and availability, consider the options above for non-shared storage as well.
Note that no option provides complete data protection, but the following combination provides nearly the same level of protection as reservations.
Configuring I/O Fencing Without Reservations
To configure a cluster to support node fencing, complete the following steps:
- Stop LifeKeeper.
- Disable the use of SCSI reservations within LifeKeeper. To do this, edit the LifeKeeper defaults file, /etc/default/LifeKeeper, on all nodes in the cluster and add or modify the RESERVATIONS variable so that it is set to "none" (RESERVATIONS=none). (Note: Use this setting only when reservations are not available.)
- Obtain and configure one or more STONITH devices to provide I/O fencing. For this configuration, configure the STONITH devices to perform a system "poweroff" rather than a "reboot". If LifeKeeper communications have been disrupted for any reason, take care not to bring a device hierarchy in service on both nodes simultaneously through a manual operation.
- If desired, obtain and configure one or more quorum/witness servers. For complete instructions and information on configuring and using a witness server, see the Quorum/Witness Server Support Package topic.
- If desired, configure watchdog. For more information, see the Watchdog topic.
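The defaults-file change in the second step can be scripted when many nodes must be updated consistently. The following is a minimal sketch of a hypothetical helper (`set_default`, not part of LifeKeeper) that sets `NAME=value` in the text of a defaults file, replacing an existing uncommented assignment or appending one. It assumes simple one-assignment-per-line `NAME=value` syntax and does not handle `export`, leading whitespace, or other shell constructs. Stop LifeKeeper first, keep a backup of /etc/default/LifeKeeper, and apply the same change on every node.

```python
import re


def set_default(text, name, value):
    """Return defaults-file text with NAME=value set.

    Replaces any existing uncommented NAME=... line; commented-out lines
    (e.g. '#NAME=...') are left alone. Appends the assignment if none exists.
    """
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    line = f"{name}={value}"
    if pattern.search(text):
        # Use a function so backslashes in the value are not treated as escapes.
        return pattern.sub(lambda _match: line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


if __name__ == "__main__":
    # Hypothetical sample contents; a real script would read and rewrite
    # /etc/default/LifeKeeper on each node while LifeKeeper is stopped.
    sample = "#RESERVATIONS=old\nOTHER_SETTING=1\n"
    print(set_default(sample, "RESERVATIONS", "none"), end="")
```

Working on the file's text rather than the file itself keeps the helper easy to test and leaves the choice of how to distribute and write the file (and when to restart LifeKeeper) to the administrator.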
Accessing Shared Storage
Regardless of whether reservations are enabled or disabled, there are two issues to be aware of:
- Access to the storage must be controlled by LifeKeeper.
- Great care must be taken to ensure that the storage is not accessed unintentionally, for example by manually mounting file systems or manually running fsck.
STANDBY NODE WRITE PROTECTION Feature
Even when reservations are disabled (RESERVATIONS=none), writes to disk devices from the standby node can be prohibited to prevent data corruption caused by unintended access. To enable this feature, add the following to /etc/default/LifeKeeper:
STANDBY_NODE_WRITE_PROTECTION=enable
SNHC=1
SNHC_DISKCHECK=1
This feature relies on the Standby Node Health Check, so a device may not become write-protected until the OSU resource monitor (OSUquickCheck) runs. As a result, the device may remain writable for approximately LKCHECKINTERVAL seconds (default: 120) after a switchover.
This feature does not prevent split-brain, and if a split-brain occurs, both nodes will be allowed to write. When reservations are disabled (RESERVATIONS=none), I/O fencing must be configured by other means to ensure data reliability in the cluster.
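Before relying on write protection, it can help to confirm that all three settings are present and to know how long the post-switchover writable window may be. The following minimal sketch uses hypothetical helpers (`parse_defaults`, `write_protection_status`; not LifeKeeper tools) that parse defaults-file text, list any write-protection settings that are missing or different, and return LKCHECKINTERVAL as the approximate window. It assumes one `NAME=value` assignment per line and that an overridden LKCHECKINTERVAL, if any, is set in the same file; it is a simplified parser, not a full shell interpreter.

```python
# Settings required by the standby node write protection feature.
REQUIRED_SETTINGS = {
    "STANDBY_NODE_WRITE_PROTECTION": "enable",
    "SNHC": "1",
    "SNHC_DISKCHECK": "1",
}
DEFAULT_LKCHECKINTERVAL = 120  # seconds, the documented default


def parse_defaults(text):
    """Parse simple NAME=value lines (assumption: one per line).

    Blank lines and '#' comments are skipped; surrounding double quotes are
    removed; a later assignment overrides an earlier one.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"')
    return values


def write_protection_status(text):
    """Return (missing_or_wrong_settings, approx_writable_window_seconds)."""
    values = parse_defaults(text)
    missing = {
        name: wanted
        for name, wanted in REQUIRED_SETTINGS.items()
        if values.get(name) != wanted
    }
    # Hypothetical assumption: LKCHECKINTERVAL, if overridden, lives in this file.
    window = int(values.get("LKCHECKINTERVAL", DEFAULT_LKCHECKINTERVAL))
    return missing, window


if __name__ == "__main__":
    sample = "RESERVATIONS=none\nSTANDBY_NODE_WRITE_PROTECTION=enable\nSNHC=1\n"
    missing, window = write_protection_status(sample)
    print("missing or incorrect:", missing)
    print(f"standby may stay writable for about {window} s after a switchover")
```

A check like this only confirms the configuration; as noted above, it says nothing about split-brain, which still requires I/O fencing.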