While reservations provide the highest level of data protection for shared storage, in some cases reservations are not available and must be disabled within LifeKeeper. With reservations disabled, the storage no longer acts as an arbitrator when multiple systems attempt to access it, intentionally or unintentionally.
Consider using other methods to fence the storage based on cluster membership. This kind of fencing is needed to handle system hangs, system-busy situations, and any other situation in which a server can appear not to be alive.
The key to a reliable configuration without reservations is to “know” that, when a failover occurs, the “other” server has been powered off or power cycled. The following fencing options help accomplish this, allowing LifeKeeper to provide a very reliable configuration even without SCSI reservations:
STONITH (Shoot the Other Node in the Head) using a highly reliable interconnect, such as a serial connection between the server and the STONITH device. STONITH is the technique of physically disabling or powering off a server when it is no longer considered part of the cluster. LifeKeeper can power off servers during a failover event, thereby ensuring safe access to the shared data. This option provides reliability similar to reservations but is limited to two nodes physically located together.
Quorum/Witness – Quorum/witness servers are used to confirm membership in the cluster, especially when the cluster servers are at different locations. Although this option can handle split-brain, it is not recommended on its own because it does not handle system hangs.
Watchdog – Watchdog monitors the health of a server. If a problem is detected, the server with the problem is rebooted or powered down. This option can recover from a server hang; however, it does not handle split-brain, so it is also not recommended on its own.
Although none of these alternative fencing methods is likely to be adequate on its own, using them in combination can produce a very reliable configuration.
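The reasoning behind combining methods can be made concrete with a small coverage model. This is an illustrative sketch only, not a LifeKeeper interface: the option names and the two scenario labels are hypothetical, and the model encodes only what the descriptions above state, namely that quorum/witness handles split-brain but not system hangs, while watchdog handles system hangs but not split-brain. STONITH is left out because it is characterized above by its topology limit (two co-located nodes) rather than by scenario.

```python
from itertools import combinations

# Hypothetical model: the failure scenarios each option is described as handling.
# Illustrative only; this is not a LifeKeeper API or configuration format.
HANDLES = {
    "quorum/witness": {"split-brain"},
    "watchdog": {"system hang"},
}

# Scenarios a reservation-less configuration needs to cover.
REQUIRED = {"split-brain", "system hang"}


def covered(options):
    """Return the set of scenarios handled by the given options together."""
    result = set()
    for name in options:
        result |= HANDLES[name]
    return result


def gaps(options):
    """Return the required scenarios that the given options leave unhandled."""
    return REQUIRED - covered(options)


if __name__ == "__main__":
    # Evaluate each option alone, then both together.
    for size in (1, 2):
        for combo in combinations(sorted(HANDLES), size):
            missing = gaps(combo)
            status = "adequate" if not missing else "missing " + ", ".join(sorted(missing))
            print(" + ".join(combo), "->", status)
```

Running it reports that each option alone leaves one scenario uncovered, while the pair covers both, which mirrors the recommendation to combine methods.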
If you plan to use LifeKeeper in a non-shared storage environment, the risk of data corruption that exists with shared storage does not apply, so reservations are not necessary. However, partial or full resyncs and merging of data may be required. To optimize reliability and availability, consider the above options with non-shared storage as well.
Note: For further information comparing the reliability and availability of the different options, see the I/O Fencing Comparison Chart.
No option provides complete data protection, but the following combination provides almost the same level of protection as reservations.
To configure a cluster to support node fencing, complete the following steps:
Disable the use of SCSI reservations within LifeKeeper. To do this, edit the LifeKeeper defaults file, /etc/default/LifeKeeper, on all nodes in the cluster, and add or modify the RESERVATIONS variable so that it is set to “none”, i.e. RESERVATIONS="none". (This option should only be used when reservations are not available.)
Obtain and configure one or more STONITH devices to provide I/O fencing. For this configuration, STONITH devices should be configured to issue a system “poweroff” command rather than a “reboot”. Take care not to bring a device hierarchy in service on both nodes simultaneously through a manual operation when LifeKeeper communications have been disrupted.
If desired, obtain and configure one or more quorum/witness servers. For complete instructions and information on configuring and using a witness server, see the Quorum/Witness Server Support Package topic.
Note: The quorum/witness server should reside at a site apart from the other servers in the cluster to provide the greatest degree of protection in the event of a site failure.
If desired, configure watchdog. For more information, see the Watchdog topic.
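The RESERVATIONS change in the first step can be scripted when many nodes must be updated. The following is a minimal sketch, assuming the defaults file holds one shell-style VAR=value assignment per line. The helper names set_reservations_none and update_defaults_file are hypothetical and are not part of LifeKeeper. The script would need to run with sufficient privileges on every node in the cluster, and keeping a backup of the original file first is prudent.

```python
import re
from pathlib import Path

# Defaults file named in the steps above; it must be edited on every node.
DEFAULTS_FILE = Path("/etc/default/LifeKeeper")

# Matches an uncommented RESERVATIONS assignment at the start of any line.
_RESERVATIONS_RE = re.compile(r"^RESERVATIONS=.*$", re.MULTILINE)


def set_reservations_none(text: str) -> str:
    """Return defaults-file text with RESERVATIONS set to "none" (hypothetical helper).

    Rewrites every existing RESERVATIONS line, or appends one if none exists.
    """
    line = 'RESERVATIONS="none"'
    if _RESERVATIONS_RE.search(text):
        return _RESERVATIONS_RE.sub(line, text)
    # Make sure the appended assignment starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


def update_defaults_file(path: Path = DEFAULTS_FILE) -> None:
    """Apply set_reservations_none to the file in place (hypothetical helper)."""
    original = path.read_text()
    updated = set_reservations_none(original)
    if updated != original:
        path.write_text(updated)


if __name__ == "__main__":
    # Placeholder contents for demonstration only; not a real defaults file.
    print(set_reservations_none("OTHER_SETTING=1\n"), end="")
```

The string-to-string function is kept separate from the file I/O so the edit can be checked without touching /etc/default/LifeKeeper.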
© 2016 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.