When all the communications paths on a server are DEAD, SIOS Protection Suite assumes that the paired system is DEAD (or down) and attempts to fail over. Before the failover proceeds, however, SIOS Protection Suite performs a safety check to confirm that the failure occurred in the paired server itself rather than only in the comm paths.
The safety check queries the network (through LAN Manager) to see whether the paired machine still exists. One of two outcomes follows:
- System is alive. If the check receives a response that the system does exist on the network, it aborts the failover and reports the following message to the LifeKeeper event log:
SAFETY CHECK FAILED: COMM_DOWN ABORTED
- System is dead. If the check does not receive a response within a specified time-out period (default 8 seconds), the machine is assumed to be down and the failover proceeds.
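The two outcomes above can be summarized in a short Python sketch. This is an illustration only, not SIOS Protection Suite code: the `probe` callable is a hypothetical stand-in for the LAN Manager network query, and `safety_check` is an invented name. Only the 8-second default and the log message text come from the description above.

```python
from typing import Callable, Optional, Tuple

DEFAULT_TIMEOUT_SECONDS = 8.0  # default time-out period stated above
ABORT_MESSAGE = "SAFETY CHECK FAILED: COMM_DOWN ABORTED"


def safety_check(
    probe: Callable[[float], bool],
    timeout: float = DEFAULT_TIMEOUT_SECONDS,
) -> Tuple[bool, Optional[str]]:
    """Decide whether failover may proceed.

    `probe` is a hypothetical callable: it waits up to `timeout` seconds
    and returns True if the paired system answered on the network.
    Returns (proceed_with_failover, message_for_event_log).
    """
    if probe(timeout):
        # System is alive: abort the failover and log the message.
        return False, ABORT_MESSAGE
    # No response within the time-out: assume the machine is down.
    return True, None
```

In this sketch, a probe that reports a response yields `(False, ABORT_MESSAGE)`, and a probe that times out yields `(True, None)`.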
SIOS Protection Suite performs this check only once, after all comm paths go down. If the safety check detects that the system is alive, failover is aborted. SIOS Protection Suite does not re-initiate failover until all of the following events happen in sequence:
- At least one of the comm paths comes back ALIVE.
- All comm paths again go DEAD.
- The safety check activates and does not detect that the paired system is alive.
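The re-initiation sequence can be modeled as a small state machine. The Python sketch below is an assumption-laden illustration, not the product's implementation: `FailoverGate`, `on_comm_paths`, and the returned status strings are hypothetical names, and `probe` stands in for the network query that reports whether the paired system answered.

```python
from typing import Callable


class FailoverGate:
    """Hypothetical model of the run-once safety check."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self.probe = probe   # returns True if the paired system responds
        self.armed = True    # the check may run on the next all-DEAD event

    def on_comm_paths(self, alive_count: int) -> str:
        """Handle a comm path status update; return the resulting action."""
        if alive_count > 0:
            # At least one comm path is ALIVE again: re-arm the check.
            self.armed = True
            return "idle"
        if not self.armed:
            # All paths still DEAD, but the check already ran once.
            return "suppressed"
        self.armed = False   # the check runs only once per outage
        if self.probe():
            return "aborted"   # paired system is alive
        return "failover"      # no response: failover proceeds


# Usage: the first outage is aborted, the second one fails over.
answers = iter([True, False])
gate = FailoverGate(lambda: next(answers))
for count in (0, 0, 1, 0):
    print(count, gate.on_comm_paths(count))
```

The `armed` flag captures the rule that failover is not re-initiated until a comm path has come back ALIVE and all paths have gone DEAD again.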