When all the communication paths to a paired server are DEAD, LifeKeeper for Windows assumes that the paired system is DEAD (or down) and attempts to fail over. Before failing over, however, LifeKeeper for Windows performs a safety check to confirm that the failure occurred in the server itself rather than only in the comm paths.
The safety check queries the network (through LAN Manager) to see whether the machine still exists. One of two outcomes can occur:
- System is alive. If the check receives a response indicating that the system still exists on the network, it pauses the comm_down event and waits for at least one comm path to come back up. The following messages are logged:
SAFETY CHECK DETECTED MACHINE "Target System Name":PAUSING COMM_DOWN
The remote node (Target System Name) appears to be reachable. Waiting for confirmation.
If at least one LK comm path becomes alive again, the following two messages are logged:
SAFETY CHECK ABORTED:ABORTING COMM_DOWN
The remote node (Target System Name) is up.
Failover does NOT occur, and the comm_down event exits.
- System is dead. If all LK comm paths remain dead, the safety check eventually times out and logs the following:
SAFETY CHECK for "Target System Name" timed out: assuming "Target System Name" is down
Failover will occur in this case.
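When reviewing logs after a comm path outage, it can help to map each safety check message to its outcome. The following is a minimal Python sketch, not part of LifeKeeper for Windows: the `classify` function and its patterns are hypothetical and are built only from the message text quoted above, so the exact spacing and quoting in a real log may differ.

```python
import re
from typing import Optional, Tuple

# Patterns derived from the messages quoted in this topic (assumption:
# real log lines may carry extra prefixes or slightly different quoting).
PATTERNS = [
    ("paused", re.compile(
        r'SAFETY CHECK DETECTED MACHINE\s+"?(?P<node>[^":]+)"?\s*:\s*PAUSING COMM_DOWN')),
    ("aborted", re.compile(
        r'SAFETY CHECK ABORTED\s*:\s*ABORTING COMM_DOWN')),
    ("timed_out", re.compile(
        r'SAFETY CHECK for\s+"?(?P<node>[^"]+?)"?\s+timed out')),
]

def classify(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (outcome, node_name) for a safety check message, else None.

    outcome is "paused" (peer detected, comm_down paused),
    "aborted" (a comm path came back, no failover), or
    "timed_out" (peer assumed down, failover proceeds).
    """
    for outcome, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            # The "aborted" message does not name the node.
            return outcome, match.groupdict().get("node")
    return None
```

A "paused" message followed by "aborted" means failover did not occur; a "timed_out" message means failover proceeded.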
LifeKeeper for Windows performs this check only once, after all comm paths go down. If the safety check detects that the system is alive, failover is aborted. LifeKeeper for Windows does not re-initiate failover until all of the following events happen in sequence:
- At least one of the comm paths comes back ALIVE.
- All comm paths again go DEAD.
- The safety check activates and does not detect that the paired system is alive.
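The sequence above can be pictured as a small state machine. The sketch below is a simplified Python model of one reading of this topic, not the product's implementation: the state names, the `CommDownModel` class, and its methods are all hypothetical.

```python
from enum import Enum, auto

class State(Enum):
    PATHS_UP = auto()     # at least one comm path is ALIVE
    CHECKING = auto()     # all comm paths DEAD, safety check running
    PAUSED = auto()       # check detected the peer; comm_down paused
    FAILED_OVER = auto()  # check timed out; failover proceeded

class CommDownModel:
    """Hypothetical model of the safety check sequence (assumed semantics)."""

    def __init__(self) -> None:
        self.state = State.PATHS_UP

    def all_paths_dead(self) -> None:
        # The check runs only once per outage: it starts only when
        # the paths were previously up.
        if self.state is State.PATHS_UP:
            self.state = State.CHECKING

    def peer_detected(self) -> None:
        # The network query found the paired system: pause comm_down.
        if self.state is State.CHECKING:
            self.state = State.PAUSED

    def check_timed_out(self) -> None:
        # No response from the paired system: assume it is down.
        if self.state is State.CHECKING:
            self.state = State.FAILED_OVER

    def path_alive(self) -> None:
        # A comm path returned: comm_down exits and the check is re-armed.
        if self.state in (State.CHECKING, State.PAUSED):
            self.state = State.PATHS_UP
```

In this model, once the check has detected the peer, a further `all_paths_dead()` call has no effect until `path_alive()` has moved the model back to `PATHS_UP`, which mirrors the three-step sequence listed above.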