If the communication paths repeatedly fail and then recover (the LifeKeeper GUI shows them as Alive, then Dead, then Alive), the heartbeat tunables may not be set to the same values on all servers in the cluster.
The same symptom can occur if a tunable name is misspelled in the LifeKeeper defaults file /etc/default/LifeKeeper on one of the servers, because the misspelled entry is not recognized as that tunable and the server's effective value then differs from the others.
Suggested Action
- Shut down LifeKeeper on all servers in the cluster.
- On each server in the cluster, check the values and spelling of the LCMHBEATTIME and LCMNUMHBEATS tunables in /etc/default/LifeKeeper. Ensure that for each tunable, the values are the same on ALL servers in the cluster.
- Restart LifeKeeper on all servers.
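The comparison in the second step can be scripted. The following is a minimal sketch, not a LifeKeeper tool: it assumes the defaults file uses shell-style NAME=value lines with # comments, and that you have already copied the contents of /etc/default/LifeKeeper from each server into strings. The function names parse_tunables and find_mismatches, the server names, and the sample values are hypothetical and used only for illustration.

```python
# Sketch: compare heartbeat tunables across copies of /etc/default/LifeKeeper.
# Assumption: the file uses shell-style NAME=value lines and '#' comments.

TUNABLES = ("LCMHBEATTIME", "LCMNUMHBEATS")


def parse_tunables(text, names=TUNABLES):
    """Return {name: value} for the requested tunables found in the file text."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in names:
            found[key] = value.strip().strip('"')
    return found


def find_mismatches(configs, names=TUNABLES):
    """configs maps a server name to its defaults-file text.

    Returns {tunable: {server: value_or_None}} for every tunable whose value
    is not identical on all servers. None means the tunable was not found on
    that server, which can indicate a misspelled name.
    """
    parsed = {server: parse_tunables(text, names) for server, text in configs.items()}
    mismatches = {}
    for name in names:
        per_server = {server: values.get(name) for server, values in parsed.items()}
        if len(set(per_server.values())) > 1:
            mismatches[name] = per_server
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample data: node2 has a misspelled tunable name.
    sample = {
        "node1": "LCMHBEATTIME=5\nLCMNUMHBEATS=3\n",
        "node2": "LCMHBEATTIME=5\nLCMNUMHBEAT=3\n",
    }
    for tunable, per_server in find_mismatches(sample).items():
        print(tunable, per_server)
```

A tunable that is absent on every server is not reported, since all servers would then presumably fall back to the same default. Any tunable the script does report should be corrected by hand in /etc/default/LifeKeeper on the affected server before LifeKeeper is restarted.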