Watchdog is a method of monitoring a server to ensure that if the server is not working properly, corrective action (reboot) will be taken so that it does not cause problems. Watchdog can be implemented using special watchdog hardware or using a software-only option.
(Note: This configuration has only been tested with Red Hat Enterprise Linux Versions 5 and 6. No other operating systems have been tested; therefore, no others are supported at this time.)
Components
Watchdog timer software driver or an external hardware component
Watchdog daemon – rpm available through the Linux distribution
LifeKeeper core daemon – installed with the LifeKeeper installation
Health check script – LifeKeeper monitoring script
LifeKeeper Interoperability with Watchdog
Read the next section carefully. The watchdog daemon is designed to recover from errors and will reset the system if it is not configured carefully, so plan the installation and configuration with care. This section does not explain how to set up watchdog itself; it covers only how LifeKeeper interoperates with a watchdog configuration.
The following steps should be carried out by an administrator with root user privileges. The administrator should already be familiar with some of the risks and issues with watchdog.
The health check script (the LifeKeeper monitoring script, /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog) is the component that ties the LifeKeeper configuration to the watchdog configuration. This script provides full monitoring of LifeKeeper and should not require any modifications.
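For orientation, the watchdog daemon runs the program named by test-binary and treats a nonzero exit status as a failed check. The sketch below illustrates that exit-status contract only. It is not the contents of LifeKeeper-watchdog, and the function names `process_alive` and `health_check`, as well as the idea of checking a single process ID, are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical health check illustrating the test-binary exit-status contract.
# NOT the LifeKeeper-watchdog script; all names here are illustrative only.

# process_alive PID
# Returns 0 if a process with the given PID exists and can be signalled,
# nonzero otherwise. "kill -0" sends no signal; it only performs the check.
process_alive() {
    kill -0 "$1" 2>/dev/null
}

# health_check PID
# Prints a short status line and returns 0 (healthy) or 1 (unhealthy).
health_check() {
    if process_alive "$1"; then
        echo "healthy"
        return 0
    fi
    echo "unhealthy"
    return 1
}
```

A script used as a test-binary would end by exiting with the status of such a check, so that a failed check is reported to watchdog as a nonzero exit.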
If watchdog is already running, stop it by entering the following command:
/etc/rc.d/init.d/watchdog stop
Confirmation should be received that watchdog has stopped:
Stopping watchdog: [OK]
Edit the watchdog configuration file (/etc/watchdog.conf by default) as follows.
Modify test-binary:
test-binary = /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog
Modify test-timeout:
test-timeout = 5
Modify interval:
interval = 7
The interval value should be less than the LifeKeeper communication path timeout (15 seconds). Roughly half of that value, such as 7, is generally a good choice.
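The three configuration changes above can also be applied by a small script. The following is a minimal sketch, assuming the settings appear in the config file at most once as `key = value` lines (possibly commented out); `set_option` and `half_of` are hypothetical helpers, not part of watchdog or LifeKeeper.

```shell
#!/bin/sh
# Sketch: apply watchdog settings to a config file.
# set_option and half_of are hypothetical helpers for illustration only.

# set_option FILE KEY VALUE
# Rewrites an existing (possibly commented-out) "KEY = ..." line,
# or appends "KEY = VALUE" if no such line exists.
set_option() {
    file=$1 key=$2 value=$3
    if grep -q "^[[:space:]]*#*[[:space:]]*${key}[[:space:]]*=" "$file"; then
        sed "s|^[[:space:]]*#*[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" \
            "$file" > "$file.new" && mv "$file.new" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# half_of SECONDS
# Integer division, so half of the 15-second timeout is 7.
half_of() {
    echo $(( $1 / 2 ))
}

# Example (run as root, with watchdog stopped):
#   conf=/etc/watchdog.conf
#   set_option "$conf" test-binary /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog
#   set_option "$conf" test-timeout 5
#   set_option "$conf" interval "$(half_of 15)"
```

Writing to a temporary file and moving it into place avoids relying on in-place editing options that differ between sed versions. Back up the config file before running anything like this against it.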
Make sure LifeKeeper has been started. If not, please refer to the Starting LifeKeeper topic.
Start watchdog by entering the following command:
/etc/rc.d/init.d/watchdog start
Confirmation should be received that watchdog has started:
Starting watchdog: [OK]
To start watchdog automatically on future restarts, enter the following command:
chkconfig --levels 35 watchdog on
Note: Configuring watchdog may cause some unexpected reboots from time to time. This is the general nature of how watchdog works. If processes are not responding correctly, the watchdog feature will assume that LifeKeeper (or the operating system) is hung, and it will reboot the system (without warning).
Care should be taken when uninstalling LifeKeeper. The above steps should be done in reverse order as listed below.
WARNING: IF UNINSTALLING LIFEKEEPER BY REMOVING THE RPM PACKAGES THAT MAKE UP LIFEKEEPER, TURN OFF WATCHDOG FIRST! During configuration, the watchdog config file was modified to call the LifeKeeper-watchdog script; if watchdog is not turned off first, it will try to call a script that is no longer there. The resulting error will trigger a reboot, and this will repeat until watchdog is turned off.
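One way to guard against the situation described in the warning is to check, before removing any packages, whether the watchdog config file still contains an active reference to the LifeKeeper script. The sketch below assumes the reference is a single uncommented test-binary line; `references_lifekeeper_script` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect whether a watchdog config still points at the LifeKeeper script.
# references_lifekeeper_script is a hypothetical helper for illustration only.

# references_lifekeeper_script FILE
# Returns 0 if FILE has an active (uncommented) test-binary line that names
# LifeKeeper-watchdog, nonzero otherwise.
references_lifekeeper_script() {
    grep -q '^[[:space:]]*test-binary[[:space:]]*=.*LifeKeeper-watchdog' "$1"
}

# Example: stop before package removal while the reference is still active.
#   if references_lifekeeper_script /etc/watchdog.conf; then
#       echo "Stop watchdog and comment out test-binary before removing LifeKeeper" >&2
#       exit 1
#   fi
```

This check covers only the config file; watchdog itself must still be stopped as described in the steps that follow.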
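Stop watchdog by entering the following command:
/etc/rc.d/init.d/watchdog stop
Confirmation should be received that watchdog has stopped:
Stopping watchdog: [OK]
Edit the watchdog configuration file and comment out the following entries by adding # at the beginning of each line:
#test-binary =
#interval =
(Note: If interval was used previously for other functions, it can be left as-is.)
To stop watchdog from starting automatically on future restarts, enter the following command:
chkconfig --levels 35 watchdog off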
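Commenting out the test-binary and interval entries can be scripted as well. This is a minimal sketch, assuming each entry is an uncommented `key = value` line; `comment_out_option` is a hypothetical helper, and it keeps the existing value after the # rather than clearing it.

```shell
#!/bin/sh
# Sketch: comment out a setting in a watchdog config file.
# comment_out_option is a hypothetical helper for illustration only.

# comment_out_option FILE KEY
# Prefixes any active "KEY = ..." line with "#". Lines that are already
# commented out do not match and are left unchanged.
comment_out_option() {
    file=$1 key=$2
    sed "s|^\([[:space:]]*\)\(${key}[[:space:]]*=.*\)|\1#\2|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Example (run as root, after watchdog has been stopped):
#   comment_out_option /etc/watchdog.conf test-binary
#   comment_out_option /etc/watchdog.conf interval
```

Skip the interval call if interval is also used for other functions, as noted above.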
© 2017 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.