Watchdog is a method of monitoring a server so that, if the server stops working properly, corrective action (a reboot) is taken before it causes problems. Watchdog can be implemented using special watchdog hardware or using a software-only option.
(Note: This configuration has only been tested with Red Hat Enterprise Linux Versions 6 and 7. No other operating systems have been tested; therefore, no others are supported at this time.)
A watchdog configuration consists of the following components:
Watchdog timer – software driver or an external hardware component
Watchdog daemon – rpm available through the Linux distribution
LifeKeeper core daemon – installed with the LifeKeeper installation
Health check script – script to check the status of LifeKeeper core
LifeKeeper Interoperability with Watchdog
Read the next section carefully. The watchdog daemon is designed to recover from errors and will reset the system if it is not configured carefully, so plan the installation and configuration with care. This section does not explain how to configure watchdog in general; it explains only how to configure LifeKeeper to interoperate with watchdog.
The following steps should be carried out by an administrator with root user privileges. The administrator should already be familiar with some of the risks and issues with watchdog.
The health check script (the LifeKeeper monitoring script, /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog) is the component that ties the LifeKeeper configuration to the watchdog configuration. This script monitors the basic LifeKeeper core components.
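As an illustration only, a health check of this kind can be sketched in shell as follows. This is not the shipped LifeKeeper-watchdog script. The function names and the process names in the usage comment are placeholders, and the only behavior assumed is that the watchdog daemon treats a zero exit status from its test binary as healthy and a non-zero status as a failure.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the shipped LifeKeeper-watchdog script.

# is_running NAME
# Returns 0 if a process with exactly this name exists.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# check_processes NAME...
# Returns 0 if every named process is running, 1 at the first one missing.
check_processes() {
    for name in "$@"; do
        if ! is_running "$name"; then
            echo "health check: process '$name' is not running" >&2
            return 1
        fi
    done
    return 0
}

# Hypothetical usage inside a test binary (process names are placeholders):
#   check_processes first_daemon second_daemon || exit 1
#   exit 0
```

Keeping the per-process test in its own function makes it easy to swap in a different probe without touching the loop.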
If watchdog is already running, stop it by entering the following command:
service watchdog stop (RHEL6)
systemctl stop watchdog (RHEL7)
Edit the watchdog configuration file (typically /etc/watchdog.conf) as follows.
Modify test-binary:
test-binary = /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog
Modify test-timeout:
test-timeout = 5
Modify interval:
interval = 7
The interval value should be less than the LifeKeeper communication path timeout (15 seconds); a good value for the interval is generally half of that timeout.
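The half-of-timeout rule can be worked through with shell integer arithmetic. The function name suggest_interval is made up for this illustration; the 15-second default is the communication path timeout stated above.

```shell
#!/bin/sh
# Hypothetical helper: derive a watchdog interval from the LifeKeeper
# communication path timeout using integer division (rounds down).
suggest_interval() {
    timeout=${1:-15}          # communication path timeout in seconds
    echo $(( timeout / 2 ))
}

suggest_interval 15   # prints 7
```

Because the division rounds down, the result stays below the timeout, which matches the interval = 7 setting used here.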
Make sure LifeKeeper has been started. If not, please refer to the Starting LifeKeeper topic.
Start watchdog by entering the following command:
service watchdog start (RHEL6)
systemctl start watchdog (RHEL7)
To start watchdog automatically on future restarts, enter the following command:
chkconfig --levels 35 watchdog on (RHEL6)
systemctl enable watchdog (RHEL7)
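Because the commands differ between RHEL 6 and RHEL 7, a wrapper script may want to select the right one. The sketch below only prints the command for a given action and major release number; it does not run anything. The function name watchdog_command is hypothetical, and the commands it prints are the ones given in this document.

```shell
#!/bin/sh
# watchdog_command ACTION MAJOR
# ACTION: start | stop | enable | disable; MAJOR: 6 or 7
# Prints the matching command; returns 1 for anything unrecognized.
watchdog_command() {
    action=$1
    major=$2
    case "$major" in
        6)
            case "$action" in
                start|stop) echo "service watchdog $action" ;;
                enable)     echo "chkconfig --levels 35 watchdog on" ;;
                disable)    echo "chkconfig --levels 35 watchdog off" ;;
                *)          return 1 ;;
            esac
            ;;
        7)
            case "$action" in
                start|stop|enable|disable) echo "systemctl $action watchdog" ;;
                *) return 1 ;;
            esac
            ;;
        *)
            # Only RHEL 6 and 7 are covered by this document.
            return 1
            ;;
    esac
}

watchdog_command stop 6     # prints: service watchdog stop
watchdog_command enable 7   # prints: systemctl enable watchdog
```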
Note: Configuring watchdog may cause some unexpected reboots from time to time. This is the general nature of how watchdog works. If processes are not responding correctly, the watchdog feature will assume that LifeKeeper (or the operating system) is hung, and it will reboot the system (without warning).
Care should be taken when uninstalling LifeKeeper. The above steps should be done in reverse order as listed below.
WARNING: IF UNINSTALLING LIFEKEEPER BY REMOVING THE RPM PACKAGES THAT MAKE UP LIFEKEEPER, TURN OFF WATCHDOG FIRST! During configuration, the watchdog config file was modified to call the LifeKeeper-watchdog script; therefore, if watchdog is not turned off first, it will call a script that is no longer there. The error raised when the script is not found will trigger a reboot, and this will continue until watchdog is turned off.
Stop watchdog by entering the following command:
service watchdog stop (RHEL6)
systemctl stop watchdog (RHEL7)
Comment out the following entries in the watchdog configuration file:
#test-binary =
#interval =
(Note: If interval was previously used for other functions, it can be left as-is.)
To stop watchdog from starting automatically on future restarts, enter the following command:
chkconfig --levels 35 watchdog off (RHEL6)
systemctl disable watchdog (RHEL7)
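Before removing the LifeKeeper packages, it can be worth confirming that the watchdog configuration no longer references the LifeKeeper-watchdog script. The following sketch does that with grep. The function name, and the configuration file path shown in the usage comment, are assumptions made for illustration.

```shell
#!/bin/sh
# references_lifekeeper CONFIG_FILE
# Returns 0 if CONFIG_FILE has an active (uncommented) test-binary line
# that points at the LifeKeeper-watchdog script, non-zero otherwise.
references_lifekeeper() {
    grep -Eq '^[[:space:]]*test-binary[[:space:]]*=.*LifeKeeper-watchdog' "$1"
}

# Hypothetical usage (the configuration file path is an assumption):
#   if references_lifekeeper /etc/watchdog.conf; then
#       echo "Comment out test-binary before uninstalling LifeKeeper" >&2
#   fi
```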
© 2018 SIOS Technology Corp.