Watchdog

Watchdog is a method of monitoring a server so that, if the server stops working properly, corrective action (a reboot) is taken before the failed server can cause problems. Watchdog can be implemented using special watchdog hardware or using a software-only option.

(Note: This configuration has only been tested with Red Hat Enterprise Linux Versions 5 and 6. No other operating systems have been tested; therefore, no others are supported at this time.)

Components


[Figure: watchdog.jpg]

LifeKeeper Interoperability with Watchdog

 

Read the next section carefully. The watchdog daemon is designed to recover from errors, and it will reset the system if it is not configured carefully. Plan the installation and configuration with this in mind. This section does not explain how to configure watchdog itself; it explains only how to configure LifeKeeper to interoperate with watchdog.

Configuration

The following steps should be carried out by an administrator with root user privileges. The administrator should already be familiar with some of the risks and issues with watchdog.

The health check script (the LifeKeeper monitoring script, /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog) is the component that ties the LifeKeeper configuration to the watchdog configuration. This script provides full monitoring of LifeKeeper and should not require any modifications.

  1. If watchdog has been previously configured, enter the following command to stop it. If not, go to Step 2.

/etc/rc.d/init.d/watchdog stop

Confirmation should be received that watchdog has stopped:

Stopping watchdog: [OK]

  2. Edit the watchdog configuration file (/etc/watchdog.conf) supplied during the installation of the watchdog software, and set the following values:

test-binary = /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog

test-timeout = 5

interval = 7

The interval value should be less than the LifeKeeper communication path timeout (15 seconds). A good value for the interval is generally about half of this timeout, which is why 7 is used here.

  3. Make sure LifeKeeper has been started. If not, refer to the Starting LifeKeeper topic.

  4. Start watchdog by entering the following command:

/etc/rc.d/init.d/watchdog start

Confirmation should be received that watchdog has started:

Starting watchdog: [OK]

  5. To start watchdog automatically on future restarts, enter the following command:

chkconfig --levels 35 watchdog on

Note: Configuring watchdog may cause some unexpected reboots from time to time. This is the general nature of how watchdog works. If processes are not responding correctly, the watchdog feature will assume that LifeKeeper (or the operating system) is hung, and it will reboot the system (without warning).
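The interval guidance above (an interval below the 15-second communication path timeout) can be checked before watchdog is started. The following is only a minimal sketch and is not part of LifeKeeper or watchdog: the function names get_setting and check_watchdog_interval are hypothetical, the timeout is passed in as an argument rather than read from LifeKeeper, and the configuration file is assumed to use the "name = value" form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LifeKeeper or watchdog):
# print the trimmed value of the last uncommented "name = value" line.
get_setting() {
    # $1 = configuration file, $2 = setting name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$1" | tail -n 1
}

# Hypothetical helper: succeed only if the configured interval is a whole
# number of seconds below the communication path timeout.
check_watchdog_interval() {
    # $1 = configuration file, $2 = communication path timeout in seconds
    interval=$(get_setting "$1" interval)
    case "$interval" in
        ''|*[!0-9]*)
            echo "interval is not set or is not numeric"
            return 2
            ;;
    esac
    if [ "$interval" -lt "$2" ]; then
        echo "interval $interval is below timeout $2"
        return 0
    fi
    echo "interval $interval is not below timeout $2"
    return 1
}

# Example call (assumes the default file location and timeout from the text):
#   check_watchdog_interval /etc/watchdog.conf 15
```

Commented-out lines are ignored, so a file in which interval has been disabled is reported as not set rather than as valid.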

Uninstall

Care should be taken when uninstalling LifeKeeper. The configuration steps above should be undone in reverse order, as listed below.

WARNING: IF UNINSTALLING LIFEKEEPER BY REMOVING THE RPM PACKAGES THAT MAKE UP LIFEKEEPER, TURN OFF WATCHDOG FIRST! In Step 2 of the configuration above, the watchdog configuration file was modified to call the LifeKeeper-watchdog script. If watchdog is not turned off first, it will try to call a script that no longer exists. The resulting error triggers a reboot, and the reboots will continue until watchdog is turned off.
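One way to detect the condition described in this warning is to confirm that any test-binary named in the watchdog configuration file still exists before watchdog is allowed to run. The following is a minimal sketch under stated assumptions: the function name check_test_binary is hypothetical, it is not part of LifeKeeper or watchdog, and it only inspects the file; it does not stop or start watchdog.

```shell
#!/bin/sh
# Hypothetical check (not shipped with LifeKeeper or watchdog): report
# whether the active test-binary entry in a watchdog configuration file
# points at an existing executable file.
check_test_binary() {
    # $1 = watchdog configuration file
    # Take the last uncommented "test-binary = <path>" line, trimmed.
    path=$(sed -n 's/^[[:space:]]*test-binary[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" | tail -n 1)
    if [ -z "$path" ]; then
        echo "no test-binary configured"
        return 0
    fi
    if [ -x "$path" ] && [ ! -d "$path" ]; then
        echo "test-binary present: $path"
        return 0
    fi
    echo "test-binary missing or not executable: $path"
    return 1
}

# Example call (assumes the default file location from the text):
#   check_test_binary /etc/watchdog.conf || echo "do not start watchdog"
```

A non-zero exit status means the configuration still references a script that watchdog cannot run, which is the situation the warning describes.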

  1. Stop watchdog by entering the following command:

/etc/rc.d/init.d/watchdog stop

Confirmation should be received that watchdog has stopped:

Stopping watchdog: [OK]

  2. Edit the watchdog configuration file (/etc/watchdog.conf) supplied during the installation of the watchdog software, and comment out the following entries:

#test-binary =
#interval =

(Note: If interval was used previously for other functions, it can be left as-is)

  3. Uninstall LifeKeeper. See the Removing LifeKeeper topic.
  4. Watchdog can now be started again. If watchdog was used only by LifeKeeper, it can be permanently disabled by entering the following command:

chkconfig --levels 35 watchdog off
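The configuration file edit in the uninstall procedure (commenting out test-binary and interval) can also be scripted. The following is a minimal sketch, assuming the entries use the "name = value" form shown earlier; the function name disable_lifekeeper_settings and the .bak backup suffix are hypothetical choices, not part of LifeKeeper or watchdog.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LifeKeeper or watchdog): comment out
# the active test-binary and interval lines of a watchdog configuration
# file, keeping a backup copy with a .bak suffix next to it.
disable_lifekeeper_settings() {
    # $1 = watchdog configuration file
    cp -p "$1" "$1.bak" || return 1
    # Prefix matching lines with "#"; lines already commented are untouched.
    sed -e 's/^[[:space:]]*test-binary[[:space:]]*=/#&/' \
        -e 's/^[[:space:]]*interval[[:space:]]*=/#&/' \
        "$1.bak" > "$1"
}

# Example call (assumes the default file location from the text; run it only
# after watchdog has been stopped):
#   disable_lifekeeper_settings /etc/watchdog.conf
```

If interval is used for other functions and should stay active, as the note above allows, drop the second -e expression.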

© 2012 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.