Watchdog monitors a server and takes corrective action (a reboot) when the server is not working properly, so that a failing server does not cause further problems. Watchdog can be implemented with special watchdog hardware or as a software-only option.
Components
- Watchdog timer – software driver or an external hardware component
- Watchdog daemon – RPM package available through the Linux distribution
- LifeKeeper core daemon – installed with the LifeKeeper installation
- Health check script – script that checks the status of the LifeKeeper core
Read the next section carefully. The watchdog daemon is designed to recover from errors, and it will reset the system if it is not configured carefully. Plan the installation and configuration with care. This section does not explain how to configure watchdog in general; it only explains how to configure LifeKeeper to interoperate with watchdog.
Configuration
The following steps should be carried out by an administrator with root user privileges. The administrator should already be familiar with some of the risks and issues with watchdog.
The health check script (the LifeKeeper monitoring script, /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog) is the component that ties the LifeKeeper configuration to the watchdog configuration. This script monitors the basic LifeKeeper core components.
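To illustrate how a watchdog test binary reports health in general, here is a minimal sketch. It is not the contents of the LifeKeeper-watchdog script: the function name all_processes_running and the process names in the usage comment are hypothetical. The only assumption about the watchdog daemon is its usual convention that a test binary exits with 0 when the check passes and with a non-zero status when it fails.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Return 0 only if every named process has at least one running instance.
all_processes_running() {
    for name in "$@"; do
        # pgrep -x matches the process name exactly
        pgrep -x "$name" >/dev/null 2>&1 || return 1
    done
    return 0
}

# Example use inside a test binary (process names are placeholders):
#   all_processes_running proc_a proc_b || exit 1
#   exit 0
```

A check like this should finish well within the configured test-timeout, since a test that hangs is treated as a failure.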
- If watchdog has been previously configured, enter the following command to stop it. If not, go to the next step.
systemctl stop watchdog
- Edit the watchdog configuration file (/etc/watchdog.conf) supplied during the installation of the watchdog software.
- Modify test-binary:
test-binary = /opt/LifeKeeper/samples/watchdog/LifeKeeper-watchdog
- Modify test-timeout:
test-timeout = 5
- Modify interval:
interval = 7
The interval value should be less than the LifeKeeper communication path timeout (15 seconds). Roughly half of that timeout is generally a good value for the interval.
- Make sure LifeKeeper has been started. If not, please refer to the Starting LifeKeeper topic.
- Start watchdog by entering the following command:
systemctl start watchdog
- To start watchdog automatically on future restarts, enter the following command:
systemctl enable watchdog
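After completing the steps above, it may help to confirm that the edited values and the service state are what you expect. The following is a minimal sketch under stated assumptions: the function names conf_value, interval_ok, and report_watchdog_service are hypothetical helpers, the file is assumed to use the "key = value" form shown above, and the 15-second figure is the communication path timeout mentioned in this section.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print the value assigned to a key in a "key = value" file.
# Lines commented out with # are ignored; the last match wins.
conf_value() {
    # $1 = configuration file, $2 = key
    sed -n -e "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
        sed -e 's/[[:space:]]*$//' | tail -n 1
}

# Succeed only if interval is set and is lower than the given timeout (seconds).
interval_ok() {
    # $1 = configuration file, $2 = communication path timeout in seconds
    interval=$(conf_value "$1" interval)
    [ -n "$interval" ] && [ "$interval" -lt "$2" ] 2>/dev/null
}

# Print the service state; succeed only if watchdog is running and enabled at boot.
report_watchdog_service() {
    active=$(systemctl is-active watchdog 2>/dev/null)
    enabled=$(systemctl is-enabled watchdog 2>/dev/null)
    echo "watchdog: active=${active:-unknown} enabled=${enabled:-unknown}"
    [ "$active" = "active" ] && [ "$enabled" = "enabled" ]
}

# Example (run on a configured node):
#   interval_ok /etc/watchdog.conf 15 && report_watchdog_service
```

The check reads the configuration file and queries systemd only; it does not change either one.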
Uninstall
Take care when uninstalling LifeKeeper. Undo the configuration steps above in reverse order, as listed below.
- Stop watchdog by entering the following command:
systemctl stop watchdog
- Edit the watchdog configuration file (/etc/watchdog.conf) supplied during the installation of the watchdog software.
- Modify test-binary and interval by commenting out those entries (add # at the beginning of each line):
#test-binary =
#interval =
Note: If interval was previously used for other functions, it can be left as-is.
- Uninstall LifeKeeper. See the Removing LifeKeeper topic.
- Watchdog can now be started again. If only used by LifeKeeper, watchdog can be permanently disabled by entering the following command:
systemctl disable watchdog
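The step above that comments out the test-binary and interval entries can also be done non-interactively. The following is a minimal sketch, assuming the entries use the "key = value" form shown earlier; comment_out_keys is a hypothetical helper name, and it keeps a .bak copy of the file before changing it.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Comment out the given keys in a "key = value" file by prefixing the line with #.
comment_out_keys() {
    # $1 = configuration file, remaining arguments = keys to comment out
    file=$1
    shift
    cp "$file" "$file.bak" || return 1        # keep a backup of the original
    for key in "$@"; do
        cp "$file" "$file.tmp" &&
            sed -e "s/^[[:space:]]*$key[[:space:]]*=/#&/" "$file.tmp" > "$file" ||
            return 1
    done
    rm -f "$file.tmp"
}

# Example (run as root, after stopping watchdog):
#   comment_out_keys /etc/watchdog.conf test-binary interval
```

If interval is still needed for other functions, pass only test-binary so that the interval entry is left untouched.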