SYMPTOM: Occasionally, LifeKeeper for Windows is installed on a system that is not performing well. There are tell-tale signs of abnormal system behavior that LifeKeeper for Windows can detect such as the health check processes not starting or ending properly. The most common problem causing check process timeouts is incorrect system memory optimization for databases or mail servers. There must be enough memory to start health check processes and initiate failovers at all times.
SOLUTION: The LifeKeeper for Windows Release Notes include guidelines that identify system memory requirements for LifeKeeper for Windows. They also briefly explain how to use the Windows Performance Monitors to verify that enough memory is available for applications such as LifeKeeper for Windows on your system. To help identify this situation, there are two Resource Monitoring options that record abnormal behavior and three options that take corrective action when check process timeouts occur. All five options are LifeKeeper for Windows registry settings and can be enabled or disabled to meet specific customer requirements and preferences:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\STEELEYE\LifeKeeper\General\ResMon_RecordTimeout
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\STEELEYE\LifeKeeper\General\ResMon_RecordMemory
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\STEELEYE\LifeKeeper\General\ResMon_ResFail
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\STEELEYE\LifeKeeper\General\ResMon_RebootWaitInSec
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\STEELEYE\LifeKeeper\General\ResMon_ResFailMaxWaitInMin
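To see how a system is currently configured, the five values listed above can be read from a script. The following is a minimal sketch, not a vendor-supplied tool. It assumes a 64-bit Python interpreter on Windows, read access to the key, and that each setting is stored as a value under the General key. The helper names `full_path` and `read_resmon_settings` are hypothetical, and the registry value type (for example REG_DWORD) is not stated in this article, so the data is returned exactly as the registry reports it.

```python
# Sketch only: the key path and value names come from this article.
# The function names and the handling of missing values are assumptions.
GENERAL_KEY = r"SOFTWARE\Wow6432Node\STEELEYE\LifeKeeper\General"
RESMON_VALUES = (
    "ResMon_RecordTimeout",
    "ResMon_RecordMemory",
    "ResMon_ResFail",
    "ResMon_RebootWaitInSec",
    "ResMon_ResFailMaxWaitInMin",
)


def full_path(value_name):
    """Return the full registry path of one Resource Monitoring value."""
    return "HKEY_LOCAL_MACHINE\\" + GENERAL_KEY + "\\" + value_name


def read_resmon_settings():
    """Read the five values; a value that does not exist is reported as None."""
    import winreg  # standard library module, available on Windows only

    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GENERAL_KEY) as key:
        for name in RESMON_VALUES:
            try:
                # QueryValueEx returns (data, registry_type); the type is ignored here.
                data, _registry_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                data = None
            settings[name] = data
    return settings
```

On a LifeKeeper for Windows server, calling `read_resmon_settings()` returns a dictionary keyed by value name, which can be printed alongside `full_path(name)` for each entry.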
The first four options (RecordTimeout, RecordMemory, ResFail, and RebootWaitInSec) are triggered whenever a LifeKeeper for Windows Quick Check or Deep Check process monitoring a protected resource does not complete in the expected amount of time.
ResMon_RecordTimeout – This option records any Quick Check or Deep Check process that times out. This information is logged in the <LifeKeeper Root Folder>\Out\ResMonTimeout.log file. This option is enabled (=1) by default. To disable, set the value to 0.
ResMon_RecordMemory – This option records system memory usage and process memory usage for every active process whenever a Quick Check or Deep Check process times out. Memory usage is logged in the <LifeKeeper Root Folder>\Out\ResMonTimeout.log file. This option is enabled (=1) by default. To disable, set the value to 0.
ResMon_ResFail – This option causes a resource hierarchy failover whenever a Quick Check or Deep Check process times out. All dependent resources in the affected hierarchy are failed over. This option is disabled (=0) by default. To enable, set the value to 1.
ResMon_RebootWaitInSec – This option causes a system reboot whenever a Quick Check or Deep Check process times out. This option is disabled (=0) by default. To enable, enter any non-zero number in the registry setting. This number becomes the countdown, in seconds, displayed on the system console when the system is rebooted by this feature. The reboot sequence is fully automatic and designed for unattended system operation. Once the countdown has started, the reboot cannot be stopped.
ResMon_ResFailMaxWaitInMin – This option will monitor the LifeKeeper for Windows failover process whenever any failover is occurring. The value of this registry setting is the number of minutes LifeKeeper for Windows will wait for a resource hierarchy failover to complete. If the failover process cannot be started or if the failover does not complete in the specified number of minutes, LifeKeeper for Windows will attempt to reboot the system. If the resources were not failed over, they will come in service again on the same server that was rebooted. This option is disabled (=0) by default.
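The combined effect of the five options can be summarized in a small model. The sketch below is illustrative only and is not LifeKeeper code. The defaults are the ones stated above; the function names are hypothetical. The article documents only 0 and 1 for the first three options, so the model treats exactly 1 as enabled, and it does not define the order of actions when both ResMon_ResFail and ResMon_RebootWaitInSec are set, so the order of the returned list is an assumption.

```python
# Illustrative model of the option descriptions above; not LifeKeeper code.
DEFAULTS = {
    "ResMon_RecordTimeout": 1,        # enabled by default
    "ResMon_RecordMemory": 1,         # enabled by default
    "ResMon_ResFail": 0,              # disabled by default
    "ResMon_RebootWaitInSec": 0,      # disabled by default
    "ResMon_ResFailMaxWaitInMin": 0,  # disabled by default
}


def actions_on_check_timeout(settings=None):
    """List the documented reactions to a Quick Check or Deep Check timeout."""
    merged = {**DEFAULTS, **(settings or {})}
    actions = []
    if merged["ResMon_RecordTimeout"] == 1:
        actions.append("record timeout")
    if merged["ResMon_RecordMemory"] == 1:
        actions.append("record memory usage")
    if merged["ResMon_ResFail"] == 1:
        actions.append("fail over resource hierarchy")
    countdown = merged["ResMon_RebootWaitInSec"]
    if countdown != 0:  # any non-zero number enables the reboot
        actions.append(f"reboot after {countdown} second countdown")
    return actions


def failover_wait_minutes(settings=None):
    """Return the failover time limit in minutes, or None when the option is disabled."""
    merged = {**DEFAULTS, **(settings or {})}
    minutes = merged["ResMon_ResFailMaxWaitInMin"]
    return minutes if minutes != 0 else None
```

With no overrides, `actions_on_check_timeout()` lists only the two recording actions, which matches the default configuration described above. Passing `{"ResMon_ResFail": 1}` adds the hierarchy failover, and a non-zero `ResMon_RebootWaitInSec` adds the reboot with its countdown.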
The Resource Monitoring options apply to all protected resources and can be changed at any time. Changed settings take effect the next time a Quick Check or Deep Check process is started.