The File System Health Monitoring feature detects conditions that could cause LifeKeeper-protected applications that depend on the file system to fail. Monitoring occurs only on active (in-service) file system resources. Two conditions are monitored:
- A full (or almost full) file system, and
- An improperly mounted (or unmounted) file system.
When either of these two conditions is detected, one of several actions might be taken.
- A warning message can be logged and email sent to a system administrator.
- Local recovery of the resource can be attempted.
- The resource can be failed over to a backup server.
Full or Almost Full File System
A “disk full” condition can be detected, but it cannot be resolved by local recovery or failover; administrator intervention is required. A message is logged by default. Additional notification functionality is available. For example, an email can be sent to a system administrator, or another application can be invoked to send a warning message by some other means.
To enable notification for the full/almost full disk conditions, a basic event notification script named notify is provided in the directory /opt/LifeKeeper/events/filesys/diskfull. Add to this script the functionality required to send email or execute another application.
In addition to a “disk full” condition, a “disk almost full” condition can be detected and a warning message logged in the LifeKeeper log.
The “disk full” threshold is:
FILESYSFULLERROR=95
The “disk almost full” threshold is:
FILESYSFULLWARN=90
The default values are 90% and 95% as shown, but are configurable via tunables in the /etc/default/LifeKeeper file. The meanings of these two thresholds are as follows:
FILESYSFULLWARN – When a file system reaches this percentage full, a message will be displayed in the LifeKeeper log.
FILESYSFULLERROR – When a file system reaches this percentage full, a message will be displayed in the LifeKeeper log as well as the system log. The file system notify script will also be called.
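The two thresholds can be illustrated with a minimal Python sketch. This is not LifeKeeper's implementation; the function names `percent_full` and `classify` are hypothetical, and the percentage computed from `shutil.disk_usage` may differ slightly from what LifeKeeper or `df` reports (for example, because of reserved blocks).

```python
import shutil

# Default thresholds from the text. In LifeKeeper they are read from
# /etc/default/LifeKeeper; here they are plain constants (an assumption).
FILESYSFULLWARN = 90
FILESYSFULLERROR = 95


def percent_full(used: int, total: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def classify(pct: float, warn: int = FILESYSFULLWARN,
             error: int = FILESYSFULLERROR) -> str:
    """Map a percentage-full value to 'ok', 'almost_full', or 'full'."""
    if pct >= error:
        return "full"          # message logged, notify script would be called
    if pct >= warn:
        return "almost_full"   # warning message logged
    return "ok"


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    pct = percent_full(usage.used, usage.total)
    print(f"/ is {pct:.1f}% full: {classify(pct)}")
```

With the defaults, a file system at 92% would be classified as almost full and one at 95% or above as full.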
Unmounted or Improperly Mounted File System
LifeKeeper checks the /etc/mtab file to determine whether a LifeKeeper-protected file system that is in service is actually mounted. In addition, the current mount options are compared with the mount options stored in the filesys resource information field to ensure that they match the options used when the hierarchy was created.
If an unmounted or improperly mounted file system is detected, local recovery is invoked and will attempt to remount the file system with the correct mount options.
If the remount fails, failover will be attempted to resolve the condition. The following is a list of common causes for remount failure which would lead to a failover:
- corrupted file system (fsck failure)
- failure to create mount point directory
- mount point is busy
- mount failure
- LifeKeeper internal error
- a new mount option may have been added after an OS upgrade
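The mount check described in this section can be sketched in Python as follows. This is an illustration only: the helper names `find_mount` and `check_mount` are hypothetical, the comparison by exact set equality is an assumption about how options are matched, and escaped characters in mount points (such as `\040` for a space) are not handled.

```python
from typing import Optional


def find_mount(mtab_text: str, mount_point: str) -> Optional[set]:
    """Return the option set of the last mtab entry for mount_point, or None."""
    found = None
    for line in mtab_text.splitlines():
        fields = line.split()
        # Each entry: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))
    return found


def check_mount(mtab_text: str, mount_point: str, expected_opts: str) -> str:
    """Classify a mount as 'ok', 'unmounted', or 'improperly_mounted'."""
    actual = find_mount(mtab_text, mount_point)
    if actual is None:
        return "unmounted"
    if actual != set(expected_opts.split(",")):
        return "improperly_mounted"
    return "ok"


# Sample mtab content (made up for illustration).
SAMPLE = """\
/dev/sdb1 /data ext4 rw,noatime 0 0
/dev/sdc1 /logs xfs rw,relatime 0 0
"""

if __name__ == "__main__":
    print(check_mount(SAMPLE, "/data", "rw,noatime"))
    print(check_mount(SAMPLE, "/logs", "rw"))
    print(check_mount(SAMPLE, "/missing", "rw"))
```

In this sketch, either of the two non-ok results corresponds to the condition that triggers local recovery (a remount with the stored options).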
Increasing the Time Allowed for a umount or Decreasing the Time Before a Force umount
Certain tunables can be adjusted to increase the amount of time allowed for a umount or to decrease the amount of time before a force umount is performed. Three tunables contribute to this timer:
FS_UMOUNT_RETRIES (defaults to 1)
FS_KERNEL_RETRIES (defaults to 60). This is the recommended tunable to change when adjusting the timing.
FS_UMOUNT_TIMEOUT (defaults to 30)
The maximum execution time of the forceumount script can be lowered through manipulation of the FS_KERNEL_RETRIES tunable. Lowering the value by one will save three seconds of (maximum) script execution time. The absolute minimum execution time is 30 seconds, and the maximum execution time is defined as:
Maximum execution time = ($FS_KERNEL_RETRIES * 3) + 3*($FS_UMOUNT_RETRIES + 3)
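As a worked example of the formula above, the following Python sketch evaluates it for the default tunable values. The function name `max_forceumount_seconds` is hypothetical; the arithmetic is taken directly from the formula.

```python
def max_forceumount_seconds(fs_kernel_retries: int = 60,
                            fs_umount_retries: int = 1) -> int:
    """Evaluate (FS_KERNEL_RETRIES * 3) + 3 * (FS_UMOUNT_RETRIES + 3)."""
    return (fs_kernel_retries * 3) + 3 * (fs_umount_retries + 3)


if __name__ == "__main__":
    default = max_forceumount_seconds()        # (60 * 3) + 3 * (1 + 3)
    lowered = max_forceumount_seconds(59)      # FS_KERNEL_RETRIES lowered by one
    print(default, lowered, default - lowered)  # prints "192 189 3"
```

With the defaults (FS_KERNEL_RETRIES=60, FS_UMOUNT_RETRIES=1) the formula gives 192 seconds, and lowering FS_KERNEL_RETRIES to 59 gives 189 seconds, which matches the three-second saving described above.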
New or Deprecated Mount Options After a Kernel Upgrade
When upgrading the Linux kernel, it is possible that some existing file system mount options are deprecated in the new kernel, or that the new kernel adds new default mount options to existing mounts. For example, the “nobarrier” mount option was deprecated in Red Hat Enterprise Linux 8, and some kernel versions have added new default mount options such as “logbufs=8” and “logbsize=32k”.
If a LifeKeeper-protected file system resource contains mount options which become deprecated after a kernel upgrade, the deprecated options should be removed from the list of mount options for the LifeKeeper resource on every server in the cluster. See the Modifying Mount Options for a LifeKeeper File System Resource section for more details.
If new default mount options are added by the kernel to an existing LifeKeeper-protected mount point after a kernel upgrade, then the new options should be added to the list of mount options for the LifeKeeper resource on every server in the cluster. See the Modifying Mount Options for a LifeKeeper File System Resource section for more details.
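One way to spot such differences after an upgrade is to compare the option list stored for the resource with the options the kernel currently reports for the mount. The Python sketch below is a hypothetical helper, not part of LifeKeeper; the name `diff_mount_options` and the sample option strings are assumptions chosen to mirror the examples above, and the result only suggests candidates that an administrator would still need to review.

```python
from typing import List, Tuple


def diff_mount_options(stored: str, current: str) -> Tuple[List[str], List[str]]:
    """Compare two comma-separated option lists.

    Returns (to_add, to_remove): options reported for the current mount but
    missing from the stored list, and options in the stored list that the
    current mount no longer reports.
    """
    stored_set = set(stored.split(","))
    current_set = set(current.split(","))
    to_add = sorted(current_set - stored_set)
    to_remove = sorted(stored_set - current_set)
    return to_add, to_remove


if __name__ == "__main__":
    # Stored options from before the upgrade vs. options seen afterwards
    # (both strings are made-up illustrations).
    stored = "rw,noatime,nobarrier"
    current = "rw,noatime,logbufs=8,logbsize=32k"
    to_add, to_remove = diff_mount_options(stored, current)
    print("add:", to_add)
    print("remove:", to_remove)
```

Any differences found this way would then be applied on every server in the cluster using the procedure in the Modifying Mount Options for a LifeKeeper File System Resource section.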
Modifying Mount Options for a LifeKeeper File System Resource
To modify the mount options used by LifeKeeper when mounting a protected file system:
- Take the file system resource out of service in LifeKeeper. This will unmount the protected file system.
- Update the mount options on each server in the cluster.
a. Using the LifeKeeper GUI
i. Right-click on the file system resource on each server where you would like to change the mount options and select “Change Mount Options”.
ii. In the resulting dialog, modify the mount options by providing a comma-separated list of options to be used when LifeKeeper mounts the file system. Once the desired mount options have been entered, click “Set Value”. Click “Finish” to exit the confirmation dialog.
b. Using the LifeKeeper Command Line Interface
i. lkcli resource config fs --tag <tag> --mountopts "comma-separated list"
ii. This command must be run on each server in the cluster.
- Bring the file system resource in-service in LifeKeeper. This will mount the protected file system.