This section describes issues that may be encountered when using the DRBD Recovery Kit. Where appropriate, it explains the cause of an error and the action needed to resolve it.
Messages specific to the DRBD Recovery Kit can be found in the DRBD Message Catalog. Messages from other LifeKeeper for Linux components may also appear. In those cases, refer to the Combined Message Catalog, which lists all error codes (operational, administrative, and GUI) that may be encountered while using LifeKeeper for Linux and, where appropriate, explains the cause of each error code and the action needed to resolve it. You can search this full listing for any error code received, or go directly to the individual Message Catalog for the appropriate LifeKeeper component.
The following table lists possible problems and suggestions.
DRBD GUI wizard does not list a newly created partition. | The Linux OS may not recognize a newly created partition until the next reboot of the system. Check the /proc/partitions file for an entry for the newly created partition. If the new partition does not appear in the file, you will need to reboot the system. |
Errors during failover | Check the status of your device. If resynchronization is in progress, you may not be able to perform a failover. |
DRBD resource is not stopped after “Delete Resource Hierarchy”. | Deleting a LifeKeeper DRBD resource will not stop (take down) the DRBD resource if the DRBD resource is mounted or open. The error message when deleting the DRBD resource is: Unable to stop/remove the resource /dev/drbd<num> on <sys>. Use “drbdsetup down lk<num>” on <sys> to manually stop the resource after the file system is unmounted. To stop the resource, run the following two commands in order: first umount /dev/drbd<num>, then drbdsetup down lk<num>. |
Primary server will not automatically bring the resource ISP after both servers have stopped LifeKeeper or were shut down. | If the primary server (where the resource was in-service when LifeKeeper was stopped or the server was shut down) becomes operable before the secondary server, LifeKeeper will not automatically bring the DRBD resource in-service until the secondary server has started LifeKeeper. This ensures that DRBD is using the most up-to-date data. You can force the DRBD resource online by opening the resource properties dialog, clicking the Replication Status tab, clicking the Actions button, and then selecting Force Mirror Online. Click Continue to confirm, then Finish. To avoid data loss, it is recommended that both the primary and secondary servers be operable before the DRBD resource is brought online. |
Resources appear green (ISP) on both primary and backup servers. | This is a “split-brain” scenario. It can occur when a temporary communications failure causes LifeKeeper to bring the resources in-service on both servers. DRBD will not resync the data because it does not know which system has the best or latest data. Manual intervention is required. You must determine which server has the best or latest data, then take the resource out of service on the other server. Replication will NOT automatically resume. To resume replication in the GUI, right-click the DRBD resource and select Resume Replication. |
Core – Language Environment Effects | Some LifeKeeper scripts parse the output of Linux system utilities and rely on certain patterns in order to extract information. When some of these commands run under non-English locales, the expected patterns are altered and LifeKeeper scripts fail to retrieve the needed information. For this reason, the language environment variable LC_MESSAGES has been set to the POSIX “C” locale (LC_MESSAGES=C) in /etc/default/LifeKeeper. It is not necessary to install Linux with the language set to English (any language variant available with your installation media may be chosen); the setting of LC_MESSAGES in /etc/default/LifeKeeper will only influence LifeKeeper. If you change the value of LC_MESSAGES in /etc/default/LifeKeeper, be aware that it may adversely affect the way LifeKeeper operates. |
drbdadm verify <res> detects mismatched blocks | The verify feature of DRBD performs an online block-by-block comparison to verify that the data is the same on all servers. All blocks on the device are compared, whether or not they are currently in use. While block miscompares may indicate corrupted user data, they may also come from unused blocks that differ between servers, in which case the miscompare can be ignored. There are two valid reasons unused blocks may miscompare: (1) The underlying disks were not zeroed out before the resource was created and extended. LifeKeeper does not do a full synchronization when creating a new resource: because it is creating a new file system, only new data needs to be synchronized. Where verification will be used, it is recommended that the underlying disks be zeroed out before configuring the DRBD resource. (2) A kernel optimization for blocks that are no longer being used can lead to miscompares. |
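The /proc/partitions check described in the first row of the table can be scripted. The following is a minimal sketch, not part of the DRBD Recovery Kit: the function name partition_listed and the partition name sdb1 are placeholders chosen for illustration, and the optional second argument exists only so the check can be run against a saved copy of the file.

```shell
#!/bin/sh
# Return 0 if the named partition is listed in the kernel's partition table.
# $1: partition name without the /dev/ prefix (sdb1 below is a placeholder)
# $2: optional file to read instead of /proc/partitions (illustrative only)
partition_listed() {
    part_name="$1"
    table="${2:-/proc/partitions}"
    # /proc/partitions columns: major minor #blocks name
    awk -v name="$part_name" '$4 == name { found = 1 } END { exit !found }' "$table"
}

if partition_listed sdb1; then
    echo "sdb1 is known to the kernel"
else
    echo "sdb1 is not listed; a reboot may be required"
fi
```

Matching on the whole fourth column, rather than searching for a substring, avoids reporting sdb1 as present when only sdb10 exists.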
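The manual stop sequence from the “Delete Resource Hierarchy” row (unmount the file system, then take the DRBD resource down) can be wrapped in a small helper. This is a hedged sketch only: the function name stop_drbd_resource and the DRY_RUN variable are hypothetical, and the sketch assumes the same <num> applies to both /dev/drbd<num> and lk<num>, as the error message suggests. By default it only prints the commands, so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Print or run the manual stop sequence for one DRBD device number.
# $1: the <num> from /dev/drbd<num> in the error message
# DRY_RUN (hypothetical variable): 1 (default) prints the commands,
# 0 executes them; executing requires root on the affected server.
stop_drbd_resource() {
    num="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "umount /dev/drbd${num}"
        echo "drbdsetup down lk${num}"
        return 0
    fi
    # Take the resource down only if the unmount succeeded.
    umount "/dev/drbd${num}" && drbdsetup down "lk${num}"
}

# Device number 0 is a placeholder; substitute the number from the message.
stop_drbd_resource 0
```

Chaining the two commands with && keeps the order given in the table: the resource is taken down only after the file system has been unmounted.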