Troubleshooting

The following table lists possible problems and suggestions.

Symptom

Suggested Action

NetRAID device not deleted after DataKeeper resource deletion.

Deleting a DataKeeper resource will not delete the NetRAID device if the NetRAID device is mounted. You can manually unmount the device and delete it by executing:

 mdadm -S <md_device> (run cat /proc/mdstat to determine the <md_device>).
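The manual cleanup above can be sketched as two small shell helpers. This is a minimal sketch, not a LifeKeeper-supplied tool: the function names list_md_devices and stop_md_device are hypothetical, and it assumes the NetRAID device appears as an md device in /proc/mdstat and, when mounted, under its /dev/mdN path in /proc/mounts.

```shell
#!/bin/sh
# List md devices named in an mdstat-style file (defaults to /proc/mdstat).
# Device lines look like: "md0 : active raid1 nbd0[1] sdb1[0]".
list_md_devices() {
    awk '$1 ~ /^md/ && $2 == ":" { print "/dev/" $1 }' "${1:-/proc/mdstat}"
}

# Unmount the given md device if it is mounted, then stop it with mdadm -S.
# Must be run as root. Assumption: the device is mounted by its /dev path.
stop_md_device() {
    dev=$1
    if grep -q "^$dev " /proc/mounts; then
        umount "$dev" || return 1
    fi
    mdadm -S "$dev"
}
```

Run list_md_devices first to find the device, then, as root, call stop_md_device with the path it printed.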

Installation/HADR rpm fails

See the Installation section for complete instructions on manually installing these files.

Errors during failover

Check the status of your device.  If resynchronization is in progress you cannot perform a failover.
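One way to check for a resynchronization from the command line is to look at /proc/mdstat. The helper below is a hedged sketch: the name resync_in_progress is hypothetical, and it assumes the mirror's resync shows up in /proc/mdstat as a "resync" or "recovery" progress line, as it does for standard md devices.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the mdstat-style file reports a resync or
# recovery; fails (exit status 1) otherwise. Defaults to /proc/mdstat.
resync_in_progress() {
    grep -Eq 'resync|recovery' "${1:-/proc/mdstat}"
}
```

If resync_in_progress succeeds, wait for the resynchronization to finish before attempting the failover.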

After the primary server panics, the DataKeeper resource goes ISP on the secondary server, but when the primary server reboots, the DataKeeper resource becomes OSF on both servers.

Check the “switchback type” selected when creating your DataKeeper resource hierarchy.  Automatic switchback is not supported for DataKeeper resources in this release.  You can change the Switchback type to “Intelligent” from the resource properties window.

Primary server cannot bring the resource ISP when it reboots after both servers became inoperable.

If the primary server becomes operable before the secondary server, you can force the DataKeeper resource online by opening the resource properties dialog, clicking the Replication Status tab, clicking the Actions button, and then selecting Force Mirror Online. Click Continue to confirm, then Finish.

Error creating a DataKeeper hierarchy on currently mounted NFS file system

You are attempting to create a DataKeeper hierarchy on a file system that is currently exported by NFS.  You will need to replicate this file system before you export it. 

DataKeeper GUI wizard does not list a newly created partition

The Linux OS may not recognize a newly created partition until the next reboot of the system.  View the /proc/partitions file for an entry of your newly created partition.  If your new partition does not appear in the file, you will need to reboot your system.
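The /proc/partitions check described above can be scripted. This is a minimal sketch with a hypothetical function name, partition_listed; it assumes the usual /proc/partitions layout, in which the fourth column holds the partition name (for example sdb1, without the /dev/ prefix).

```shell
#!/bin/sh
# Succeeds if the named partition appears in the name column (field 4) of a
# /proc/partitions-style file; fails otherwise. Defaults to /proc/partitions.
partition_listed() {
    awk -v name="$1" '$4 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/partitions}"
}
```

If partition_listed fails for the new partition, the kernel has not picked it up yet and a reboot is needed before it will appear in the wizard.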

Resources appear green (ISP) on both primary and backup servers.

This is a “split-brain” scenario that can be caused by a temporary communications failure. After communications are resumed, both systems assume they are primary.

DataKeeper will not resync the data because it does not know which system was the last primary system. Manual intervention is required.

If not using a bitmap:

You must determine which server was the last backup, then take the resource out of service on that server. DataKeeper will then perform a FULL resync.

If using a bitmap (2.6.18 and earlier kernel):

You should take both resources out of service, starting with the original backup node. You should then dirty the bitmap on the primary node by executing:

 $LKROOT/lkadm/subsys/scsi/netraid/bin/bitmap -d /opt/LifeKeeper/bitmap_filesys

(where /opt/LifeKeeper/bitmap_filesys is the bitmap filename). This will force a full resync when the resource is brought into service. Next, bring the resource into service on the primary node and a full resync will begin.

If using a bitmap (2.6.19 and later kernel, or Red Hat Enterprise Linux 5.4 kernels 2.6.18-164 or later, or a supported derivative of Red Hat 5.4 or later):

You must determine which server was the last backup, then take the resource out of service on that server. DataKeeper will then perform a partial resync.

lklin00001458

Installation - Package check errors (rpm -V steeleye-lk) will occur on the core when installed on SUSE

Because of the way SUSE runs shutdown scripts (versus other Linux distributions), the following scripts are moved to another location after installation, so that LifeKeeper will be shut down when changing run levels or rebooting. As a result, the following errors will occur; these should be the only errors that occur when verifying the steeleye-lk package:

Missing    /etc/rc.d/rc0.d/K01lifekeeper

Missing    /etc/rc.d/rc1.d/K01lifekeeper

Missing    /etc/rc.d/rc6.d/K01lifekeeper
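To confirm that these three entries really are the only verification errors, the rpm -V output can be filtered. The sketch below uses a hypothetical function name, unexpected_verify_errors, and assumes each expected line consists of the word "missing" (in any capitalization), whitespace, and one of the three paths listed above.

```shell
#!/bin/sh
# Read "rpm -V" output on stdin and print only the lines that are NOT one of
# the three expected missing K01lifekeeper entries. Printing nothing means
# the package verified with only the expected errors.
unexpected_verify_errors() {
    grep -Eiv '^missing[[:space:]]+/etc/rc\.d/rc[016]\.d/K01lifekeeper$' || true
}
```

A typical invocation would be rpm -V steeleye-lk | unexpected_verify_errors; any line it prints deserves a closer look.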

lklin00002100

Core - Language Environment Effects 

Some LifeKeeper scripts parse the output of Linux system utilities and rely on certain patterns in order to extract information. When some of these commands run under non-English locales, the expected patterns are altered and LifeKeeper scripts fail to retrieve the needed information. For this reason, the language environment variable LC_MESSAGES has been set to the POSIX “C” locale (LC_MESSAGES=C) in /etc/default/LifeKeeper. It is not necessary to install Linux with the language set to English (any language variant available with your installation media may be chosen); the setting of LC_MESSAGES in /etc/default/LifeKeeper will only influence LifeKeeper. If you change the value of LC_MESSAGES in /etc/default/LifeKeeper, be aware that it may adversely affect the way LifeKeeper operates. The side effects depend on whether or not message catalogs are installed for various languages and utilities and whether they produce text output that LifeKeeper does not expect.
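To see which value is currently in effect for LifeKeeper, the setting can be read back from the defaults file. This is a small sketch with a hypothetical function name, lc_messages_setting; it assumes the file uses plain, unquoted NAME=value lines, as in the LC_MESSAGES=C example above.

```shell
#!/bin/sh
# Print the value assigned to LC_MESSAGES in a LifeKeeper defaults file.
# Defaults to /etc/default/LifeKeeper; prints nothing if the variable is unset.
lc_messages_setting() {
    sed -n 's/^LC_MESSAGES=//p' "${1:-/etc/default/LifeKeeper}"
}
```

On an unmodified installation this should print C.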

lklin00004392

Core - Shutdown hangs on SLES10 systems

When running shutdown on an AMD64 system with SLES10, the system locks up and the shutdown does not complete.  This has been reported to Novell via bug #294787.  The lockup appears to be caused by the SLES10 powersave package.

Workaround: Remove the SLES10 powersave package to enable shutdown to complete successfully.

lklin00004276

GUI - GUI login prompt may not re-appear when reconnecting via a web browser after exiting the GUI

When you exit or disconnect from the GUI applet and then try to reconnect from the same web browser session, the login prompt may not appear.

Workaround: Close the web browser, re-open the browser and then connect to the server.  When using the Firefox browser, close all Firefox windows and re-open.

lklin00004181

GUI - lkGUIapp on RHEL5 reports unsupported theme errors

When you start the GUI application client, you may see the following console message:

/usr/share/themes/Clearlooks/gtk-2.0/gtkrc:60: Engine "clearlooks" is unsupported, ignoring

This message comes from the RHEL 5 and FC6 Java platform look and feel and will not adversely affect the behavior of the GUI client.

lklin00004972

Data Replication - GUI does not show proper state on SLES 10 SP2 system

On SLES 10 SP2, netstat is broken due to a new format in /proc/<PID>/fd.  This issue is due to a SLES 10 SP2 kernel bug and has been fixed in kernel update version 2.6.16.60-0.23. 

Solution:  Please upgrade to kernel version 2.6.16.60-0.23 if running on SLES 10 SP2.

Bug 1138

Data Replication - Size limitation on 32-bit machines

When trying to replicate a drive larger than 2 TB on a 32-bit machine, the following error may occur:

Negotiation: ..Error: Exported device is too big for me. Get 64-bit machine

Solution:  If using SteelEye DataKeeper on a 32-bit machine, you cannot replicate a drive that is greater than 2 TB in size.
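A device can be checked against this limit before a mirror is created. The sketch below is illustrative only: the name exceeds_2tb is hypothetical, and it assumes that "2 TB" means 2 x 1024^4 bytes (the source does not say whether binary or decimal units are meant).

```shell
#!/bin/sh
# 2 TB in bytes, assuming binary units: 2 * 1024^4.
TWO_TB_BYTES=2199023255552

# Succeeds if the given size in bytes is strictly larger than 2 TB.
exceeds_2tb() {
    [ "$1" -gt "$TWO_TB_BYTES" ]
}
```

The size in bytes can be obtained as root with blockdev --getsize64 <device>, and uname -m shows whether the machine is 32-bit (for example, i686).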

© 2012 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.