Operational Messages

The following messages commonly occur when LifeKeeper is operating; an explanation follows each message.

Common LifeKeeper Messages

LIFEKEEPER IS SHUTTING DOWN AT: day month date time year

LifeKeeper is no longer protecting the server this message came from.

LifeKeeper: RESOURCE PROTECTION ACTIVATED FOR system_name AT: day month date time year

LifeKeeper is now protecting the server this message came from.

COMMUNICATION TO system_name BY device_name FAILED AT: day month date time year

The communication link, between the server this message came from and the server indicated in the message, is no longer active. The operator should investigate and restore communications.

FAILOVER RECOVERY OF MACHINE system_name STARTED AT: day month date time

The server identified in the message has failed, and LifeKeeper is initiating a failover to the server this message came from. The operator should investigate reasons for the server failure.
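
The four messages above follow fixed templates, so a log scanner can recognize them with regular expressions. The following is a minimal sketch, not part of LifeKeeper; the function name classify_message and the category labels are hypothetical, and the exact spacing and timestamp layout in real logs are assumptions, so the timestamp is kept as raw text.

```python
import re

# Patterns derived from the message templates listed above.
# The timestamp is captured as raw text because its exact layout is an assumption.
PATTERNS = [
    ("shutdown",
     re.compile(r"LIFEKEEPER IS SHUTTING DOWN AT:\s*(?P<when>.+)")),
    ("protection_activated",
     re.compile(r"RESOURCE PROTECTION ACTIVATED FOR (?P<system>\S+) AT:\s*(?P<when>.+)")),
    ("communication_failed",
     re.compile(r"COMMUNICATION TO (?P<system>\S+) BY (?P<device>\S+) FAILED AT:\s*(?P<when>.+)")),
    ("failover_started",
     re.compile(r"FAILOVER RECOVERY OF MACHINE (?P<system>\S+) STARTED AT:\s*(?P<when>.+)")),
]


def classify_message(line):
    """Return (category, fields) for a recognized message, or None."""
    for category, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return category, match.groupdict()
    return None
```

A monitoring script could call classify_message on each log line and alert the operator on the communication_failed and failover_started categories, which are the two that call for investigation.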

Common SCSI Communication Messages

LifeKeeper communication with the SCSI driver is abstracted by the LifeKeeper libLKscsi library into a consistent interface across all platforms. With one exception, all LifeKeeper accesses to the libLKscsi library will log error messages. The exception is during the start of LifeKeeper (/opt/LifeKeeper/bin/lkstart), when all possible devices are being scanned. Each device that does not exist would generate an error. Because LifeKeeper cannot, in this case, distinguish a device that does not exist from one that is not configured correctly, it logs no errors during the scan, which avoids filling the logs with hundreds of useless messages.

Each error message below will be in the format:

Device (H,C,ID,LUN): <message>

where H is the host adapter number, C is the controller number, ID is the SCSI ID, and LUN is the Logical Unit Number. The message text follows the device address.
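
As an illustration of the format described above, the device address can be split from the message text with a regular expression. This is a minimal sketch; the names DeviceMessage and parse_device_message are hypothetical, and it assumes the four address fields are printed as decimal integers.

```python
import re
from typing import NamedTuple, Optional


class DeviceMessage(NamedTuple):
    host: int        # H, host adapter number
    controller: int  # C, controller number
    scsi_id: int     # ID, SCSI ID
    lun: int         # LUN, Logical Unit Number
    message: str     # text after the device address


# Assumes decimal fields; optional whitespace after commas is tolerated.
DEVICE_RE = re.compile(r"Device \((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\):\s*(.*)")


def parse_device_message(line: str) -> Optional[DeviceMessage]:
    """Split a 'Device (H,C,ID,LUN): <message>' line into its parts."""
    match = DEVICE_RE.search(line)
    if match is None:
        return None
    host, controller, scsi_id, lun = (int(g) for g in match.groups()[:4])
    return DeviceMessage(host, controller, scsi_id, lun, match.group(5))
```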

ERROR: SCSI reservation conflict during LifeKeeper resource initialization. Manual intervention required.

This message means that when LifeKeeper was started, it could not bring resources in service because the resources are reserved by another system. When LifeKeeper receives this error during initialization, it cannot reliably determine the state of the cluster. This situation should only occur when a system was shut down abnormally (so that LifeKeeper did not properly close its resources) or when communication failures leave no working communication paths between the servers. This is typically called a "split-brain" problem or a segmented cluster. The resource can be brought in service manually on the proper server, through either the GUI or the command line interface.

ERROR: scsi/disk unable to find device previously called $oldval.

ERROR:This could be because the device is no longer connected

ERROR: or because you have a SCSI subsystem initialization error.

ERROR: You may be able to fix this by rebooting the system.

This message means that when LifeKeeper was started, it could not bring a disk or device resource in-service because a device which was previously configured is no longer responding to the Linux operating system. Rebooting the system may resolve this problem.

Note: An example of the value for $oldval might be /dev/sdc.

WARNING: additional Inquiry data available.  Requested X bytes, Y available.

LifeKeeper requested X bytes of data from the device, but the device has Y bytes available. No "certified" devices return more data than LifeKeeper requests. In most cases this should not cause a problem. An observable failure is unlikely and can occur only when a device returns its unique ID beyond the range of data that LifeKeeper requests. In this case LifeKeeper will not allow resources to be created on this device. Report this to SIOS Technology Corp. for resolution.

WARNING: libLKscsi %d second timeout (write), cmd:0x%x, errno=%d, continue to wait.

When the sg driver read/write interface is used to issue a reservation, this message means the IO is taking much longer than expected to complete. LifeKeeper will continue to poll for the IO to complete; if the timeout limit is exceeded, the command will fail with the ERROR listed below, which ends with a "fail command" message. The default operation for LifeKeeper is to use the ioctl interface, which does not have this problem (controlled by the variable "RESERVATIONS" in /etc/default/LifeKeeper). If this configuration requires the use of the sg read/write interface, then lessening the load on the system is the only way to avoid this problem.

The "write" does not mean that a physical write is being done to the device; rather, it means that the driver was "writing" a command to the sg driver. Writing a command to the sg read/write interface sends that command across the SCSI bus to the device. The command sent is either a Test Unit Ready or a Reservation.

ERROR: libLKscsi %d second timeout (write), cmd:0x%x, status=%d, errno=%d, fail command.

The sg driver was not able to issue the IO. Either the driver "lost" the IO or, more likely, the system is so busy that the IO was never dispatched. This failure will often result in LifeKeeper doing a sendevent or HALT due to the inability to access the resource. The ioctl interface does not use the sg read/write interface, so it does not have this problem. The only two solutions are to decrease the load on the system to avoid such a slow response or to use the ioctl interface.

The "write" does not mean that a physical write is being done to the device; rather, it means that the driver was "writing" a command to the sg driver. Writing a command to the sg read/write interface sends that command across the SCSI bus to the device. The command sent is either a Test Unit Ready or a Reservation. The command is returned to the calling routine as a failure, and the IO is not sent to the device.

The status is the last return status from the write command. The errno is valid only if the status is -1.
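
The libLKscsi timeout messages share one layout, so their fields can be extracted with a single pattern. The sketch below is illustrative only; the function name parse_timeout and the returned keys are hypothetical. It applies the rule stated above by translating errno to text only when the status is -1.

```python
import os
import re

# Matches the WARNING ("continue to wait") and ERROR ("fail command") timeout
# messages. The status field is optional because not every template prints it.
TIMEOUT_RE = re.compile(
    r"libLKscsi (?P<seconds>\d+) second timeout \((?P<op>write|read)\), "
    r"cmd:0x(?P<cmd>[0-9a-fA-F]+), (?:status=(?P<status>-?\d+), )?"
    r"errno=(?P<errno>-?\d+), (?P<outcome>continue to wait|fail command)"
)


def parse_timeout(line):
    """Extract the fields of a libLKscsi timeout message, or return None."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    status = match.group("status")
    info = {
        "seconds": int(match.group("seconds")),
        "op": match.group("op"),
        "cmd": int(match.group("cmd"), 16),  # SCSI opcode, printed in hex
        "status": int(status) if status is not None else None,
        "errno": int(match.group("errno")),
        "fatal": match.group("outcome") == "fail command",
    }
    # errno is meaningful only when the status is -1.
    info["errno_text"] = os.strerror(info["errno"]) if info["status"] == -1 else None
    return info
```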

ERROR: libLKscsi write failure, cmd:0x%x, retry count=%d, result=%d,status=%d, errno=%d

The write of the cmd to the sg read/write interface failed with the status given; if the status is -1, then the errno is valid. The result is a driver-specific result from the sg driver in the format:

DDHHMMSS

where:

DD - Driver byte (mid-level driver specific)

HH - Host byte (low level driver specific)

MM - Message byte from SCSI Bus

SS - Status Byte from SCSI Bus
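
The result layout above can be unpacked with shifts and masks. This is a minimal sketch, assuming each letter pair is one byte of a 32-bit value with DD as the most significant byte; the function name decode_sg_result is hypothetical. Because the message prints the result with %d, the value in the log is assumed to be decimal and should be converted with int() before decoding.

```python
def decode_sg_result(result: int) -> dict:
    """Split a DDHHMMSS result word into its four bytes.

    Assumes DD is the most significant byte and SS the least significant.
    """
    return {
        "driver": (result >> 24) & 0xFF,   # DD, mid-level driver specific
        "host": (result >> 16) & 0xFF,     # HH, low-level driver specific
        "message": (result >> 8) & 0xFF,   # MM, message byte from SCSI bus
        "status": result & 0xFF,           # SS, status byte from SCSI bus
    }


# Example: a logged "result=134217730" is 0x08000002 in hexadecimal.
fields = decode_sg_result(int("134217730"))
```

In this example the driver byte is 0x08 and the status byte is 0x02; the host and message bytes are zero.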

WARNING: libLKscsi %d second timeout (read), cmd:0x%x, errno=%d, continue to wait

When using the sg driver read/write interface, the read to get the status from the command (cmd) is taking much longer than expected to complete. LifeKeeper will continue to poll for the IO to complete; if the timeout limit is exceeded, the command will fail with the ERROR listed below, which ends with a "fail command" message.

The default operation for LifeKeeper is to use the ioctl interface, which does not have this problem (controlled by the variable "RESERVATIONS" in /etc/default/LifeKeeper). If this configuration requires the use of the sg read/write interface, then lessening the load on the system is the only way to avoid this problem.

The "read" does not mean that a physical read is being done to the device; rather, it means that the driver was "reading" the completion status for a command that was previously "written" to the sg driver.

ERROR: libLKscsi %d second timeout (read), cmd:0x%x, errno=%d, fail command

The sg driver was not able to get the status for the previous "write". Either the driver "lost" the IO or, more likely, the system is so busy that the IO was never dispatched. This failure will often result in LifeKeeper doing a sendevent or HALT due to the inability to access the resource. The ioctl interface does not use the sg read/write interface, so it does not have this problem. The only two solutions are to decrease the load on the system to avoid such a slow response or to use the ioctl interface.

The "read" does not mean that a physical read is being done to the device; rather, it means that the driver was "reading" the status from the previous write to the sg driver. The command is returned to the calling routine as a failure, but the write to the sg driver may still be pending and may still complete.

The status is the last return status from the read command. The errno is valid only if the status is -1.

ERROR: libLKscsi read failure cmd:0x%x, retry count=%d, errno=%d

The read of the cmd to the sg read/write interface failed with the status given; if the status is -1, then the errno is valid. The result is a driver-specific result from the sg driver in the format:

DDHHMMSS

where:

DD - Driver byte (mid-level driver specific)

HH - Host byte (low level driver specific)

MM - Message byte from SCSI Bus

SS - Status Byte from SCSI Bus

ERROR Messages Enabled by Default

LKSCSI_Open: unable to map /dev/sdX to a generic node

LifeKeeper needs to use the generic driver (sg driver) in order to be able to use an ioctl or read/write interface to issue commands directly to a SCSI device.  These commands would include Inquiry, Test Unit Ready, Reserve, Release, etc.  The disk device node (/dev/sd) is the normal interface used to access disk devices so LifeKeeper maps the disk device node to the generic device node (/dev/sg).  This message indicates that LifeKeeper was unable to map the disk device /dev/sdX to a generic node.  Make sure there are at least as many generic device nodes created as there are SCSI disk, SCSI tape and SCSI CD-ROM devices configured in the system.
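
The node-count check described above can be sketched as follows. This is not a LifeKeeper tool; the function names are hypothetical, and the sketch assumes the usual Linux naming of /dev/sdX for disks, /dev/stN for tapes, /dev/srN for CD-ROMs, and /dev/sgN for generic nodes. It counts device nodes by name only, which approximates the configured devices.

```python
import re

# Whole-device SCSI nodes (partitions such as /dev/sda1 are excluded).
SCSI_NODE_RE = re.compile(r"/dev/(sd[a-z]+|st\d+|sr\d+)")
GENERIC_NODE_RE = re.compile(r"/dev/sg\d+")


def count_nodes(paths):
    """Return (scsi_device_nodes, generic_nodes) for a list of /dev paths."""
    scsi = sum(1 for p in paths if SCSI_NODE_RE.fullmatch(p))
    generic = sum(1 for p in paths if GENERIC_NODE_RE.fullmatch(p))
    return scsi, generic


def enough_generic_nodes(paths):
    """True when there is at least one sg node per SCSI disk, tape, or CD-ROM node."""
    scsi, generic = count_nodes(paths)
    return generic >= scsi
```

On a live system the path list could come from glob.glob("/dev/s*"); passing the list in as an argument keeps the check easy to test.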

LKSCSI_RawCDB: Invalid Command Length passed in CDB (%d). Only default size commands can be used with Linux driver.

A command was issued through the LifeKeeper libLKscsi interface with a command length that did not match the length defined by the SCSI standard. The typical command lengths are defined to be 6, 10, and 12 bytes. Make sure the correct command length is used when sending a command using the libLKscsi interface.

If this error message is returned from LifeKeeper, contact Customer Support.
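
As an illustration of the length rule, the SCSI standard ties the length of a command to the group code in the top three bits of its operation code. The sketch below covers only the 6, 10, and 12 byte lengths named above; the function names are hypothetical, and how libLKscsi itself performs this check is not described here.

```python
# Fixed CDB lengths by SCSI group code (top three bits of the operation code).
# Only the 6, 10, and 12 byte groups named in the text are covered; any other
# group code yields None.
GROUP_LENGTHS = {0: 6, 1: 10, 2: 10, 5: 12}


def expected_cdb_length(opcode: int):
    """Return the standard CDB length for an opcode, or None if not covered."""
    return GROUP_LENGTHS.get((opcode >> 5) & 0x07)


def cdb_length_ok(cdb: bytes) -> bool:
    """Check that a CDB's length matches the length implied by its opcode."""
    if not cdb:
        return False
    expected = expected_cdb_length(cdb[0])
    return expected is not None and len(cdb) == expected
```

For example, an Inquiry command (opcode 0x12) is in group 0, so it must be passed as a 6-byte command.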

© 2012 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.