Restore Fails

Symptom:

Restoring a PostgreSQL Server resource fails after a switchover if the access permissions on the database cluster data directory are configured incorrectly.

Suggested Action:

Verify the access permissions are configured correctly. See step 3 in Creating and Protecting Additional PostgreSQL Database Clusters.

Symptom:

Restoring a PostgreSQL Server resource fails if the database cluster instance was started with pg_ctl.exe start rather than through a LifeKeeper in-service action or a service start through the Windows APIs. Starting the database cluster with pg_ctl.exe leaves the Windows Service state inconsistent with the running instance, so the LifeKeeper restore fails when it attempts to start an instance that is already running.

When attempting to start an already running instance, PostgreSQL will log the following messages:

FATAL: lock file "postmaster.pid" already exists

HINT: Is another postmaster (PID 3488) running in data directory "E:/PGSQL1"?

Suggested Action:

To correct this condition, stop the database cluster with pg_ctl stop. Once the stop completes, the LifeKeeper in-service action should succeed.
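As an illustration of the corrective step, the following Python sketch reads the PID and data directory out of the HINT line shown above and assembles the matching pg_ctl stop command. The helper names (parse_hint, build_stop_command) are hypothetical and not part of LifeKeeper or PostgreSQL. The sketch assumes that pg_ctl is on the PATH and that passing the data directory with -D is appropriate for your installation. It only builds the command; it does not run it.

```python
import re

# Matches the HINT line PostgreSQL writes when postmaster.pid already exists.
# Accepts straight or typographic quotes around the data directory.
HINT_RE = re.compile(
    r'Is another postmaster \(PID (\d+)\) running in data directory '
    r'["\u201c]([^"\u201d]+)["\u201d]\?'
)


def parse_hint(line):
    """Return (pid, data_directory) from a HINT line, or None if no match."""
    match = HINT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def build_stop_command(data_directory, pg_ctl="pg_ctl"):
    """Build the argument list for `pg_ctl stop -D <data_directory>`.

    Assumption: pg_ctl is reachable under the name given in `pg_ctl`.
    """
    return [pg_ctl, "stop", "-D", data_directory]


if __name__ == "__main__":
    hint = ('HINT: Is another postmaster (PID 3488) running in '
            'data directory "E:/PGSQL1"?')
    parsed = parse_hint(hint)
    if parsed is not None:
        pid, data_directory = parsed
        # Print the command for review rather than executing it.
        print(pid, " ".join(build_stop_command(data_directory)))
```

The resulting argument list could be passed to subprocess.run by an administrator who has confirmed that the data directory is the one protected by the failed resource.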

Symptom:

Restoring a PostgreSQL Server resource can fail if the database cluster did not shut down cleanly, either because of a server crash or because the PostgreSQL service was hung when the shutdown occurred (windbg was used to simulate a hang). An unclean shutdown forces a database cluster recovery action on the next startup. This recovery action can cause the Windows Service start action to fail, leaving the service state inconsistent with the database cluster state. During startup after an unclean shutdown, PostgreSQL may log the following messages (along with a number of others):

Waiting for server start up

LOG: database system was interrupted; last known up at 2017-07-25 16:12:10 EDT

FATAL: the database system is starting up

LOG: database system was not properly shut down; automatic recovery in progress

Once the recovery is complete, the PostgreSQL database cluster processes are running, but the Windows Service state is "Stopped" and the LifeKeeper PostgreSQL resource is in the failed state. If a LifeKeeper restore action is attempted while the database cluster is up and running, PostgreSQL will log the following messages:

FATAL: lock file "postmaster.pid" already exists

HINT: Is another postmaster (PID 3488) running in data directory "E:/PGSQL1"?

Suggested Action:

To correct this condition, wait for the recovery to complete and then stop the database cluster with pg_ctl stop. Once the stop completes, the LifeKeeper in-service action should succeed.
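Because the stop should be issued only after the recovery has finished, it can help to check the PostgreSQL log first. The Python sketch below classifies a sequence of log lines as showing no recovery, a recovery in progress, or a completed recovery. The function name recovery_state is hypothetical. The sketch assumes that PostgreSQL reports readiness with the message "database system is ready to accept connections", which is not quoted in this topic; verify the wording against your own server log.

```python
# Message quoted in this topic for an unclean shutdown.
RECOVERY_STARTED = "automatic recovery in progress"
# Assumption: readiness message written once recovery has finished.
READY = "database system is ready to accept connections"


def recovery_state(log_lines):
    """Return 'none', 'in progress', or 'complete' for the given log lines.

    Lines are read in order, so a later recovery start resets the state.
    """
    state = "none"
    for line in log_lines:
        if RECOVERY_STARTED in line:
            state = "in progress"
        elif READY in line and state == "in progress":
            state = "complete"
    return state


if __name__ == "__main__":
    sample = [
        "LOG: database system was interrupted; last known up at 2017-07-25 16:12:10 EDT",
        "FATAL: the database system is starting up",
        "LOG: database system was not properly shut down; automatic recovery in progress",
    ]
    print(recovery_state(sample))
```

Only when the state is reported as complete would it make sense to proceed with pg_ctl stop and the LifeKeeper in-service action.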
