About

Scripts for automating automating backup and recovery tasks for PostgreSQL Database Server. The following operations have been implemented:

  • Sending archive logs to backup server.
  • Retrieval of archive logs from backup server.
  • Initialization of the database.
  • Generation of hot backups with archive logging.
  • Recovery from hot backup.
  • Dump (export) and restore (import) of databases.
  • Initial synchronization of database servers for setting up replication.
  • checking the replication between the primary and secondary PostgreSQL Servers in a Streaming Replication setup.

Sample configuration files have been provided for configuring the PostgreSQL Database Server for archive logging, recovery using archive logs and Streaming Replication.

The scripts were tested using PostgreSQL 9.0 on CentOS 5 and the scripts and the sample configuration files will probably require some adjustment for other Operating Systems or Linux Distributions and PostgreSQL versions.

For information on other projects you can check my GitHub Personal Page and GitHub Profile.

Download

You can download the latest development version of this code that is hosted at GitHub either in ZIP or TAR format.

You can also get the latest development version of the code by cloning the Git repository for the project by running:

git clone git://github.com/aouyar/pg_manage

Prerrequisites

Servers

This document asumes two PostgreSQL Database servers; the primary server and the secondary (standby) server:

Role Hostname IP
Primary node1 10.0.0.1
Secondary node2 10.0.0.2

User

The scripts must be executed by the user that is the owner of the PostgreSQL installation, which is postgres by default.

Passwordless Login Between Servers

SSH Public Key Authentication must be configured for passwordless logins between the user accounts for postgres in the two PostgreSQL Servers.

In case the SSH protocol will be used to transfer backups to and from the Backup Server, passwordles login from postgres user account in the two PostgreSQL Servers to the Backup Server.

Directory Hierarchy on the Backup Server

The backups of the database and the archive logs are stored on the Backup Server. A separate directory is required for storing:

  • Archive Logs (Default: pg_archlog)
  • Hot Backups (Default: pg_hotbackup)
  • Exports / Dumps (Default: pg_export)

Each of these directories must have a subdirectory for each PostgreSQL Server and the name of the subdirectories must match the hostnames of the servers:

+---backup---+---pg_archlog-----+---node1
             |                  |
             |                  +---node2
             |
             +---pg_hotbackup---+---node1
             |                  |
             |                  +---node2
             |
             +---pg_export------+---node1
                                |
                                +---node2

Database Permissions

The postgres database user in each PostgreSQL Server must be configured with a password and must have replication privileges on the other server. Example entries for pg_hba.conf configuration file:

# Streaming Replication
host    replication     postgres        10.0.0.1/32             md5
host    replication     postgres        10.0.0.2/32             md5

You can check the sample pg_hba.conf file in the conf folder for a sample configuration file.

Installation

The scripts pg_manage.pl and pg_check_replication.sh must be copied to /usr/local/sbin or /usr/local/bin and must have permissions for execution.

Configuration

Configuration File

The configuration for the backup / recovery scripts are defined in the file pg_manage.conf. This configuration file must be placed in the base directory for PostgreSQL database, which is /var/lib/pgsql/9.0 for PostgreSQL 9.0 on CentOS 5.

Configuration File Options

Option Description
SERVERTYPE Server type: PRIMARY for primary server, SECONDARY for secondary (standby) server.
PRIMARYHOST IP Address of primary server.
SECONDARYHOST IP Address of secondary server.
BACKUPURLHOT URL designating location for Hot Backups on Backup Server.
BACKUPURLARCH URL designating location for Archive Logs on Backup Server.
BACKUPURLDUMP URL designating location for Database Dumps on Backup Server.
BACKUPDIRHOT Subdirectory at backup location for Hot Backups.
BACKUPDIRARCH Subdirectory at backup location for Archive Logs.
BACKUPDIRDUMP Subdirectory at backup location for Database Dumps.

URL Format

Type Example Description
nfs
cifs
nfs:///home/nas
cifs:///home/nas
Mount point for remote filesystem.
file file:///home/nas Mount point for local filesystem.
ssh ssh://user@10.0.0.100/backup Specifies the IP of a remote SSH server, the user to be used for passwordless SSH logins to this server and the directory on server.

Sample Configuration Files

You can check the sample pg_manage.conf.primary file in the conf folder for a sample configuration file for the primary server:

SERVERTYPE=PRIMARY              # Server Type (PRIMARY or SECONDARY)
PRIMARYHOST=10.0.0.1            # Primary Server IP in Warm Standby Setup.
SECONDARYHOST=10.0.0.2          # Secondary Server IP in Warm Standby Setup.

BACKUPURLHOT=cifs:///home/nas   # Destination for Hot Backups
BACKUPURLARCH=cifs:///home/nas  # Destination for Archive Logs
BACKUPURLDUMP=cifs:///home/nas  # Destination for Dumps
BACKUPDIRHOT=pg_hotbackup
BACKUPDIRARCH=pg_archlog
BACKUPDIRDUMP=pg_export

You can check the sample pg_manage.conf.secondary file in the conf folder for a sample configuration file for the secondary server:

SERVERTYPE=SECONDARY            # Server Type (PRIMARY or SECONDARY)
PRIMARYHOST=10.0.0.1            # Primary Server IP in Warm Standby Setup.
SECONDARYHOST=10.0.0.2          # Secondary Server IP in Warm Standby Setup.

BACKUPURLHOT=cifs:///home/nas   # Destination for Hot Backups
BACKUPURLARCH=cifs:///home/nas  # Destination for Archive Logs
BACKUPURLDUMP=cifs:///home/nas  # Destination for Dumps
BACKUPDIRHOT=pg_hotbackup
BACKUPDIRARCH=pg_archlog
BACKUPDIRDUMP=pg_export

Usage

Summary of Options

pg_manage.pl

$ pg_manage.pl -h
Help on Usage
usage: pg_manage.pl -h

Send Archive Log to Backup Server / Retrieve Archive Log from Backup Server
usage: pg_manage.pl [-n] [-d] [-q] -c xlog_archive [-l] -f FILE -p PATH
usage: pg_manage.pl [-n] [-d] [-q] -c xlog_restore [-l] [-t N] [-i SECS] 
       [-I SECS] [-w SECS] -f FILE -p PATH [-r FILE]

Create / Remove Trigger File
usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_fast
usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_smart
usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_cleanup

Initialize Database
usage: pg_manage.pl [-n] [-d] [-q] -c db_destroy

Hot Backup / Restore Database
usage: pg_manage.pl [-n] [-d] [-q] -c db_hot_backup
usage: pg_manage.pl [-n] [-d] [-q] -c db_restore_backup -s HOST -b BACKUPFILE

Dump (Export) / Restore (Import) Database
usage: pg_manage.pl [-n] [-d] [-q] -c db_export -D DATABASE
usage: pg_manage.pl [-n] [-d] [-q] -c db_import [-j N] -s HOST -b BACKUPFILE

Initial Sync of Servers: Primary -> Secondary
Run command only on secondary server.
usage: pg_manage.pl [-n] [-d] [-q] -c db_initial_sync

Check Primary. Return 0 if server is primary
usage: pg_manage.pl [-n] [-d] [-q] -c check_primary

Check Secondary. Return 0 if server is secondary
usage: pg_manage.pl [-n] [-d] [-q] -c check_secondary

pg_check_replication.sh

$ pg_check_replication.sh 
PRIMARY: node1
 pg_current_xlog_location 
--------------------------
 73A/27403A08
(1 row)


SECONDARY: node2
 pg_last_xlog_receive_location | pg_last_xlog_replay_location 
-------------------------------+------------------------------
 73A/27403A08                  | 73A/27403A08
(1 row)

pg_manage.pl - Common Options

-h                 Print help text.
-d                 Debug mode.
-q                 Quiet mode. 
-n                 Dry-run mode. Simulate operations instead of executing them.
                   Used for testing operations before executing them.
-c OPER            Operation to be executed. Must be one of the following:
                     xlog_archive       Copy Transaction Log File to Backup 
                                        Server. Used in archive_command 
                                        option in the  configuration file 
                                        postgresql.conf for enabling Archive 
                                        Logging in PostgreSQL Database.
                     xlog_restore       Restore a Transaction Log File from 
                                        Backup Server. Used in restore_command 
                                        option in the configuration file 
                                        recovery.conf for enabling recovery 
                                        using Archive Logs in PostgreSQL 
                                        Database. Used for Point-in-Time 
                                        Recovery.
                     db_destroy         Delete data directories to prepare 
                                        the PostgreSQL Database Server for 
                                        recovery. 
                     db_hot_backup      Hot Backup of the PostgreSQL Database 
                                        Server to the Backup Server. 
                     db_restore_backup  Restore PostgreSQL Database Server 
                                        from Hot Backup stored in Backup Server. 
                     db_export          Generates a compressed dump (export) 
                                        of a database at the Backup Server. 
                     db_import          Restores a database from a compressed 
                                        dump (export) of a database stored at 
                                        the Backup Server. 
                     db_initial_sync    Initial sync of data from primary 
                                        server on secondary server. (Must be 
                                        executed only on secondary server.)
                     check_primary      Check if the server is configured as 
                                        primary server.
                     check_secondary    Check if the server is configured as 
                                        secondary server. 
                     recovery_trigger_fast     Generate Trigger File for 
                                               finalizing the recovery process 
                                               as soon as possible.
                     recovery_trigger_smart    Generate Trigger File for 
                                               finalizing the recovery process 
                                               after consuming all available 
                                               Archive Log files. 
                     recovery_trigger_cleanup  Remove any existing Trigger 
                                               Files.

pg_manage.pl - Backup of Transaction Log Files - xlog_archive

Compresses Transaction Log files and copies it to the Backup Server.

Usage: pg_manage.pl [-n] [-d] [-q] -c xlog_archive [-l] -f FILE -p PATH 

  -l       Generate log file with filename archive-DD.log in the log file 
           directory of PostgreSQL Server. (The log file directory for 
           PostgreSQL version 9.0 on RHEL / CentOS is by default 
           /var/lib/pgsql/9.0/data/pg_log.) 
  -f FILE  Filename for Transaction Log File. 
  -p PATH  Path for Transaction Log Files directory. The path for 
           PostgreSQL 9.0 on RHEL / CentOS is by default 
           /var/lib/pgsql/9.0/data/pg_xlog.

The command is specifically designed to be used in archive_command option of the configuration file postgresql.conf for enabling Archive Logging in PostgreSQL Database Server.

Sample fragment from postgresql.conf:

archive_mode = on 
archive_command = '/usr/local/bin/pg_manage.pl -c xlog_archive -l -p "%p" 
archive_timeout = 900 

pg_manage.pl - Retrieval of Transaction Log Files - xlog_restore

Retrieves compressed Transaction Log files from Backup Server and uncompresses the files.

Usage: pg_manage.pl [-n] [-d] [-q] -c xlog_restore [-l] [-t N] [-i SECS] 
       [-I SECS]  [-w SECS] -f FILE -p PATH [-r FILE]

  -l       Generate log file with filename restore-DD.log in the log file 
           directory of PostgreSQL Server. (The log file directory for 
           PostgreSQL version 9.0 on RHEL / CentOS is by default 
           /var/lib/pgsql/9.0/data/pg_log.) 
  -f FILE  Filename for Transaction Log File. 
  -p PATH  Path for Transaction Log Files directory. The path for 
           PostgreSQL 9.0 on RHEL / CentOS is by default 
           /var/lib/pgsql/9.0/data/pg_xlog.
  -r FILE  Filename for the last Transaction Log file to preserve for
           consistency of PostgreSQL Database.
  -t N     Number of times to retry retrieval of a Transaction Log File from
           Backup Server. (Default: 3)
  -i SECS  Time (seconds) to wait in seconds between retries for retrieval of 
           a Transaction Log File from Backup Server.  (Default: 10 seconds)
  -I SECS  Time (seconds) to wait between revisions of the Backup Server for
           new Archive Log Files. (Default: 60 seconds)
  -w SECS  Total time (seconds) to wait for the availability of a new 
           Archive Log File on the Backup Server. (Default: 0 seconds)
           End of recovery process will be triggered once the total time
           for waiting for new Archive Log files runs out. 
           The value 0 implies waiting indefinitely and may be used in
           replication setups with Log Shipping.

The command is specifically designed to be used in restore_command option of the configuration file recovery.conf for executing recovery using Archive Logs.

Sample recovery.conf file for recovering the database until the last transaction registered in the last available Archive Log file:

restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 60 -f %f -p %p -r %r'

Sample recovery.conf file for Point-in-Time Recovery (recovering the database until a certain date and time):

restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 60 -f %f -p %p -r %r'
recovery_target_time = '2011-02-02 23:59:50 COT'

Sample recovery.conf file for Replication with Log Shipping:

restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 0 -f %f -p %p -r %r'

Sample recovery.conf file for Streaming Replication:

restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 60 -f %f -p %p -r %r' 
standby_mode = 'on' 
primary_conninfo = 'host=10.0.0.1 port=5432 user=postgres password=xxxxxxxx' 
trigger_file = '/var/lib/pgsql/9.0/data/recovery.trigger' 

pg_manage.pl - End Recovery Process - recovery_trigger_(fast|smart)

Generate Trigger File for finalizing the recovery process as soon as possible:

Usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_fast

Generate Trigger File for finalizing the recovery process after consuming all available Archive Log files:

Usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_smart

Remove any existing Trigger Files:

Usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_cleanup

pg_manage.pl - Initialize Database - db_destroy

Delete data directories to prepare the PostgreSQL Database Server for recovery.

Usage: pg_manage.pl [-n] [-d] [-q] -c db_destroy

The command deletes data directories of PostgreSQL Server. In case there are no other copies, this might result in a total loss of data.

pg_manage.pl - Hot Backup - db_hotbackup

Executes a Hot Backup of the PostgreSQL Database if the database is online, executes a Cold Backup if the database is offline. The backup is stored in the Backup Server in tgz format and contains a compressed copy of the data files of the database.

Usage: pg_manage.pl [-n] [-d] [-q] -c db_hotbackup

Successful recovery from a Hot Backup depends on the availability of Archive Logs. The Hot Backup will not be of any use unless the backups of Archive Logs or also stored in the Backup Server.

pg_manage.pl - Recovery from Hot/Cold Backup - db_restore_backup

Executes recovery of PostgreSQL Database from a Hot or Cold backup file stored on the Backup Server.

Usage: pg_manage.pl [-n] [-d] [-q] -c db_restore_backup -s HOST -b BACKUPFILE

  -s HOST        Hostname for PostgreSQL Server of origin, where the 
                 Hot/Cold Backup was generated originally.
  -b BACKUPFILE  Filename for Hot/Cold Backup file on Backup Server. 

Command that can be executed to recover from a Hot / Cold backup of the PostgreSQL Server node1:

$ pg_manage.pl -c db_restore_backup -s node1 -b hotbackup-node1-2010-10-08.tgz 

The recovery process begins with the initialization of the local PostgreSQL Database, at which point all existing data in the local database will be lost. Once the recovery process ends the local database would be overwritten with data from the backup.

Successful recovery from a Hot Backup depends on the availability of Archive Logs. The Hot Backup will not be of any use unless the backups of Archive Logs are available on the Backup Server.

pg_manage.pl - Export (Dump) of a Database - db_export

Generates a Compressed Database Dump ( Export ) of a database at the Backup Server.

Usage: pg_manage.pl [-n] [-d] [-q] -c db_export -D DATABASE

  -D DATABASE  Name of the database that will be exported.

Command that can be executed to perform a dump (export) of the database dbapp:

$ pg_manage.pl -c db_export -D dbapp

pg_manage.pl - Restore a Database from a Database Dump - db_import

Restores a database from a Compressed Database Dump (export) of a database stored at the Backup Server.

Usage: pg_manage.pl [-n] [-d] [-q] -c db_import  [-j N] -s HOST -b BACKUPFILE

  -s HOST       Hostname for PostgreSQL Server of origin, where the 
                Database Dump (Export) was generated originally.
  -b BACKUPFILE Filename for the Database Dump (Export) file on Backup 
                Server. 
  -j N          Number of concurrent restore jobs to run to speed up recovery.
                (Default: 1) The recommended value for minimizing the time
                required for the recovery process is to set this parameter
                to the number of processor cores in server.

Command that can be executed to restore a database from a dump (export) of the database dbapp that was generated from node1:

$ pg_manage.pl -c db_import -j 8 -s node1 -b node1-dbapp-bin-2011-02-07.dump

The recovery process begins with the creation of the database. The recovery process will fail, if there is a database with the same name, unless the existing database is dropped or renamed before executing the import..

pg_manage.pl - Initial Sync of Databases for Replication - db_initial_sync

The command must only be executed on the Secondary Server for performing an initial copy of data from the Primary Server.

Usage: pg_manage.pl [-n] [-d] [-q] -c db_initial_sync 

The initial replication process must be executed only on the secondary server. The pg_manage.conf configuration file in both the primary and secondary PostgreSQL servers must checked to confirm that the roles of both servers are configured correctly, before executing the initial replication process.

The replication process begins with the initialization of the local PostgreSQL Database, at which point all existing data in the local database will be lost. Once the replication process ends the local database would be overwritten with data from the primary servcer.

The success of the replication process depends on the availability of Archive Logs. The replication process may fail unless the backups of Archive Logs or also stored in the Backup Server.

pg_manage.pl - Verify Server Role - check_primary / check_secondary

The check_primary command returns 0 ( true ) if the server is configured as Primary Server, 1 ( false ) otherwise.

Usage: pg_manage.pl [-n] [-d] [-q] -c check_primary

The check_secondary command returns 0 ( true ) if the server is configured as Secondary Server, 1 ( false ) otherwise.

Usage: pg_manage.pl [-n] [-d] [-q] -c check_primary

License

pg_manage is copyrighted free software made available under the terms of the GPL License Version 3 or later.

See the file COPYING that acompanies the code for full licensing information.

TODO

  • Migrate the scripts to Python.
  • Implement the pg_basebackup command to take advantage of the Base Backup funcionality in PostgreSQL 9.1.
  • Adapt the scripts to other Linux distributions and other versions of PostgreSQL database.
  • Avoid overriding of existing backup files.
  • Improve the documentation of the scripts.