About
Scripts for automating automating backup and recovery tasks for PostgreSQL Database Server. The following operations have been implemented:
- Sending archive logs to backup server.
- Retrieval of archive logs from backup server.
- Initialization of the database.
- Generation of hot backups with archive logging.
- Recovery from hot backup.
- Dump (export) and restore (import) of databases.
- Initial synchronization of database servers for setting up replication.
- checking the replication between the primary and secondary PostgreSQL Servers in a Streaming Replication setup.
Sample configuration files have been provided for configuring the PostgreSQL Database Server for archive logging, recovery using archive logs and Streaming Replication.
The scripts were tested using PostgreSQL 9.0 on CentOS 5 and the scripts and the sample configuration files will probably require some adjustment for other Operating Systems or Linux Distributions and PostgreSQL versions.
For information on other projects you can check my GitHub Personal Page and GitHub Profile.
Download
You can download the latest development version of this code that is hosted at GitHub either in ZIP or TAR format.
You can also get the latest development version of the code by cloning the Git repository for the project by running:
git clone git://github.com/aouyar/pg_manage
Prerrequisites
Servers
This document asumes two PostgreSQL Database servers; the primary server and the secondary (standby) server:
Role | Hostname | IP |
---|---|---|
Primary | node1 | 10.0.0.1 |
Secondary | node2 | 10.0.0.2 |
User
The scripts must be executed by the user that is the owner of the PostgreSQL installation, which is postgres by default.
Passwordless Login Between Servers
SSH Public Key Authentication must be configured for passwordless logins between the user accounts for postgres in the two PostgreSQL Servers.
In case the SSH protocol will be used to transfer backups to and from the Backup Server, passwordles login from postgres user account in the two PostgreSQL Servers to the Backup Server.
Directory Hierarchy on the Backup Server
The backups of the database and the archive logs are stored on the Backup Server. A separate directory is required for storing:
- Archive Logs (Default: pg_archlog)
- Hot Backups (Default: pg_hotbackup)
- Exports / Dumps (Default: pg_export)
Each of these directories must have a subdirectory for each PostgreSQL Server and the name of the subdirectories must match the hostnames of the servers:
+---backup---+---pg_archlog-----+---node1
| |
| +---node2
|
+---pg_hotbackup---+---node1
| |
| +---node2
|
+---pg_export------+---node1
|
+---node2
Database Permissions
The postgres database user in each PostgreSQL Server must be configured with a password and must have replication privileges on the other server. Example entries for pg_hba.conf configuration file:
# Streaming Replication
host replication postgres 10.0.0.1/32 md5
host replication postgres 10.0.0.2/32 md5
You can check the sample pg_hba.conf file in the conf folder for a sample configuration file.
Installation
The scripts pg_manage.pl and pg_check_replication.sh must be copied to /usr/local/sbin or /usr/local/bin and must have permissions for execution.
Configuration
Configuration File
The configuration for the backup / recovery scripts are defined in the file pg_manage.conf. This configuration file must be placed in the base directory for PostgreSQL database, which is /var/lib/pgsql/9.0 for PostgreSQL 9.0 on CentOS 5.
Configuration File Options
Option | Description |
---|---|
SERVERTYPE | Server type: PRIMARY for primary server, SECONDARY for secondary (standby) server. |
PRIMARYHOST | IP Address of primary server. |
SECONDARYHOST | IP Address of secondary server. |
BACKUPURLHOT | URL designating location for Hot Backups on Backup Server. |
BACKUPURLARCH | URL designating location for Archive Logs on Backup Server. |
BACKUPURLDUMP | URL designating location for Database Dumps on Backup Server. |
BACKUPDIRHOT | Subdirectory at backup location for Hot Backups. |
BACKUPDIRARCH | Subdirectory at backup location for Archive Logs. |
BACKUPDIRDUMP | Subdirectory at backup location for Database Dumps. |
URL Format
Type | Example | Description |
---|---|---|
nfs cifs |
nfs:///home/nas cifs:///home/nas |
Mount point for remote filesystem. |
file | file:///home/nas | Mount point for local filesystem. |
ssh | ssh://user@10.0.0.100/backup | Specifies the IP of a remote SSH server, the user to be used for passwordless SSH logins to this server and the directory on server. |
Sample Configuration Files
You can check the sample pg_manage.conf.primary file in the conf folder for a sample configuration file for the primary server:
SERVERTYPE=PRIMARY # Server Type (PRIMARY or SECONDARY)
PRIMARYHOST=10.0.0.1 # Primary Server IP in Warm Standby Setup.
SECONDARYHOST=10.0.0.2 # Secondary Server IP in Warm Standby Setup.
BACKUPURLHOT=cifs:///home/nas # Destination for Hot Backups
BACKUPURLARCH=cifs:///home/nas # Destination for Archive Logs
BACKUPURLDUMP=cifs:///home/nas # Destination for Dumps
BACKUPDIRHOT=pg_hotbackup
BACKUPDIRARCH=pg_archlog
BACKUPDIRDUMP=pg_export
You can check the sample pg_manage.conf.secondary file in the conf folder for a sample configuration file for the secondary server:
SERVERTYPE=SECONDARY # Server Type (PRIMARY or SECONDARY)
PRIMARYHOST=10.0.0.1 # Primary Server IP in Warm Standby Setup.
SECONDARYHOST=10.0.0.2 # Secondary Server IP in Warm Standby Setup.
BACKUPURLHOT=cifs:///home/nas # Destination for Hot Backups
BACKUPURLARCH=cifs:///home/nas # Destination for Archive Logs
BACKUPURLDUMP=cifs:///home/nas # Destination for Dumps
BACKUPDIRHOT=pg_hotbackup
BACKUPDIRARCH=pg_archlog
BACKUPDIRDUMP=pg_export
Usage
Summary of Options
pg_manage.pl
$ pg_manage.pl -h
Help on Usage
usage: pg_manage.pl -h
Send Archive Log to Backup Server / Retrieve Archive Log from Backup Server
usage: pg_manage.pl [-n] [-d] [-q] -c xlog_archive [-l] -f FILE -p PATH
usage: pg_manage.pl [-n] [-d] [-q] -c xlog_restore [-l] [-t N] [-i SECS]
[-I SECS] [-w SECS] -f FILE -p PATH [-r FILE]
Create / Remove Trigger File
usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_fast
usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_smart
usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_cleanup
Initialize Database
usage: pg_manage.pl [-n] [-d] [-q] -c db_destroy
Hot Backup / Restore Database
usage: pg_manage.pl [-n] [-d] [-q] -c db_hot_backup
usage: pg_manage.pl [-n] [-d] [-q] -c db_restore_backup -s HOST -b BACKUPFILE
Dump (Export) / Restore (Import) Database
usage: pg_manage.pl [-n] [-d] [-q] -c db_export -D DATABASE
usage: pg_manage.pl [-n] [-d] [-q] -c db_import [-j N] -s HOST -b BACKUPFILE
Initial Sync of Servers: Primary -> Secondary
Run command only on secondary server.
usage: pg_manage.pl [-n] [-d] [-q] -c db_initial_sync
Check Primary. Return 0 if server is primary
usage: pg_manage.pl [-n] [-d] [-q] -c check_primary
Check Secondary. Return 0 if server is secondary
usage: pg_manage.pl [-n] [-d] [-q] -c check_secondary
pg_check_replication.sh
$ pg_check_replication.sh
PRIMARY: node1
pg_current_xlog_location
--------------------------
73A/27403A08
(1 row)
SECONDARY: node2
pg_last_xlog_receive_location | pg_last_xlog_replay_location
-------------------------------+------------------------------
73A/27403A08 | 73A/27403A08
(1 row)
pg_manage.pl - Common Options
-h Print help text.
-d Debug mode.
-q Quiet mode.
-n Dry-run mode. Simulate operations instead of executing them.
Used for testing operations before executing them.
-c OPER Operation to be executed. Must be one of the following:
xlog_archive Copy Transaction Log File to Backup
Server. Used in archive_command
option in the configuration file
postgresql.conf for enabling Archive
Logging in PostgreSQL Database.
xlog_restore Restore a Transaction Log File from
Backup Server. Used in restore_command
option in the configuration file
recovery.conf for enabling recovery
using Archive Logs in PostgreSQL
Database. Used for Point-in-Time
Recovery.
db_destroy Delete data directories to prepare
the PostgreSQL Database Server for
recovery.
db_hot_backup Hot Backup of the PostgreSQL Database
Server to the Backup Server.
db_restore_backup Restore PostgreSQL Database Server
from Hot Backup stored in Backup Server.
db_export Generates a compressed dump (export)
of a database at the Backup Server.
db_import Restores a database from a compressed
dump (export) of a database stored at
the Backup Server.
db_initial_sync Initial sync of data from primary
server on secondary server. (Must be
executed only on secondary server.)
check_primary Check if the server is configured as
primary server.
check_secondary Check if the server is configured as
secondary server.
recovery_trigger_fast Generate Trigger File for
finalizing the recovery process
as soon as possible.
recovery_trigger_smart Generate Trigger File for
finalizing the recovery process
after consuming all available
Archive Log files.
recovery_trigger_cleanup Remove any existing Trigger
Files.
pg_manage.pl - Backup of Transaction Log Files - xlog_archive
Compresses Transaction Log files and copies it to the Backup Server.
Usage: pg_manage.pl [-n] [-d] [-q] -c xlog_archive [-l] -f FILE -p PATH
-l Generate log file with filename archive-DD.log in the log file
directory of PostgreSQL Server. (The log file directory for
PostgreSQL version 9.0 on RHEL / CentOS is by default
/var/lib/pgsql/9.0/data/pg_log.)
-f FILE Filename for Transaction Log File.
-p PATH Path for Transaction Log Files directory. The path for
PostgreSQL 9.0 on RHEL / CentOS is by default
/var/lib/pgsql/9.0/data/pg_xlog.
The command is specifically designed to be used in archive_command option of the configuration file postgresql.conf for enabling Archive Logging in PostgreSQL Database Server.
Sample fragment from postgresql.conf:
archive_mode = on
archive_command = '/usr/local/bin/pg_manage.pl -c xlog_archive -l -p "%p"
archive_timeout = 900
pg_manage.pl - Retrieval of Transaction Log Files - xlog_restore
Retrieves compressed Transaction Log files from Backup Server and uncompresses the files.
Usage: pg_manage.pl [-n] [-d] [-q] -c xlog_restore [-l] [-t N] [-i SECS]
[-I SECS] [-w SECS] -f FILE -p PATH [-r FILE]
-l Generate log file with filename restore-DD.log in the log file
directory of PostgreSQL Server. (The log file directory for
PostgreSQL version 9.0 on RHEL / CentOS is by default
/var/lib/pgsql/9.0/data/pg_log.)
-f FILE Filename for Transaction Log File.
-p PATH Path for Transaction Log Files directory. The path for
PostgreSQL 9.0 on RHEL / CentOS is by default
/var/lib/pgsql/9.0/data/pg_xlog.
-r FILE Filename for the last Transaction Log file to preserve for
consistency of PostgreSQL Database.
-t N Number of times to retry retrieval of a Transaction Log File from
Backup Server. (Default: 3)
-i SECS Time (seconds) to wait in seconds between retries for retrieval of
a Transaction Log File from Backup Server. (Default: 10 seconds)
-I SECS Time (seconds) to wait between revisions of the Backup Server for
new Archive Log Files. (Default: 60 seconds)
-w SECS Total time (seconds) to wait for the availability of a new
Archive Log File on the Backup Server. (Default: 0 seconds)
End of recovery process will be triggered once the total time
for waiting for new Archive Log files runs out.
The value 0 implies waiting indefinitely and may be used in
replication setups with Log Shipping.
The command is specifically designed to be used in restore_command option of the configuration file recovery.conf for executing recovery using Archive Logs.
Sample recovery.conf file for recovering the database until the last transaction registered in the last available Archive Log file:
restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 60 -f %f -p %p -r %r'
Sample recovery.conf file for Point-in-Time Recovery (recovering the database until a certain date and time):
restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 60 -f %f -p %p -r %r'
recovery_target_time = '2011-02-02 23:59:50 COT'
Sample recovery.conf file for Replication with Log Shipping:
restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 0 -f %f -p %p -r %r'
Sample recovery.conf file for Streaming Replication:
restore_command = '/usr/local/bin/pg_manage.pl -l -c xlog_restore -s node1 -I 15 -w 60 -f %f -p %p -r %r'
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=postgres password=xxxxxxxx'
trigger_file = '/var/lib/pgsql/9.0/data/recovery.trigger'
pg_manage.pl - End Recovery Process - recovery_trigger_(fast|smart)
Generate Trigger File for finalizing the recovery process as soon as possible:
Usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_fast
Generate Trigger File for finalizing the recovery process after consuming all available Archive Log files:
Usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_smart
Remove any existing Trigger Files:
Usage: pg_manage.pl [-n] [-d] [-q] -c recovery_trigger_cleanup
pg_manage.pl - Initialize Database - db_destroy
Delete data directories to prepare the PostgreSQL Database Server for recovery.
Usage: pg_manage.pl [-n] [-d] [-q] -c db_destroy
The command deletes data directories of PostgreSQL Server. In case there are
no other copies, this might result in a total loss of data.
pg_manage.pl - Hot Backup - db_hotbackup
Executes a Hot Backup of the PostgreSQL Database if the database is online, executes a Cold Backup if the database is offline. The backup is stored in the Backup Server in tgz format and contains a compressed copy of the data files of the database.
Usage: pg_manage.pl [-n] [-d] [-q] -c db_hotbackup
Successful recovery from a Hot Backup depends on the availability of Archive Logs. The Hot Backup will not be of any use unless the backups of Archive Logs or also stored in the Backup Server.
pg_manage.pl - Recovery from Hot/Cold Backup - db_restore_backup
Executes recovery of PostgreSQL Database from a Hot or Cold backup file stored on the Backup Server.
Usage: pg_manage.pl [-n] [-d] [-q] -c db_restore_backup -s HOST -b BACKUPFILE
-s HOST Hostname for PostgreSQL Server of origin, where the
Hot/Cold Backup was generated originally.
-b BACKUPFILE Filename for Hot/Cold Backup file on Backup Server.
Command that can be executed to recover from a Hot / Cold backup of the PostgreSQL Server node1:
$ pg_manage.pl -c db_restore_backup -s node1 -b hotbackup-node1-2010-10-08.tgz
The recovery process begins with the initialization of the local PostgreSQL Database, at which point all existing data in the local database will be lost. Once the recovery process ends the local database would be overwritten with data from the backup.
Successful recovery from a Hot Backup depends on the availability of Archive Logs. The Hot Backup will not be of any use unless the backups of Archive Logs are available on the Backup Server.
pg_manage.pl - Export (Dump) of a Database - db_export
Generates a Compressed Database Dump ( Export ) of a database at the Backup Server.
Usage: pg_manage.pl [-n] [-d] [-q] -c db_export -D DATABASE
-D DATABASE Name of the database that will be exported.
Command that can be executed to perform a dump (export) of the database dbapp:
$ pg_manage.pl -c db_export -D dbapp
pg_manage.pl - Restore a Database from a Database Dump - db_import
Restores a database from a Compressed Database Dump (export) of a database stored at the Backup Server.
Usage: pg_manage.pl [-n] [-d] [-q] -c db_import [-j N] -s HOST -b BACKUPFILE
-s HOST Hostname for PostgreSQL Server of origin, where the
Database Dump (Export) was generated originally.
-b BACKUPFILE Filename for the Database Dump (Export) file on Backup
Server.
-j N Number of concurrent restore jobs to run to speed up recovery.
(Default: 1) The recommended value for minimizing the time
required for the recovery process is to set this parameter
to the number of processor cores in server.
Command that can be executed to restore a database from a dump (export) of the database dbapp that was generated from node1:
$ pg_manage.pl -c db_import -j 8 -s node1 -b node1-dbapp-bin-2011-02-07.dump
The recovery process begins with the creation of the database. The recovery process will fail, if there is a database with the same name, unless the existing database is dropped or renamed before executing the import..
pg_manage.pl - Initial Sync of Databases for Replication - db_initial_sync
The command must only be executed on the Secondary Server for performing an initial copy of data from the Primary Server.
Usage: pg_manage.pl [-n] [-d] [-q] -c db_initial_sync
The initial replication process must be executed only on the secondary server. The pg_manage.conf configuration file in both the primary and secondary PostgreSQL servers must checked to confirm that the roles of both servers are configured correctly, before executing the initial replication process.
The replication process begins with the initialization of the local PostgreSQL Database, at which point all existing data in the local database will be lost. Once the replication process ends the local database would be overwritten with data from the primary servcer.
The success of the replication process depends on the availability of Archive Logs. The replication process may fail unless the backups of Archive Logs or also stored in the Backup Server.
pg_manage.pl - Verify Server Role - check_primary / check_secondary
The check_primary command returns 0 ( true ) if the server is configured as Primary Server, 1 ( false ) otherwise.
Usage: pg_manage.pl [-n] [-d] [-q] -c check_primary
The check_secondary command returns 0 ( true ) if the server is configured as Secondary Server, 1 ( false ) otherwise.
Usage: pg_manage.pl [-n] [-d] [-q] -c check_primary
License
pg_manage is copyrighted free software made available under the terms of the GPL License Version 3 or later.
See the file COPYING that acompanies the code for full licensing information.
TODO
- Migrate the scripts to Python.
- Implement the pg_basebackup command to take advantage of the Base Backup funcionality in PostgreSQL 9.1.
- Adapt the scripts to other Linux distributions and other versions of PostgreSQL database.
- Avoid overriding of existing backup files.
- Improve the documentation of the scripts.