Menu

Continuous WAL Archiving for Linux

Overview

This article includes step-by-step instructions for how to setup continuous WAL archiving in an OnPremise Linux environment, with a large number of tests and test results. This incremental data backup process is beneficial for customers with a large amount of test and test result data.

The instructions below will only backup incremental changes from your last backup, which is quicker and more disk-space efficient than performing a full database backup. 

To utilize continuous WAL archiving, you will:

  • Setup incremental database backup and perform a 'restore' from incremental backups if necessary.
  • Setup a 'read' replica database which will provide continuous backup and live failover capability.

Important: You will need to be a Postgres Superuser to perform the following actions. 

Edit Postgres

Edit postgresql.conf

  1. Access your Postgresql database.
    Example
    /etc/postgresql/9.6/main/postgresql.conf
  2. Archive to the backup folder /bck/pgarchive using the following commands:
    # The archive_mode must be set to on for archiving to happen.
    archive_mode = on
    # This is the command to invoke for each WAL file to be archived.
    archive_command = 'test ! -f /bck/pgarchive/%f && cp %p /bck/pgarchive/%f'
    # Ensure there is at least one WAL file for each "archive_timeout" duration, it forces the server to switch to a new WAL segment file at least that often, archive_timeout settings of a minute or so are usually reasonable
    archive_timeout = 900
    # Set high enough to leave at least one session available for the backup
    max_wal_senders = 3
    # All WAL records required for the backup must contain sufficient full-page writes
    full_page_writes = on
    # Keep around at least these many WAL files (aka segments). 16MB each; 0 disables, should be set high enough that the log is not removed before the end of the backup.
    wal_keep_segments = 64
    # The WAL level must be archive or higher for continuous WAL archiving, hot_standby or higher for standby, posible values are minimal, archive, hot_standby, logical.
    wal_level = logical

Edit pg_hba.conf

To edit the Postgres host replication, enter the following commands:

host replication postgres 127.0.0.1/32 md5
host replication postgres ::1/128 md5

Restart Postgres Service

To restart the Postgres service, use the following commands:

sudo systemctl status postgresql
sudo systemctl stop postgresql
sudo systemctl start postgresql

Base Backup

The base backup should be done weekly or daily based on the size of your database.
Note: The directory that contains the backup may exist but it should be empty. You can set the name of the backup folder to the date of the backup and let the tool create it when back up occurs.

Use the following command and enter the appropriate information for your site:
pg_basebackup --host=<host> --port=<port, default 5432> --username=<user>  --xlog --progress --pgdata=</path/to/datadir>
Example:
ubuntu@ip-10-0-102-246:~$ time sudo -u postgres pg_basebackup --host=localhost --port=5432 --username=postgres --xlog --progress --pgdata=/bck/pgbasebackup/bck1
Password:
413970/413970 kB (100%), 1/1 tablespace
real 0m5.307s
user 0m0.384s
sys 0m0.516s

Cleanup Obsolete Archive Log Files

Complete this process after the base backup process runs successfully.

  1. Use the following command to clean up the archive backup folder:
    pg_archivecleanup -d <archive/location> <oldest_kept_wal_file>
    Example:
    The example includes 3 base backups and has 3 files indicating the backup info:
    00000003000000050000004B.00000028.backup #bck1
    00000003000000050000004D.00000028.backup #bck2
    000000030000000500000051.00000028.backup #bck3
  2. List all files in the archive backup folder:
    postgres@ip-10-0-102-246:/bck$ ls -l pgarchive/
    total 163852
    -rw------- 1 postgres postgres 16777216 Jun 29 08:29 000000030000000500000048
    -rw------- 1 postgres postgres 16777216 Jun 29 08:29 000000030000000500000049
    -rw------- 1 postgres postgres 16777216 Jun 29 08:31 00000003000000050000004A
    -rw------- 1 postgres postgres 16777216 Jun 29 08:31 00000003000000050000004B
    -rw------- 1 postgres postgres 305 Jun 29 08:31 00000003000000050000004B.00000028.backup
    -rw------- 1 postgres postgres 16777216 Jun 29 08:31 00000003000000050000004C
    -rw------- 1 postgres postgres 16777216 Jun 29 08:32 00000003000000050000004D
    -rw------- 1 postgres postgres 305 Jun 29 08:32 00000003000000050000004D.00000028.backup
    -rw------- 1 postgres postgres 16777216 Jun 29 08:37 00000003000000050000004E
    -rw------- 1 postgres postgres 16777216 Jun 29 08:42 00000003000000050000004F
    -rw------- 1 postgres postgres 16777216 Jun 29 08:44 000000030000000500000050
    -rw------- 1 postgres postgres 16777216 Jun 29 08:44 000000030000000500000051
    -rw------- 1 postgres postgres 305 Jun 29 08:44 000000030000000500000051.00000028.backup
  3. Delete the archive log files older than backup#2 (bck2):
    postgres@ip-10-0-102-246:/bck$ pg_archivecleanup -d /bck/pgarchive 00000003000000050000004D.00000028.backup
    pg_archivecleanup: keep WAL file "/bck/pgarchive/00000003000000050000004D" and later
    pg_archivecleanup: removing file "/bck/pgarchive/000000030000000500000049"
    pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004C"
    pg_archivecleanup: removing file "/bck/pgarchive/000000030000000500000048"
    pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004A"
    pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004B"
    IMPORTANT: You will need to setup a maintenance job through Windows Scheduler, or manually perform the 'Base Backup' and 'Cleanup Obsolete Archive Log Files' steps daily or weekly depending on the size of your database. 

Data Recovery

Stop Database Server

  1. Check the status of your Postgres database using the following command:
    sudo systemctl status postgresql
  2. Stop the service using the following command:
    sudo systemctl stop postgresql

Backup Current Database File

If you have the space to do so, it is recommended to copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later.
Note: this precaution will require that you have enough free space on your system to hold two copies of your existing database. If there is not enough space, you should save the contents of the cluster's pg_xlog subdirectory, as it might contain logs which were not archived before the system went down.

Use the following command to create a new folder which will contain all current data files before recovery:

mv /var/lib/postgresql/9.6/main /var/lib/postgresql/9.6/main.before_recovery

Restore the Database Files from your File System Backup

Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.

Remove any files present in pg_xlog/ and copy all WAL segments (archive log files) into pg_xlog/

In this example, we will use the base backup #3 (bck3) to restore.

  1. Copy from base backup, using the following command:
    cp -rp /bck/pgbasebackup/bck3 /var/lib/postgresql/9.6/main
  2. Cleanup logs in pg_xlog, using the following command:
    rm -rf /var/lib/postgresql/9.6/main/pg_xlog/*

Create New recovery.conf File

There are two options for data recovery:

  1. restore to the most recent state, use only one parameter in recovery.conf file:
    restore_command: tells PostgreSQL how to retrieve archived WAL file segments
  2. restore to the specific point in time, use 2 parameters in recovery.conf file:
    restore_command: tells PostgreSQL how to retrieve archived WAL file segments
    recovery_target_time: specifies the time stamp up to which recovery will proceed

You can get a sample file from: c:\Program Files\PostgreSQL\9.6\share\recovery.conf.sample

OR

Create a new empty file recovery.conf: /var/lib/postgresql/9.6/main/recovery.conf

  1. Save this content in that file to restore to a specific point in time, using the following command:
    restore_command = 'cp /bck/pgarchive/%f %p'
    recovery_target_time = '2018-07-02 05:50:00.0'

  2. Or use only one parameter restore_command to restore to the most recent state, using the following command:
    restore_command = 'cp /bck/pgarchive/%f %p'

Start the Database Server to Perform Recovery

Start the server, which will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery.

Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.

  1. To start the database, use the following command:
    sudo service postgresql start
  2. Review your log file 'postgres.log,' as seen in the example below:
    -07-03 06:41:52 UTC [1634-852] LOG: restored log file "00000003000000080000009F" from archive
    2018-07-03 06:41:52 UTC [1634-853] LOG: restored log file "0000000300000008000000A0" from archive
    2018-07-03 06:41:52 UTC [1634-854] LOG: restored log file "0000000300000008000000A1" from archive
    2018-07-03 06:41:53 UTC [1634-855] LOG: restored log file "0000000300000008000000A2" from archive
    2018-07-03 06:41:53 UTC [1634-856] LOG: restored log file "0000000300000008000000A3" from archive
    2018-07-03 06:41:53 UTC [1634-857] LOG: restored log file "0000000300000008000000A4" from archive
    2018-07-03 06:41:53 UTC [1634-858] LOG: restored log file "0000000300000008000000A5" from archive
    2018-07-03 06:41:54 UTC [1634-859] LOG: restored log file "0000000300000008000000A6" from archive
    2018-07-03 06:41:54 UTC [1634-860] LOG: restored log file "0000000300000008000000A7" from archive
    2018-07-03 06:41:54 UTC [1634-861] LOG: restored log file "0000000300000008000000A8" from archive
    2018-07-03 06:41:54 UTC [1634-862] LOG: recovery stopping before commit of transaction 13924, time 2018-07-02 07:48:10.279609+00
    2018-07-03 06:41:54 UTC [1634-863] LOG: redo done at 8/A801D758
    2018-07-03 06:41:54 UTC [1634-864] LOG: last completed transaction was at log time 2018-07-02 05:26:23.275142+00
    2018-07-03 06:41:54 UTC [1634-865] LOG: selected new timeline ID: 4
    2018-07-03 06:41:55 UTC [1634-866] LOG: archive recovery complete
    2018-07-03 06:41:55 UTC [1634-867] LOG: MultiXact member wraparound protections are now enabled
    2018-07-03 06:41:55 UTC [1633-1] LOG: database system is ready to accept connections
    2018-07-03 06:41:55 UTC [3538-1] LOG: autovacuum launcher started

Setup Hot Replication

These instructions will use the example server information below:

  • Master Server IP: 10.0.102.246 and is configured using continuous WAL archiving as above.
  • Standby Server IP: 10.0.102.57

Master Server Configuration

  1. Edit the  pg_hba.conf to allow access from the standby server.
    Example:
    host    replication     postgres        10.0.102.57/32                 md5
  2. Restart the Master Server.

Standby Server Configuration

  1. Stop the database service, using the following command:
    sudo systemctl stop postgresql
  2. Move the database folder to another location.
    Example:
    mv /var/lib/postgresql/9.6/main /var/lib/postgresql/9.6/main_old
  3. Stream data from the master server to the standby server, using the following command:
    pg_basebackup -h <master_server> -p <port> -D <postgres/data/path> -U <user> -v -P --xlog-method=stream
    Example:
    ubuntu@ip-10-0-102-57:~$ sudo -u postgres pg_basebackup -h 10.0.102.246 -D /var/lib/postgresql/9.6/main -U postgres -v -P --xlog-method=stream
    Password:
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    transaction log start point: 9/C2000028 on timeline 5
    pg_basebackup: starting background WAL receiver
    397557/397557 kB (100%), 1/1 tablespace
    transaction log end point: 9/C20000F8
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: base backup completed

Modify postgresql.conf

Use the following command to modify your postgresql.conf file:
hot_standby = on

Create the Recovery Configuration File recovery.conf

  • You can get a sample file from: /usr/share/postgresql/9.6/recovery.conf.sample
    OR
  • Create a new empty file recovery.conf: /var/lib/postgresql/9.6/main/recovery.conf

Specify the master server host with credentials in the parameter primary_conninfo, using the following command example:
standby_mode = on
primary_conninfo = 'host=10.0.102.246 port=5432 user=postgres password=XXXXXX'

Start the Standby Server

  1. Start the Standby Server, using the following command:
    sudo systemctl start postgresql
  2. Review your log file 'postgres.log,' as seen in the example below:
    2018-07-03 08:29:39.517 UTC [11849] LOG: received fast shutdown request
    2018-07-03 08:29:39.517 UTC [11849] LOG: aborting any active transactions
    2018-07-03 08:29:39.518 UTC [11854] LOG: autovacuum launcher shutting down
    2018-07-03 08:29:39.519 UTC [11851] LOG: shutting down
    2018-07-03 08:29:39.534 UTC [11849] LOG: database system is shut down
    2018-07-03 09:01:39.120 UTC [12142] LOG: database system was interrupted; last known up at 2018-07-03 08:49:20 UTC
    2018-07-03 09:01:39.157 UTC [12142] LOG: entering standby mode
    2018-07-03 09:01:39.164 UTC [12142] LOG: redo starts at 9/C2000028
    2018-07-03 09:01:39.166 UTC [12142] LOG: consistent recovery state reached at 9/C20000F8
    2018-07-03 09:01:39.166 UTC [12141] LOG: database system is ready to accept read only connections
    2018-07-03 09:01:39.176 UTC [12146] LOG: started streaming WAL from primary at 9/C3000000 on timeline 5

 Test the Hot Replication

  1. On the Master Server, create a new table and insert one record, using the following command:
    qtestop=# create table ztmprep (name text);
    CREATE TABLE
    qtestop=# insert into ztmprep(name) values ('test replication');
    INSERT 0 1
  2. Query the record on the Standby Server you just inserted on the Master Server, using the following command:
    qtestop=# select * from ztmprep;
    name
    ------------------
    test replication
    (1 row)
    Note: The Standby Server is read-only, so no DML is allowed here. Therefore an error will be received.
    Example:
    qtestop=# insert into ztmprep(name) values ('test insert in standby server');
    ERROR: cannot execute INSERT in a read-only transaction

Promote Standby Server to Master Server

Use the following command to promote the Standby Server to the Master Server:
pg_ctl promote -D </path/to/data/folder>

Example:
$pg_ctl promote -D /var/lib/postgresql/9.6/main
server promoting

Test Promoted Server

Test the Promoted Server with the following command:
testrep=# create table ztmppromote(name text);
CREATE TABLE
testrep=# insert into ztmppromote(name) values('test insert in promoted server');
INSERT 0 1

Update Your qTest Application Configuration to Use the Standby Database

Version 8.7.3 and 9.0.0

  1. Open your qtest.config file located at qtestctl/qtest.config and edit the Postgres Host value in the Postgress section using the following command:
    ...
    /* external tool configurations. these are used as shared configurations to construct applications below. */
    external {
    postgres {
    host = 'master_ip' => 'standby_ip'
    port = 5432
    auth {
    user = 'postgres'
    pass = 'root'
    }
    db {
    qtest = 'qtest'
    sessions = 'sessions'
    }
    }
    ...
    Example:
    ...
    /* external tool configurations. these are used as shared configurations to construct applications below. */
    external {
    postgres {
    host = '10.0.102.57'
    port = 5432
    auth {
    user = 'postgres'
    pass = 'root'
    }
    db {
    qtest = 'qtest'
    sessions = 'sessions'
    }
    }
    ...
  2. Restart the qtest service after you update the IP.

Version 9.1.5

  1. Execute the ./qtestctl command to open the configuration wizard then update your Postgres host.

  2. Restart the qtest service.
    Note: Read these instructions for a refresher on using the Command Line Wizard to update your Postgres Host. 

Powered by Zendesk