Menu

Continuous WAL Archiving for Windows

Overview

This article includes step-by-step instructions for how to setup continuous WAL archiving in an OnPremise Windows environment. This incremental data backup process is beneficial for customers with a large amount of test and test result data.

The instructions below will only backup incremental changes from your last backup, which is quicker and more disk-space efficient than performing a full database backup. 

To utilize continuous WAL archiving, you will:

  • Setup incremental database backup and perform a 'restore' from incremental backups if necessary.
  • Setup a 'read' replica database which will provide continuous backup and live failover capability.

Important: You will need to be a Postgres Superuser to perform the following actions. 

Edit Postgres

Edit postgresql.conf

  1. Access your Postgresql database.
    Example
    c:\Program Files\PostgreSQL\9.6\data\postgresql.conf
  2. Archive to the backup folder D:\pgarchive using the following commands:
    # The archive_mode must be set to on for archiving to happen.
    archive_mode = on
    # This is the command to invoke for each WAL file to be archived.
    archive_command = 'copy "%p" "D:\\pgarchive\\%f"'
    # Ensure there is at least one WAL file for each "archive_timeout" duration, it forces the server to switch to a new WAL segment file at least that often, archive_timeout settings of a minute or so are usually reasonable.
    archive_timeout = 900
    # Set high enough to leave at least one session available for the backup
    max_wal_senders = 3
    # All WAL records required for the backup must contain sufficient full-page writes
    full_page_writes = on
    # Keep around at least these many WAL files (aka segments). 16MB each; 0 disables, should be set high enough that the log is not removed before the end of the backup.
    wal_keep_segments = 64
    # The WAL level must be archive or higher for continuous WAL archiving, hot_standby or higher for standby, posible values are minimal, archive, hot_standby, logical.
    wal_level = logical

Edit pg_hba.conf

To edit the Postgres host replication, enter the following commands:

host replication postgres 127.0.0.1/32 md5
host replication postgres ::1/128 md5

Restart Postgres Service

To restart the Postgres service, use the following commands:

pg_ctl status -D "c:\Program Files\PostgreSQL\9.6\data\"
pg_ctl stop -D "c:\Program Files\PostgreSQL\9.6\data\"
pg_ctl start -D "c:\Program Files\PostgreSQL\9.6\data\"

Base Backup

The base backup should be done weekly or daily based on the size of your database.
Note: The directory that contains the backup may exist but it should be empty. You can set the name of the backup folder to the date of the backup and let the tool create it when back up occurs.

Use the following command and enter the appropriate information for your site:
pg_basebackup --host=<host> --port=<port, default 5432> --username=<user>  --xlog --progress --pgdata=</path/to/datadir>
Example:
c:\Program Files\PostgreSQL\9.6\bin>pg_basebackup --host=localhost --port=5432 --username=postgres --xlog --progress --pgdata=d:\pgbasebackup\bck1
Password: 52715/52715kB (100%), 1/1 tablespace

Cleanup Obsolete Archive Log Files

Complete this process after the base backup process runs successfully.

  1. Use the following command to clean up the archive backup folder:
    pg_archivecleanup -d <archive/location> <oldest_kept_wal_file>
    Example:
    The example includes 3 base backups and has 3 files indicating the backup info:
    000000010000000000000007.00000060.backup #bck1
    00000001000000000000000B.00000028.backup #bck2
    00000001000000000000000E.00000028.backup #bck3
  2. Delete the archive log files older than backup#2 (bck2)
    c:\Program Files\PostgreSQL\9.6\bin>pg_archivecleanup -d d:\pgarchive 00000001000000000000000B.00000028.backup
    pg_archivecleanup: keep WAL file "d:\pgarchive/00000001000000000000000B" and later
    pg_archivecleanup: removing file "d:\pgarchive/000000010000000000000005"
    pg_archivecleanup: removing file "d:\pgarchive/000000010000000000000006"
    pg_archivecleanup: removing file "d:\pgarchive/000000010000000000000007"
    pg_archivecleanup: removing file "d:\pgarchive/000000010000000000000008"
    pg_archivecleanup: removing file "d:\pgarchive/000000010000000000000009"
    pg_archivecleanup: removing file "d:\pgarchive/00000001000000000000000A"
  3. Delete the archive log files older than backup#3 (bck3)
    c:\Program Files\PostgreSQL\9.6\bin>pg_archivecleanup -d d:\pgarchive 00000001000000000000000E.00000028.backup
    pg_archivecleanup: keep WAL file "d:\pgarchive/00000001000000000000000E" and later
    pg_archivecleanup: removing file "d:\pgarchive/00000001000000000000000B"
    pg_archivecleanup: removing file "d:\pgarchive/00000001000000000000000C"
    pg_archivecleanup: removing file "d:\pgarchive/00000001000000000000000D"

IMPORTANT: You will need to set up a maintenance job through Windows Scheduler, or manually perform the 'Base Backup' and 'Cleanup Obsolete Archive Log Files' steps daily or weekly depending on the size of your database. 

Data Recovery

Stop Database Server

  1. Check the status of your Postgres database using the following command:
    pg_ctl status -D "c:\Program Files\PostgreSQL\9.6\data\"
  2. Stop the service using the following command:
    pg_ctl stop -D "c:\Program Files\PostgreSQL\9.6\data\"

Backup Current Database Files

If you have the space to do so, it is recommended to copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later.

Note: this precaution will require that you have enough free space on your system to hold two copies of your existing database. If there is not enough space, you should save the contents of the cluster's pg_xlog subdirectory, as it might contain logs which were not archived before the system went down.

Use the following command to create a new folder which will contain all current data files before recovery:
mkdir d:\pgbasebackup\data.before_recovery
xcopy /O/X/E/H/K "c:\Program Files\PostgreSQL\9.6\data" d:\pgbasebackup\data.before_recovery
rmdir /S/Q "c:\Program Files\PostgreSQL\9.6\data"

Restore the Database Files from your File System Backup

Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions.

To make sure you have the proper permissions on the data folder, follow these steps:

  1. Inside Windows Services Manager, check to see what user the Postgres service is being run under.
    WAL_archiving_permissions_Windows.png
  2. For any folder that you know the Postgres process needs to interact with, make sure the same user is granted at minimum a Write permission.
    Those folders would include:
    • the data folder (default at "C:\Program Files\PostgreSQL\9.5\data")
    • the archive folder (default in the user guide at "D:\pgarchive")
      WAL_data_properties_Windows.png
      IMPORTANT: the folder permission could be reset (to the least privilege) upon an action such as cut or copy, so make sure to always double check the permission before you execute any commands such as restore or starting the service.

If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.

Remove any files present in pg_xlog/ and copy all WAL segments (archive log files) into pg_xlog/

In this example, we will use the base backup #3 (bck3) to restore.

  1. Copy from base backup, using the following command:
    mkdir "c:\Program Files\PostgreSQL\9.6\data"
    xcopy /O/X/E/H/K d:\pgbasebackup\bck4 "c:\Program Files\PostgreSQL\9.6\data"
  2. Cleanup logs in pg_xlog, using the following command:
    del /S/F/Q "c:\Program Files\PostgreSQL\9.6\data\pg_xlog"

Create New recovery.conf File

There are two options for data recovery:

  1. restore to the most recent state, use only one parameter in recovery.conf file:
    restore_command: tells PostgreSQL how to retrieve archived WAL file segments
  2. restore to the specific point in time, use 2 parameters in recovery.conf file:
    restore_command: tells PostgreSQL how to retrieve archived WAL file segments
    recovery_target_time: specifies the time stamp up to which recovery will proceed

You can get a sample file from: c:\Program Files\PostgreSQL\9.6\share\recovery.conf.sample

OR

Create a new empty file recovery.conf: c:\Program Files\PostgreSQL\9.6\data\recovery.conf

  1. Save this content in that file to restore to a specific point in time, using the following command:
    restore_command = 'copy "d:\\pgarchive\\%f" "%p"'
    recovery_target_time = '2018-07-02 05:50:00.0'

  2. Or use only one parameter restore_command to restore to the most recent state, using the following command:
    restore_command = 'copy "d:\\pgarchive\\%f" "%p"'

Start the Database Server to Perform Recovery

Start the server, which will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery.

Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.

  1. To start the database, use the following command:
    pg_ctl start -D "c:\Program Files\PostgreSQL\9.6\data\"
  2. Review your log file 'postgres.log,' as seen in the example below:
    2018-07-04 12:44:04 +07 LOG: database system was interrupted; last known up at 2018-06-29 13:59:52 +07  
    2018-07-04 12:44:15 +07 LOG: starting point-in-time recovery to 2018-07-01 18:00:00+07
    2018-07-04 12:44:15 +07 LOG: restored log file "000000010000000000000041" from archive
    2018-07-04 12:44:15 +07 LOG: redo starts at 0/41000028
    2018-07-04 12:44:15 +07 LOG: consistent recovery state reached at 0/41000130
    2018-07-04 12:44:15 +07 LOG: restored log file "000000010000000000000042" from archive
    2018-07-04 12:44:15 +07 LOG: restored log file "000000010000000000000043" from archive
    2018-07-04 12:44:16 +07 LOG: restored log file "000000010000000000000044" from archive2018-07-04 12:44:35 +07 LOG: restored log file "0000000100000000000000AA" from archive
    2018-07-04 12:44:35 +07 LOG: restored log file "0000000100000000000000AB" from archive
    2018-07-04 12:44:35 +07 LOG: restored log file "0000000100000000000000AC" from archive
    2018-07-04 12:44:35 +07 LOG: recovery stopping before commit of transaction 3777, time 2018-07-04 11:31:22.016211+07
    2018-07-04 12:44:35 +07 LOG: redo done at 0/AC020220
    2018-07-04 12:44:35 +07 LOG: last completed transaction was at log time 2018-06-29 14:08:40.140256+07
    2018-07-04 12:44:35 +07 LOG: selected new timeline ID: 2
    2018-07-04 12:44:35 +07 LOG: archive recovery complete
    2018-07-04 12:44:36 +07 LOG: MultiXact member wraparound protections are now enabled
    2018-07-04 12:44:36 +07 LOG: database system is ready to accept connections
    2018-07-04 12:44:36 +07 LOG: autovacuum launcher started

Setup Hot Replication

These instructions will use the example server information below:

  • Master Server IP: 10.0.102.246 and is configured using continuous WAL archiving as above.
  • Standby Server IP: 10.0.102.57

Master Server Configuration

  1. Edit the  pg_hba.conf to allow access from the standby server.
    Example:
    host    replication     postgres        10.0.102.57/32                 md5
  2. Restart the Master Server.

Standby Server Configuration

  1. Stop the database service, using the following command:
    pg_ctl stop -D "c:\Program Files\PostgreSQL\9.6\data\"
  2. Move the database folder to another location.
    Example:
    ren "c:\Program Files\PostgreSQL\9.6\data" "c:\Program Files\PostgreSQL\9.6\data_old"
  3. Stream data from the master server to the standby server, using the following command:
    pg_basebackup -h <master_server> -p <port> -D <postgres/data/path> -U <user> -v -P --xlog-method=stream
    Example:
    c:\Program Files\PostgreSQL\9.6\bin>pg_basebackup -h 10.0.102.246 -p 5432 -D "c:\Program Files\PostgreSQL\9.6\data" -U postgres -v -P --xlog-method=stream
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    transaction log start point: 0/2000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    21961/21961 kB (100%), 1/1 tablespace
    transaction log end point: 0/20000F8
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: base backup completed

Modify postgresql.conf

Use the following command to modify your postgresql.conf file:
hot_standby = on

Create the Recovery Configuration File recovery.conf

  • You can get a sample file from: c:\Program Files\PostgreSQL\9.6\share\recovery.conf.sample
    OR
  • Create a new empty file recovery.conf: c:\Program Files\PostgreSQL\9.6\data\recovery.conf

Specify the master server host with credentials in the parameter primary_conninfo, using the following command example:
standby_mode = on
primary_conninfo = 'host=10.0.102.246 port=5432 user=postgres password=XXXXXX'

Start the Standby Server

  1. Start the Standby Server, using the following command:
    pg_ctl start -D "c:\Program Files\PostgreSQL\9.6\data\"
  2. Review your log file 'postgres.log,' as seen in the example below:
    2018-07-04 15:00:17 +07 [23900]: [1-1] user=,db=,app=,client= LOG: database system was interrupted; last known up at 2018-07-04 14:50:54 +07
    2018-07-04 15:00:17 +07 [23900]: [2-1] user=,db=,app=,client= LOG: entering standby mode
    2018-07-04 15:00:17 +07 [23900]: [3-1] user=,db=,app=,client= LOG: redo starts at 0/2000028
    2018-07-04 15:00:17 +07 [23900]: [4-1] user=,db=,app=,client= LOG: consistent recovery state reached at 0/20000F8
    2018-07-04 15:00:17 +07 [23900]: [5-1] user=,db=,app=,client= LOG: invalid record length at 0/3000060: wanted 24, got 0
    2018-07-04 15:00:17 +07 [15016]: [3-1] user=,db=,app=,client= LOG: database system is ready to accept read only connections
    2018-07-04 15:00:17 +07 [20720]: [1-1] user=,db=,app=,client= LOG: started streaming WAL from primary at 0/3000000 on timeline 1

 Test the Hot Replication

  1. On the Master Server, create a new table and insert one record, using the following command:
    qtestop=# create table ztmprep (name text);
    CREATE TABLE
    qtestop=# insert into ztmprep(name) values ('test replication');
    INSERT 0 1
  2. Query the record on the Standby Server you just inserted on the Master Server, using the following command:
    qtestop=# select * from ztmprep;
    name
    ------------------
    test replication
    (1 row)
    Note: The Standby Server is read-only, so no DML is allowed here. Therefore an error will be received.
    Example:
    qtestop=# insert into ztmprep(name) values ('test insert in standby server');
    ERROR: cannot execute INSERT in a read-only transaction

Promote Standby Server to Master Server

Use the following command to promote the Standby Server to the Master Server:
pg_ctl promote -D </path/to/data/folder>

Example:
c:\Program Files\PostgreSQL\9.6\bin>pg_ctl promote -D "c:\Program Files\PostgreSQL\9.6\data"
server promoting

Test Promoted Server

Test the Promoted Server with the following command:
testrep=# create table ztmppromote(name text);
CREATE TABLE
testrep=# insert into ztmppromote(name) values('test insert in promoted server');
INSERT 0 1

Update Your qTest Application Configuration to Use the Standby Database

Version 8.7.3 and 9.0.0

  1. Open your qtest.config file located at qtestctl/qtest.config and edit the Postgres Host value in the Postgress section using the following command:
    ...
    /* external tool configurations. these are used as shared configurations to construct applications below. */
    external {
    postgres {
    host = 'master_ip' => 'standby_ip'
    port = 5432
    auth {
    user = 'postgres'
    pass = 'root'
    }
    db {
    qtest = 'qtest'
    sessions = 'sessions'
    }
    }
    ...
    Example:
    ...
    /* external tool configurations. these are used as shared configurations to construct applications below. */
    external {
    postgres {
    host = '10.0.102.57'
    port = 5432
    auth {
    user = 'postgres'
    pass = 'root'
    }
    db {
    qtest = 'qtest'
    sessions = 'sessions'
    }
    }
    ...
  2. Restart the qtest service after you update the IP.

Version 9.1.5

  1. Execute the qtestctl command to open the configuration wizard then update your Postgres host.

  2. Restart the qtest service.
    Note: Read these instructions for a refresher on using the Command Line Wizard to update your Postgres Host. 

Powered by Zendesk