Introduction

About a year ago, I blogged about logical replication improvements in PostgreSQL version 16. PostgreSQL 16 was a really good release for logical replication, with performance-critical features like parallel apply, replication origin support for bi-directional replication, and the ability for a standby server to be a publisher. Please refer to the old blog post for more details on the version 16 replication-related features - you'll find that post at:

https://www.pgedge.com/blog/postgresql-16-logical-replication-improvements-in-action

PostgreSQL 17 also includes a number of significant improvements for logical replication. The enhancements are geared towards improving the usability of logical replication and meeting high-availability (HA) requirements. In this blog we are going to discuss some of the key logical replication features added to PostgreSQL 17; we won't be covering all of the new features here, so there will likely be more than one blog in this series.

I want to thank my PostgreSQL community friends Amit Kapila, for introducing me to the logical replication features in PostgreSQL 17, and Hayato Kuroda, for helping me to understand and test these features.

Synchronizing Slots from Primary to Standby (Failover Slot)

My top pick among the logical replication improvements in version 17 is failover slot synchronization; this is essentially a high-availability feature that allows logical replication to continue working in the event of a primary failover. The feature keeps designated replication slots on the standby server synchronized with the corresponding slots on the primary node. To meet this goal, the standby server starts slotsync worker(s) that ping the primary server at regular intervals for the logical slot information, and update the local slots if there are changes.

There are two ways to use this feature:

  • The first approach is to enable the sync_replication_slots GUC on the standby node. In this approach, the slotsync worker periodically fetches the slot information from the primary and updates the local slots. Note that if you take this approach, you should not also call the pg_sync_replication_slots() function.

  • The other way to use this functionality is to call the pg_sync_replication_slots() function. If you use the function to update your slots, the backend process connects to the primary and performs the update operation once. Note that you cannot call the function if sync_replication_slots is turned on and the slotsync worker is already periodically refreshing the slots between the standby and the primary. A minimal example of both approaches follows this list.
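A minimal sketch of both approaches, assuming defaults for everything else:

# Approach 1: enable ongoing synchronization on the standby (postgresql.conf)
sync_replication_slots = on

-- Approach 2: run a one-time synchronization on the standby (psql)
SELECT pg_sync_replication_slots();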

To enable this feature, you need to call the pg_create_logical_replication_slot() function or use the CREATE_REPLICATION_SLOT ... LOGICAL replication command on the primary node to configure a replication slot. When configuring the slot, set its failover property to true.
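For example, both of the following create a failover-enabled slot; the slot, publication, and connection values here are placeholders:

-- On the primary, using the SQL function (failover is the last argument):
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput', false, false, true);

-- Or let a subscription create its slot with the failover property:
CREATE SUBSCRIPTION my_sub CONNECTION 'host=primary.example.com dbname=postgres' PUBLICATION my_pub WITH (failover = true);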

You also need to set the following parameters to keep the physical standby synchronized with the primary server:

  • synchronized_standby_slots: Set on the primary, this parameter holds a list of physical replication slots that logical replication processes will wait for. If a logical replication subscriber is meant to switch to a physical standby after the standby is promoted, the physical replication slot for that standby should be included in this list. This ensures that logical replication does not get ahead of the physical standby, preventing the subscriber from getting ahead of the hot standby when consuming changes from the primary. Because logical changes are held back until the listed standbys confirm them, some added latency can be expected for logical consumers.

  • sync_replication_slots: This parameter needs to be enabled on the standby server to periodically sync the failover slots between the standby and the primary. When it is on, the slotsync worker periodically fetches the slot information from the primary and updates the local slots.

  • primary_conninfo: Set this parameter on the standby server to specify the connection string of the primary server; you can set it in the postgresql.conf file or on the command line. For replication slot synchronization, you'll also need to specify a valid database name in the primary_conninfo string. The database name is only used for slot synchronization; it is ignored for streaming.

  • primary_slot_name: Specify the name of an existing replication slot to be used when connecting to the sending server via streaming replication. The slotsync worker doesn't work if this parameter is not set.

  • hot_standby_feedback: This parameter must also be set to on. The parameter specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby. A consolidated configuration sketch follows this list.
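Putting it all together, a minimal sketch of the settings involved (the host, user, and slot names below are placeholders):

# On the primary (postgresql.conf):
wal_level = logical
synchronized_standby_slots = 'standby1_slot'

# On the standby (postgresql.conf):
sync_replication_slots = on
hot_standby_feedback = on
primary_conninfo = 'host=primary.example.com port=5432 user=replicator dbname=postgres'
primary_slot_name = 'standby1_slot'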

You can use the pg_replication_slots view to review the properties of a replication slot. Those slots with a synced value of True in the pg_replication_slots view can resume logical replication after failover; these slots have been synchronized.

SELECT slot_name, synced FROM pg_replication_slots;

 slot_name | synced
-----------+--------
 test      | t
(1 row)

Another important step after failover to a synced slot is to update the connection information for the new primary node on each subscriber. Connect to each subscriber and use the ALTER SUBSCRIPTION command to point the subscription at the new primary.
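A sketch of that step, assuming a subscription named sub1 and a placeholder host for the new primary:

ALTER SUBSCRIPTION sub1 DISABLE;
ALTER SUBSCRIPTION sub1 CONNECTION 'host=new-primary.example.com port=5432 dbname=postgres user=pgedge';
ALTER SUBSCRIPTION sub1 ENABLE;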

Failover Slots in Action

In our example, we are going to spin up two instances of PostgreSQL; one instance will be our primary server, and the other will be our standby server. For the purposes of this example, we will call the publisher instance node 1 and the standby server node 2. We'll keep the replication slot on the standby server synchronized with the replication slot of the primary so that in the event of a failover, the standby can be promoted to primary. After promoting the standby server to primary, any subscribers (and any other standby servers) will need to be updated to connect to the new primary server.

After performing an initdb on node 1, set wal_level to logical in the postgresql.conf file and restart the server on node 1.
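For example, using the same directory layout as the rest of this post:

/home/pgedge/pg17/bin/initdb -D /home/pgedge/pg17/bin/data_N1 -U pgedge
echo "wal_level = logical" >> /home/pgedge/pg17/bin/data_N1/postgresql.conf
/home/pgedge/pg17/bin/pg_ctl start -D /home/pgedge/pg17/bin/data_N1 -l n1.log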


Use the following command to create a logical replication slot on node 1. When you're passing arguments, keep in mind that the last parameter is failover, and it needs to be set to true. Please note that the test_decoding contrib module needs to be installed before running the command.


psql -U pgedge -c "SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding', false, false, true);"


 slot_name |    lsn
-----------+-----------
 test      | 0/14D4E08
(1 row)


Now we are going to create the standby server using pg_basebackup; we'll take a backup of node 1, and include:

-C to create the replication slot on the primary.

-S to specify the name of the replication slot to create (here, a physical slot named physical).

-R to write a standby.signal file and append the connection settings (primary_conninfo and primary_slot_name) to the standby's configuration.


/home/pgedge/pg17/bin/pg_basebackup -d "dbname=postgres user=pgedge port=$port_N1" -D /home/pgedge/pg17/bin/data_N2 -R -X stream -S physical -C


Then, update the following parameters in postgresql.conf on node 2:


sync_replication_slots = on

hot_standby_feedback = on
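
Both settings can be applied with a configuration reload (a restart also works); for example:

/home/pgedge/pg17/bin/pg_ctl reload -D /home/pgedge/pg17/bin/data_N2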


Run the following statement on node 2 to confirm that a slotsync worker has been created on the standby server to keep the slots synchronized with the primary:
   

SELECT datname, pid, backend_type FROM pg_stat_activity WHERE backend_type = 'slotsync worker';


 datname  |  pid   |  backend_type
----------+--------+-----------------
 postgres | 688360 | slotsync worker
(1 row)


Run the following statement on node 2 to confirm a replication slot is present:


SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;


 slot_name |    plugin     | restart_lsn | confirmed_flush_lsn
-----------+---------------+-------------+---------------------
 test      | test_decoding | 0/2000000   |
(1 row)


Run the following commands on node 1 to generate WAL records for consumption by the replication slot:


psql -U postgres -p $port_N1 -c "SELECT * FROM pg_switch_wal(); CHECKPOINT; CHECKPOINT;"


psql -U postgres -p $port_N1 -c "SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);"


psql -U postgres -p $port_N1 -c "SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);"

sleep 1s


 pg_switch_wal
---------------
 0/3000000
(1 row)

CHECKPOINT
CHECKPOINT

 lsn | xid | data
-----+-----+------
(0 rows)

 lsn | xid | data
-----+-----+------
(0 rows)


Next, we'll query the pg_replication_slots view to confirm that the slot has caught up, and that confirmed_flush_lsn is the same on both nodes.


psql -U pgedge -p $port_N1 -c "SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = 'test';"


psql -U pgedge -p $port_N2 -c "SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;"


 slot_name |    plugin     | restart_lsn | confirmed_flush_lsn
-----------+---------------+-------------+---------------------
 test      | test_decoding | 0/3000118   | 0/30001C8
(1 row)

 slot_name |    plugin     | restart_lsn | confirmed_flush_lsn
-----------+---------------+-------------+---------------------
 test      | test_decoding | 0/3000118   | 0/30001C8
(1 row)
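
At this point, if node 1 fails, the synced slot lets logical replication resume from node 2 once it is promoted. A sketch of those final steps (the subscription name and ports are placeholders):

# Promote the standby; node 2 becomes the new primary:
/home/pgedge/pg17/bin/pg_ctl promote -D /home/pgedge/pg17/bin/data_N2

# On each subscriber, repoint the subscription at the new primary:
psql -U pgedge -c "ALTER SUBSCRIPTION sub1 CONNECTION 'host=localhost port=$port_N2 dbname=postgres user=pgedge';"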

pg_createsubscriber

pg_createsubscriber is an executable included in PostgreSQL 17 that converts a physical standby server into a logical replica. This utility creates a logical replication setup for each of the databases specified in the pg_createsubscriber command: if you specify multiple databases, the utility will create a publication and a subscription for each database, covering all the tables within the specified database(s).

When setting up logical replication, the initial data copy can be a slow process. Because pg_createsubscriber starts from a physical standby that already contains the data, it avoids the initial data synchronization entirely, making it ideal for large database systems.

On the source server, wal_level needs to be set to logical, and max_replication_slots needs to be at least the number of databases specified in the pg_createsubscriber command. You should review the complete list of prerequisites and warnings in the pg_createsubscriber documentation before using it.
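A quick way to verify those settings on the source server before running the utility:

SELECT name, setting FROM pg_settings WHERE name IN ('wal_level', 'max_replication_slots');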

The automated script that follows shows how to use the pg_createsubscriber utility to convert a physical standby server into a logical replica. The script converts a primary and standby pair into a logical replication setup, with a publication and a subscription for each database specified in the command. All the user tables in the specified database(s) are added to the publication; in the example below, the pgbench tables are included in the publication.

#!/bin/bash


# Start by defining two servers on ports 5432 and 5431. Initially the servers are created as primary and standby, but they will be converted to publisher and subscriber with pg_createsubscriber.


port_N1=5432

port_N2=5431


# Stop any servers running on the ports:


/home/pgedge/postgres/pg17/bin/pg_ctl stop -D /home/pgedge/postgres/pg17/bin/data_N1

/home/pgedge/postgres/pg17/bin/pg_ctl stop -D /home/pgedge/postgres/pg17/bin/data_N2


# Empty the data folders before performing an initdb:

rm -rf data_* *log


/home/pgedge/postgres/pg17/bin/initdb -D /home/pgedge/postgres/pg17/bin/data_N1 -U pgedge


# Set the following configuration parameters on the primary node:


cat << EOF >> /home/pgedge/postgres/pg17/bin/data_N1/postgresql.conf
wal_level=logical
max_replication_slots=10
hot_standby=on
port=$port_N1
EOF


/home/pgedge/postgres/pg17/bin/pg_ctl start -D /home/pgedge/postgres/pg17/bin/data_N1 -l N1.log


# Create a standby server by taking a backup of the primary server:

# Include the -R option to automatically set up primary_conninfo and primary_slot_name:

/home/pgedge/postgres/pg17/bin/pg_basebackup -d "dbname=postgres user=pgedge port=$port_N1" -D /home/pgedge/postgres/pg17/bin/data_N2 -R -X stream

# Set up GUCs:
cat << EOF >> /home/pgedge/postgres/pg17/bin/data_N2/postgresql.conf
port=$port_N2
EOF


# Start the standby server:

/home/pgedge/postgres/pg17/bin/pg_ctl start -D /home/pgedge/postgres/pg17/bin/data_N2 -l n2.log

sleep 1s


# Populate the primary database with some tables and data:

/home/pgedge/postgres/pg17/bin/pgbench -i -U pgedge -s 10 -d postgres


# Check if the standby is catching up using streaming replication:

/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N2 -c "SELECT count(*) FROM pgbench_accounts;"


# Stop the standby server so we can run pg_createsubscriber:

/home/pgedge/postgres/pg17/bin/pg_ctl stop -D /home/pgedge/postgres/pg17/bin/data_N2


# Run pg_createsubscriber, specifying:

# -v for verbose output

# -D for the target data directory

# -P for the connection string of the source server

# -d for the database to include

/home/pgedge/postgres/pg17/bin/pg_createsubscriber -v -D /home/pgedge/postgres/pg17/bin/data_N2/ -P "host=localhost" -d postgres --publication=pub1 --subscription=sub1

sleep 1s


# Start the subscriber after pg_createsubscriber has run successfully:

/home/pgedge/postgres/pg17/bin/pg_ctl start -D /home/pgedge/postgres/pg17/bin/data_N2 -l n2.log


# Confirm the publication and subscription are created on the respective nodes:

/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N1 -c "SELECT * FROM pg_publication;"

/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N1 -c "SELECT * FROM pg_publication_tables;"

/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N2 -c "SELECT * FROM pg_subscription;"

The result of running the above script:


/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N1 -c "SELECT * FROM pg_publication;"

  oid  | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot
-------+---------+----------+--------------+-----------+-----------+-----------+-------------+------------
 16406 | pub1    |       10 | t            | t         | t         | t         | t           | f
(1 row)


/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N1 -c "SELECT * FROM pg_publication_tables;"

 pubname | schemaname |    tablename     |             attnames             | rowfilter
---------+------------+------------------+----------------------------------+-----------
 pub1    | public     | pgbench_accounts | {aid,bid,abalance,filler}        |
 pub1    | public     | pgbench_branches | {bid,bbalance,filler}            |
 pub1    | public     | pgbench_history  | {tid,bid,aid,delta,mtime,filler} |
 pub1    | public     | pgbench_tellers  | {tid,bid,tbalance,filler}        |
(4 rows)


/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N2 -c "SELECT * FROM pg_subscription;"

  oid  | subdbid | subskiplsn | subname | subowner | subenabled | subbinary | substream | subtwophasestate | subdisableonerr | subpasswordrequired | subrunasowner | subfailover |          subconninfo           | subslotname | subsynccommit | subpublications | suborigin
-------+---------+------------+---------+----------+------------+-----------+-----------+------------------+-----------------+---------------------+---------------+-------------+--------------------------------+-------------+---------------+-----------------+-----------
 24576 |       5 | 0/0        | sub1    |       10 | t          | f         | f         | d                | f               | t                   | f             | f           | host=localhost dbname=postgres | sub1        | off           | {pub1}          | any
(1 row)
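
As a final sanity check, you can generate new changes on the publisher and confirm they arrive on the subscriber; a sketch using the same pgbench setup as the script:

/home/pgedge/postgres/pg17/bin/pgbench -U pgedge -p $port_N1 -t 100 -d postgres

/home/pgedge/postgres/pg17/bin/psql -U pgedge -d postgres -p $port_N2 -c "SELECT count(*) FROM pgbench_history;"

The count on node 2 should match node 1 once the changes have been applied.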

Conclusion

The demand for distributed PostgreSQL databases in the enterprise is growing rapidly, and replication is a vital, core part of any distributed system. Since its introduction in PostgreSQL 10, logical replication has become more mature and feature-rich with every major release.

pgEdge builds on this strong foundation to provide fully distributed Postgres that delivers multi-master capability and the ability to go multi-region and multi-cloud. pgEdge adds essential features such as conflict management, conflict avoidance, automatic DDL replication, and more to cater to the demands of always-on, always-available, and always-responsive global applications.