In my previous blog, we started discussing this topic: https://www.pgedge.com/blog/postgresql-replication-and-upcoming-logical-replication-improvements-in-postgresql-16

I briefly discussed replication methods in PostgreSQL and provided a summary of some of the key logical replication features that made it into PostgreSQL 16. In this blog, I will dive deeper into a couple of performance features for logical replication, demonstrate the steps for seeing them in action, and share the results of performance benchmarking.

This blog will focus on the parallel apply and binary copy features in PostgreSQL 16. The parallel apply feature uses parallel background workers at the subscriber node to apply changes from large in-progress transactions as they arrive; the number of workers is controlled by the max_parallel_apply_workers_per_subscription configuration parameter. The second performance feature is binary copy, which allows logical replication to perform the initial data copy in binary format. This provides a good performance boost when copying tables with binary columns.


Parallel Apply

Parallel apply is a performance feature that benefits the replication of large in-progress transactions. Instead of waiting for a transaction to commit, the publisher streams its changes to the subscriber node as they happen, and parallel background workers at the subscriber apply those changes while they are still being streamed. You can configure the number of parallel workers used for applying changes with the max_parallel_apply_workers_per_subscription configuration parameter.
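If you prefer not to edit postgresql.conf directly, one way to set the parameter is with ALTER SYSTEM; this is a minimal sketch, run on the subscriber as a superuser. Note that parallel apply workers are drawn from the pool limited by max_logical_replication_workers and max_worker_processes:

ALTER SYSTEM SET max_parallel_apply_workers_per_subscription = 4;
SELECT pg_reload_conf();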

The example below demonstrates how to use this exciting logical replication feature. We've also provided sample performance numbers taken while running a test with a couple of AWS instances in different regions.

For this example, I have the publisher node running in AWS us-east-1 and the subscriber node running in AWS us-west-2.


Publisher

To configure the publisher node, connect to the node and:

1. Create a fresh PostgreSQL cluster with initdb and set the following configuration parameters, specifying values appropriate for your server specification:

shared_buffers = 2GB
wal_level = logical
client_min_messages = warning
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off
logical_decoding_work_mem = 64MB

2. Create a table for publication; we've used the following command:

CREATE TABLE large_test (id int primary key, num1 bigint, num2 double precision, num3 double precision );

3. Create a publication FOR ALL TABLES; you can optionally create a publication for just the large_test table created in the previous step (see the alternative shown below):

CREATE PUBLICATION pub FOR ALL TABLES;
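If you only want to replicate the large_test table, a publication scoped to that table works as well:

CREATE PUBLICATION pub FOR TABLE large_test;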

Subscriber

To configure the subscriber node, connect to the node and:

1. Create a fresh cluster with initdb and set the following configuration parameters. The parameters need to be set according to your server specification:

shared_buffers = 2GB
wal_level = logical
client_min_messages = warning
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off
logical_decoding_work_mem = 64MB
max_parallel_apply_workers_per_subscription = 4

For our test server, I set max_parallel_apply_workers_per_subscription to 4, allowing up to four parallel workers to apply changes on the subscriber node.
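After restarting (or reloading) the subscriber, you can confirm that the setting took effect:

SHOW max_parallel_apply_workers_per_subscription;
 max_parallel_apply_workers_per_subscription 
---------------------------------------------
 4
(1 row)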

2. Create a matching table to receive the replicated changes from the publisher:

CREATE TABLE large_test (id int primary key, num1 bigint, num2 double precision, num3 double precision );

3. Create a subscription with connection properties to the publisher:

CREATE SUBSCRIPTION sub CONNECTION 'host=ec2-44-209-69-249.compute-1.amazonaws.com user=admin password=password dbname=demo' PUBLICATION pub WITH (copy_data = off, streaming = parallel, synchronous_commit = remote_apply);

Please note that we are setting the copy_data parameter to off for this test so that we exercise the streaming of table changes rather than the initial data copy. We are also setting streaming to parallel; this enables the parallel apply feature, so changes are applied on the subscriber node by the configured number of parallel workers.
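To confirm that parallel apply workers are actually being used, you can query pg_stat_subscription while a large transaction is being streamed; in PostgreSQL 16 the view includes a leader_pid column, which is non-null for parallel apply workers. A quick check, assuming the subscription named sub from above:

SELECT subname, pid, leader_pid FROM pg_stat_subscription WHERE subname = 'sub';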


Publisher

To set up our test scenario, we connect to the publisher node and:

1. Set synchronous_standby_names to the name of the subscription. This is not required for parallel apply; we set it only for this test so that each commit on the publisher waits until the changes have been applied on the subscriber, letting us measure end-to-end timing. This works because the logical replication apply worker reports the subscription name as its application_name:

cat << EOF >> /opt/pgedge/data/pg16/postgresql.conf
synchronous_standby_names = 'sub'
EOF

2. Restart the PostgreSQL server.
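For example, assuming the data directory used in the previous step, you can restart with pg_ctl:

pg_ctl -D /opt/pgedge/data/pg16 restart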

3. Use psql to run the following command. The command starts and times a large transaction on the publisher node; because the transaction's decoded changes exceed logical_decoding_work_mem (64MB), the transaction is streamed to the subscriber while still in progress:

\timing
EXPLAIN (ANALYZE, BUFFERS) INSERT INTO large_test (id, num1, num2, num3)
        SELECT i, round(random()*10), random(), random()*142
        FROM generate_series(1, 5000000) s(i);

Results

  • With streaming set to parallel, it took 58887.540 ms (00:58.888) to complete the transaction and apply the changes at the subscriber node.

  • With streaming set to off, it took 106909.268 ms (01:46.909) to complete the transaction and apply the changes at the subscriber node.

Based on these numbers, parallel apply reduced the elapsed time for this large in-progress transaction by roughly 45% (from about 107 seconds to about 59 seconds, a 1.8x speedup).


Binary Copy

Binary copy is another logical replication performance feature added in PostgreSQL 16. It makes it possible to perform the initial copy of table data in binary format. Streaming changes in binary format was already possible (the binary subscription option was added in PostgreSQL 14), but performing the initial table copy in binary mode wasn't supported prior to PostgreSQL 16.
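For an existing subscription, the option can also be toggled after creation; a minimal sketch, assuming a subscription named sub:

ALTER SUBSCRIPTION sub SET (binary = true);

Note that this affects how subsequent changes (and any future table synchronizations) are transferred; it does not re-copy tables that are already synchronized.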

I've conducted a test using two AWS instances to demonstrate the performance benefit gained with this feature. The following example shows how to enable the feature and provides performance numbers from testing the initial data load in binary vs. non-binary format.


Publisher

To set up our binary copy test scenario, connect to the publisher node and:

1. Set the following configuration parameters, adjusting the values to suit your server specification:

shared_buffers = 2GB
wal_level = logical
client_min_messages = warning
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off
logical_decoding_work_mem = 64MB

2. Create a table that includes bytea columns:

CREATE TABLE large_test (id int primary key, num1 bytea, num2 bytea, num3 double precision );

3. Create a publication, specifying the FOR ALL TABLES clause:

CREATE PUBLICATION pub FOR ALL TABLES;

4. Add records to the table:

\timing
EXPLAIN (ANALYZE, BUFFERS) INSERT INTO large_test (id, num1, num2, num3)
        SELECT i, md5(round(random()*10)::text)::bytea, md5(random()::text)::bytea, random()*142
        FROM generate_series(1, 50000000) s(i);

5. Check the table size after the initial data load:

SELECT pg_size_pretty (pg_relation_size('large_test'));
 pg_size_pretty 
----------------
 5208 MB
(1 row)

Subscriber

Connect to the subscriber node and:

1. Set the following configuration parameters appropriately for your system:

shared_buffers = 2GB
wal_level = logical
client_min_messages = warning
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off
logical_decoding_work_mem = 64MB

2. Create a table with the same bytea columns:

CREATE TABLE large_test (id int primary key, num1 bytea, num2 bytea, num3 double precision );

3. Create the subscription, setting the binary parameter to true and the copy_data parameter to on so that the initial data transfer is performed in binary format:

CREATE SUBSCRIPTION sub CONNECTION 'host=ec2-44-209-69-249.compute-1.amazonaws.com user=admin password=password dbname=demo' PUBLICATION pub WITH (copy_data=on, binary=true);
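You can watch the progress of the initial synchronization by checking the per-table state in pg_subscription_rel; each table reaches state 'r' (ready) once its copy has completed:

SELECT srrelid::regclass AS table_name, srsubstate FROM pg_subscription_rel;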

4. Create the following procedure to wait until the initial data copy from publisher to subscriber has completed, so we can time it:

CREATE OR REPLACE PROCEDURE wait_for_rep()
LANGUAGE plpgsql AS $$
BEGIN
    WHILE (SELECT count(*) != 0 FROM pg_subscription_rel WHERE srsubstate <> 'r') LOOP
        COMMIT;
    END LOOP;
END;
$$;

5. Call the procedure to time the transfer:

\timing
call wait_for_rep();
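Once the procedure returns, you can sanity-check the copy by comparing row counts with the publisher; for this example we expect 50000000 rows:

SELECT count(*) FROM large_test;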

Results

  • Without binary load (binary set to false), it took 383884.913 ms (06:23.885) to complete the transaction and apply the changes at the subscriber node.

  • With binary load (binary set to true), it took 267149.655 ms (04:27.150) to complete the transaction and apply the changes at the subscriber node.

Based on these numbers, performing the initial table copy in binary format reduced the elapsed time by roughly 30% (from about 384 seconds to about 267 seconds, a 1.4x speedup).


Conclusion

The use of distributed PostgreSQL databases is growing rapidly, and replication is a core part of any distributed system. Replication features in PostgreSQL become more mature and feature-rich with every major release. The groundwork for logical replication was laid before PostgreSQL 10, and the feature became usable in its current form in PostgreSQL 10. Since then, replication support has grown tremendously; the major features added in each release warrant a separate blog post that I will cover in due course. This blog covered the new logical replication performance features added in PostgreSQL 16; stay tuned for more blogs discussing the remaining PostgreSQL 16 logical replication features.