An Open Approach to Asynchronous Multi-master Replication With PostgreSQL

Introduction

Thanks to everyone who joined our webinar, An Open Approach to Asynchronous Multi-master Replication With PostgreSQL! The webinar recording and details can be found here. This blog post covers all the questions from the webinar; if you have further questions, please reach out to pgEdge at [email protected] or check out our Frequently Asked Questions page.

Installation, Requirements and Versions

pgEdge Platform requires EL9 (RHEL/CentOS/Rocky), Amazon Linux 2023, or Ubuntu 22.04, and currently runs on PostgreSQL versions 15 and 16. It uses the community version of PostgreSQL with the exception of one small patch that enables the conflict-free delta apply feature. pgEdge enhances logical replication through the Spock extension, so while you can compile your own PostgreSQL instance with this patch and extension, pgEdge Platform offers you a CLI to easily install and manage PostgreSQL and Spock. As for the license for Spock, we are making all the source code we have developed for pgEdge Distributed PostgreSQL, including Spock, available under our pgEdge Community License.

Functionality

There was a great discussion in the comments about the trade-off between synchronous multi-master and asynchronous multi-master. At pgEdge we wanted to focus on low latency between the application or clients and the database, in addition to high availability and partition tolerance. pgEdge nodes are eventually consistent. The time it takes for the other databases in your replication cluster to become consistent depends on the speed and throughput of the network links relative to the volume of data to be replicated. Some other notes on pgEdge and Spock functionality:

Last update wins is one of several optional conflict resolution settings for replication on pgEdge Platform; you can choose from last update wins, first update wins, keep local, and keep remote.
Non-deterministic functions do not cause differences between nodes, because the resulting committed data (not the function call) is written to the write-ahead log (WAL) and replicated. Similarly, temporary and unlogged tables are not replicated because they are not written to the WAL.
There is a function to replicate DDL statements to multiple nodes based on the replication set being used. It does not automatically add or remove items from the replication set and should be used very carefully.
Though the values of sequences can be replicated in pgEdge, we recommend using separate sequences on each node and giving each one a different offset to prevent conflicts (for example, n1 starts with 1, n2 starts with 2, n3 starts with 3, and they all increment by 10).
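
To make the sequence recommendation concrete, here is a minimal Python sketch of the offset scheme from the example above. It only simulates the values each node would generate and builds a plain PostgreSQL CREATE SEQUENCE statement per node; the helper names and the sequence name orders_id_seq are hypothetical and are not part of pgEdge or Spock.

```python
from itertools import islice

# Offsets from the example in the text: n1 starts at 1, n2 at 2, n3 at 3,
# and every node increments by 10.
NODE_OFFSETS = {"n1": 1, "n2": 2, "n3": 3}
INCREMENT = 10


def sequence_values(start, increment):
    """Yield the values a sequence with this start and increment would hand out."""
    value = start
    while True:
        yield value
        value += increment


def create_sequence_sql(name, start, increment):
    """Build a standard PostgreSQL CREATE SEQUENCE statement (name is hypothetical)."""
    return f"CREATE SEQUENCE {name} START WITH {start} INCREMENT BY {increment};"


def first_values(count):
    """Return the first `count` values each node would generate."""
    return {
        node: list(islice(sequence_values(start, INCREMENT), count))
        for node, start in NODE_OFFSETS.items()
    }


def has_collisions(values_by_node):
    """Return True if any value appears more than once across all nodes."""
    seen = set()
    for values in values_by_node.values():
        for value in values:
            if value in seen:
                return True
            seen.add(value)
    return False


if __name__ == "__main__":
    for node, start in NODE_OFFSETS.items():
        # The statement for each node would be run on that node only.
        print(node, create_sequence_sql("orders_id_seq", start, INCREMENT))
    print(first_values(3))
    print("collisions:", has_collisions(first_values(1000)))
```

Because each node's values stay in their own residue class modulo the increment, they never overlap. An increment of 10 also leaves room to grow the cluster to ten nodes before the scheme needs to change.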

Integration

pgEdge Platform can be installed on any machine, virtual machine, or container running the operating systems listed above. Because the Spock monitoring data we talked about lives entirely in tables in the PostgreSQL database, it can easily be added as metrics to Grafana or pgWatch through a custom query against those tables. For pgEdge Cloud, you bring your own cloud account, and pgEdge is installed and configured on virtual machines in that cloud. This allows you to use whatever cloud monitoring and alerting services you already use against those virtual machines, the PostgreSQL database, and the Spock tables.

We are planning to support both Timescale and Citus. To date we have not done extensive testing, but these extensions are included in our repository, and we would love for an interested party to start testing them together.

Recovery and Migration

There were a lot of questions about migrating from a traditional single-node database, or from a database using pglogical, to a pgEdge multi-master cluster. In the coming weeks we're going to be publishing a blog post with information on how best to accomplish this.

With pgEdge Platform there is no centralized service that manages the nodes; all decisions about conflict resolution happen on the individual node. For cluster management, data validation, and recovery of a node that drops out of the cluster, we have added some really cool features to nodeCtl and Spock. These features will be configurable for Platform, but the configuration can be done automatically by pgEdge Cloud. Stay tuned for more details.
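
As a rough illustration of what a node-local conflict decision could look like, the Python sketch below applies the four resolution options named earlier in this post to a local and an incoming remote version of the same row. This is not Spock's implementation: the RowVersion type, the strategy identifiers, and the tie-breaking rule (keep the local row when timestamps are equal) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RowVersion:
    """Hypothetical stand-in for one version of a row and its commit time."""
    value: str
    committed_at: datetime


# Hypothetical identifiers for the four options named in the post.
STRATEGIES = ("last_update_wins", "first_update_wins", "keep_local", "keep_remote")


def resolve(local, remote, strategy):
    """Pick the surviving row version using only information available on this node."""
    if strategy == "keep_local":
        return local
    if strategy == "keep_remote":
        return remote
    if strategy == "last_update_wins":
        # Assumption: on equal timestamps the local version is kept.
        return remote if remote.committed_at > local.committed_at else local
    if strategy == "first_update_wins":
        # Assumption: on equal timestamps the local version is kept.
        return remote if remote.committed_at < local.committed_at else local
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    local = RowVersion("local edit", datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
    remote = RowVersion("remote edit", datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc))
    for strategy in STRATEGIES:
        print(strategy, "->", resolve(local, remote, strategy).value)
```

The point of the sketch is that the decision needs nothing beyond the two row versions and the configured option, which is why no central coordinator is required.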

Conclusion

Thanks again to everyone who joined and participated in our first webinar. Be sure to check out our quick starts, play around with the product and the other things nodeCtl can help you with, and stay tuned for more information on pgEdge Platform and pgEdge Cloud!