How to achieve multi-master replication in PostgreSQL with Spock

Typically, data replication in PostgreSQL is done between an active (primary) database and one or more standby databases. While this is usually enough for many applications and to enable high availability, sometimes you need to replicate your data across more than one active database. With a multi-active database cluster, you can distribute not only your read queries but also your inserts and updates to multiple databases in a cluster. This enables parallel workloads and the possibility of bringing the data closer to the end users, leading to lower latency and modernized, evenly distributed architectures.

PostgreSQL version 9.6, released in 2016, included a community extension called BDR that had some initial bidirectional replication support. The BDR extension was not updated or maintained in subsequent versions of PostgreSQL. Other databases provide support for multi-active clusters, and some products provide support for PostgreSQL, but there has not been a community-licensed, Postgres-native solution for multi-active replication. That changed following the recent launch of pgEdge Distributed PostgreSQL, a fully distributed database optimized for the network edge based on the standard and popular open source PostgreSQL database.

Technical background

Physical replication uses exact block addresses and byte-by-byte replication. This has been commonly used in PostgreSQL for creating a read replica that can be used as a hot standby or an additional read-only database for the application.

By contrast, logical replication involves replicating data objects and their changes by using their primary key. Rather than shipping the write-ahead log (WAL) files for all current states of all objects in the database to an exact matching database in recovery mode, logical replication uses publishers and subscribers to replicate inserts, updates, and deletes on specified objects. As a result, logical replication can be configured to be more finely grained, making it a powerful tool for modern databases.

Why logical replication enables multi-master replication

Logical replication allows you to limit replication to a specific database and provides options for row-level filtering. Logical replication therefore can be configured to replicate from database a to database b, and back from database b to database a. This multidirectional logical replication means that neither database has to be in recovery mode, and writes can happen to each with bidirectional replication between them to keep them in sync.

This means having multiple write endpoints for the application. In addition to providing a multi-master cluster, the version of the database becomes less important, meaning you could have a version 14 database replication and a version 15 database while being able to write to both, reducing downtime.

What Spock brings to the table

pgEdge's Spock extension introduces asynchronous multi-master (active-active) replication with enhanced conflict resolution and conflict avoidance. It also provides better management, monitoring statistics, and integration.

You need conflict resolution when updates are happening on multiple databases at the same time. Updating a row in database a and performing a different update to the same row on database b creates conflict. With Spock, the last update wins, and the row will contain the value of the update from the latest commit without any failures. Spock also provides a resolutions table where conflict resolutions are recorded and can be monitored and analyzed.

Another conflict can arise from updates to an incrementing or sum field. For example, if 5 is added to a field on database a and 10 is added to that same field on database b, using the last-update-wins option would leave a total of plus 5 or 10, rather than the expected plus 15. Spock accounts for this with conflict-free delta-apply columns, altering this column with the delta of the update. The logical replication will ship the delta to the other database, so that the final value of the field in the above example will be the correct plus 15.

Spock also provides support for partitioned tables. Spock allows you to add either the parent table or specific partition tables to replication. This allows for geosharding, where certain partitions can be replicated between countries while other partitions remain only on the original country.

What's next?

Spock is open and pgEdge Community Licensed, which is similar to the Confluent Community License. This license allows unlimited end-user usage, including in production, but prevents third parties from packaging and selling a competitive cloud product.

Spock has many more features on the way. Right now, Spock can recover from intermittent outages: The streaming replication will persist, and the database will catch up and synchronize again. Planned improvements will make it easy to spin up full replacement nodes after a catastrophic node failure with near zero downtime.

Spock is a part of pgEdge Distributed PostgreSQL, available as either a managed database as a service called pgEdge Cloud or the self-hosted pgEdge Platform software.

The code and documentation for Spock can be found in the pgEdge Git repository.

(reposted with permission from Enable Architect, March 30)