Unleashing the Power of PostgreSQL with pgEdge Distributed Multi-Master Replication and Postgres Platform - Part 2
In this blog, we're continuing to explore the power of multi-master replication (MMR) with the pgEdge distributed Postgres platform and its open source extension, Spock. In the first Part 1 of this blog topic, we discussed different replication methods and deployment models for PostgreSQL replication. The blog also discusses the pros and cons of MMR replication, and how pgEdge Distributed Postgres uses the Spock extension to perform conflict management to ensure data integrity while implementing multi-master replication.
https://docs.google.com/document/d/1td8x6h_n9RvNRRm4AVKqVcl67Vc6AwNJdCPN6_lP9ZY/edit?pli=1&tab=t.0
In this blog we'll focus on conflict management. Conflict-free delta-apply columns are a distinguishing feature of the pgEdge Spock extension that provides a definitive way to apply data updates in the correct order, preventing data conflicts and facilitating efficient and accurate replication of incremental changes and aggregate values. Effective conflict avoidance tooling is essential to maintain the integrity of your data and ensure smooth operation in a MMR environment.
Understanding Conflict-Free-Delta-Apply Columns
To recap the issues we discussed in the first blog: a conflict arises in an MMR cluster when the same data is updated or inserted by concurrent connections on multiple distributed nodes. Unlike a single master replication (SMR) cluster, where the master node accepts WRITE
transactions and supporting nodes answer READ
requests, all nodes in a distributed MMR cluster are tasked with handling both WRITE
and READ
operations for improved performance and efficiency.
The improved performance and efficiency of MMR comes with caveats; foresight and planning are your best defense against data integrity issues. For example, the following scenarios can cause a data conflict in a distributed MMR cluster:
Simultaneous updates: when multiple nodes update the same row concurrently with different values.
Simultaneous inserts: when multiple nodes insert different rows using the same primary key or unique constraint, leading to key conflicts.
Simultaneous modifications to the same row: One node deletes a row while another node updates the same row at the same time.
The traditional (and most often used) method of managing conflict resolution is last update wins, where the most recent change overwrites an earlier one. By itself, this approach can lead to data inconsistencies, especially in scenarios involving cumulative operations (where you're maintaining the sum of a column). For example, if multiple transactions acting on the same account add or subtract values from the same numerical field in a different order on different nodes, the available result using the last update win approach may not always reflect the changes accurately.
Spock's delta-apply
mechanism resolves this issue by replicating the delta (change) rather than the final value or the value that results with the last update wins approach. Taking a delta-apply
approach ensures that all concurrent/incremental updates are accurately applied across all nodes. The Spock extension's Github page has the following example that explains the best usage of the delta-apply
feature.
"Suppose that a running bank account sum contains a balance of $1,000. Two transactions "conflict" because they overlap with each from two different multi-master nodes. Transaction A is a $1,000 withdrawal from the account. Transaction B is also a $1,000 withdrawal from the account. The correct balance is $-1,000. Our Delta-Apply algorithm fixes this problem and highly conflicting workloads with this scenario (like a tpc-c like benchmark) now run correctly at lightning speeds."
To implement the conflict-free-delta-apply
column, the Spock extension applies a small patch to community Postgres. The patch provides the ability to log old values of specified columns when an update is made to that column.
Example
The following example demonstrates testing conflict resolution on a three node pgEdge MMR PostgreSQL cluster. We'll use the pgbench schema to show the effect of the conflict-free-delta-apply
column on the balance column of pgbench_accounts
tables.
#Start by creating the pgbench schema on the test database:
pgbench -i -s 50 -p 5432 testdb
# Use the ALTER TABLE command below to specify the conflict-free-delta-apply column:
testdb=# ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET(log_old_value=true, delta_apply_function=spock.delta_apply);
INFO: DDL statement replicated.
ALTER TABLE
# Update the balance amount to 5000 for account id 6 in branch id 1:
testdb=# SELECT * FROM pgbench_accounts WHERE aid=6 AND bid=1;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
6 | 1 | 5000 |
(1 row)
# Run the following command on the command prompt to generate concurrent updates on all the 3 nodes, updating the balance of the same account in the same branch. The transations will run in parallel causing a conflict on the balance column that will be handled by the delta-apply feature.
psql -p 5432 -U ahsan -d testdb -c "UPDATE pgbench_accounts SET abalance = abalance - 1000 WHERE aid = 6 AND bid=1;" & psql -p 5433 -U ahsan -d testdb -c "UPDATE pgbench_accounts SET abalance = abalance - 500 WHERE aid = 6 AND bid=1;" & psql -p 5434 -U ahsan -d testdb -c "UPDATE pgbench_accounts SET abalance = abalance - 100 WHERE aid = 6 AND bid=1;" &
# Check the balance of the account after running the concurrent updates. The account will have the correct balance after deducting all the sums from the balance. The last update win approach would have yielded an incorrect result.
testdb=# SELECT * FROM pgbench_accounts WHERE aid=6 AND bid=1;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
6 | 1 | 3400 |
(1 row)
Benefits of Conflict-Free-Delta-Apply Columns
The conflict-free-delta-apply
column is a distinguishing benefit of pgEdge's distributed PostgreSQL platform. It is an absolute must if your application uses distributed MMR clusters for use cases that involve maintaining aggregate values, running column sums, or performing incremental changes to numerical columns. Without conflict-free delta-apply, using the last update win approach to resolve conflicts on a distributed cluster can lead to critical calculation errors.
Implementing conflict-free-delta-column
functionality with the Spock extension will make your distributed PostgreSQL database reliable and robust, and ensures data consistency. The Spock extension provides the following key benefits:
Accurate Conflict Resolution: Spock ensures that all incremental changes are preserved and accurately reflected across all nodes.
Enhanced Data Integrity: Spock prevents data loss or inconsistencies that can arise from concurrent updates in multi-master environments.
Improved Performance: Spock reduces the need for complex conflict resolution logic, leading to more efficient replication processes.
Scalability: Spock supports high-concurrency workloads by allowing multiple nodes to process updates simultaneously without compromising data accuracy.
Conclusion
Effective conflict management, like that provided by Spock's conflict-free delta-apply
columns, is crucial for maintaining data integrity in a distributed PostgreSQL MMR environment. Implementing robust conflict resolution strategies and appropriate tooling can help ensure consistent and reliable data replication across your entire cluster.
The pgEdge distributed PostgreSQL platform installs the tooling required to successfully manage a distributed MMR PostgreSQL solution.