From Sundials to Chronometers: The Shift to Multi-Master

Making the move from a traditional single-master database to a multi-master system is like trading in your sundial for a marine chronometer. A sundial is simple, reliable… and completely dependent on a single source of truth (the sun overhead). It works, but only if conditions are perfect and you're standing in the same place. A chronometer, on the other hand, lets you navigate the open seas, across longitudes, giving you freedom you never had before, but it demands precision, discipline, and an entirely new way of thinking about time.

The same is true when moving from a single-master (single-write) database to a multi-master (also known as active-active) database. You gain the benefits of global availability, reduced latency, and higher resilience, but you are now also dealing with changes happening simultaneously across space (on distributed nodes). To take full advantage of a multi-master replication cluster, you may need to make some fundamental changes to your underlying schema design, application distribution, and possibly data residency.

Switching to a multi-master architecture isn’t just about changing how you replicate data; it’s about unlocking a fundamentally new capability for your applications. This blog outlines the limitations of a centralized system and how to move to a system with globally distributed, highly available writes and reads.

Multi-Master vs Single-Master: The Big Picture

In a single-master database, there is only ever one source of truth at any given moment. Every INSERT, UPDATE, or DELETE happens in a predictable, linear timeline. With multi-master replication, you are working with multiple timelines converging, synchronizing, and sometimes even colliding.

This post covers a few of the most immediate and major considerations to help get you started.

Why Every Table Needs a Primary Key In Multi-Master Replication

Primary keys aren’t just a best practice, they are essential. In a multi-master system, replication depends on knowing exactly which row is which, even when rows are written to different nodes at the same time. Without a primary key, your replication system has no guaranteed way of identifying and synchronizing rows correctly.

Every replicated table should have a unique primary key. If you're missing them, you’re going to want to add them. BUT!  Read on before you do.

Uniqueness Across Space: Generating Unique Primary Key IDs Safely

In a single-master system, simple auto-incremented sequences or UUIDs often suffice. On a multi-master cluster, a naive sequence becomes dangerous. Imagine two nodes both inserting new rows at exactly the same time — each using the same sequence starting at 1. You'll end up with duplicate primary keys and immediate replication conflicts.

Here’s a hypothetical scenario:

A replicated table of “customers” currently has 200 rows in it, with primary key IDs from 1 - 200.

  • At 12:00 PM, Node A in Germany creates a new customer row locally, with the next auto-assigned ID of 201, with a customer name of “John Smith”

  • Simultaneously, Node B in California also creates a new row locally, also with the ID of 201, for a customer with the name of “Jane Doe”

Both new rows are then replicated to the other node. Each node receives the other’s row, but both rows have the same primary key of 201. So, is 201 meant to be John Smith, or Jane Doe?

The solution is to use an ID generation strategy designed for distributed systems.  There are a few ways this could be approached, such as using node-specific ID ranges or UUIDs (but be cautious of index bloat).  For pgEdge, we recommend using Snowflake sequences.  A Snowflake sequence is a globally unique, time-ordered ID that is unique across nodes. You can read more about them in the pgEdge documentation.
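
For comparison, the node-specific range approach can be implemented with plain PostgreSQL sequences. Here’s a minimal sketch, assuming a three-node cluster (names and offsets are illustrative):

-- Each node starts its sequence at a different offset and increments by the
-- number of nodes, so the ranges interleave and never collide:
-- Node A generates 1, 4, 7, ...  Node B generates 2, 5, 8, ...  Node C generates 3, 6, 9, ...

-- On Node A:
CREATE SEQUENCE customers_id_seq START WITH 1 INCREMENT BY 3;

-- On Node B:
CREATE SEQUENCE customers_id_seq START WITH 2 INCREMENT BY 3;

-- On Node C:
CREATE SEQUENCE customers_id_seq START WITH 3 INCREMENT BY 3;

This works, but every node added later means re-coordinating the offsets, which is one reason a self-describing, time-ordered ID like a Snowflake sequence is usually the better fit.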

Snowflake sequences are composite values that let you:

  • add or modify data in different regions while ensuring a unique transaction sequence.

  • preserve unique transaction identifiers without manual/administrative management of a numbering scheme.

  • accurately identify the order in which globally distributed transactions are performed.

pgEdge provides functions to convert your PostgreSQL sequence key field to a Snowflake sequence field. After converting a table to use a Snowflake sequence, old keys remain in the original format, but new keys are a unique composite value that contains information about the row.  For example, in the following query, you can see a mix of PostgreSQL-style sequences and Snowflake sequences:

acctg=# SELECT id, snowflake.format(id), invoice FROM orders;
        id         |                          format                           |  invoice  
-------------------+-----------------------------------------------------------+------------
                 1 | {"id": 1, "ts": "2022-12-31 19:00:00-05", "count": 0}     | art_9338
                 2 | {"id": 2, "ts": "2022-12-31 19:00:00-05", "count": 0}     | math_9663
                 3 | {"id": 3, "ts": "2022-12-31 19:00:00-05", "count": 0}     | sci_2009
                 4 | {"id": 4, "ts": "2022-12-31 19:00:00-05", "count": 0}     | sci_7399
                 5 | {"id": 5, "ts": "2022-12-31 19:00:00-05", "count": 0}     | art_9484
                 6 | {"id": 6, "ts": "2022-12-31 19:00:00-05", "count": 0}     | music_1849
                 7 | {"id": 7, "ts": "2022-12-31 19:00:00-05", "count": 0}     | hist_2983
135824181823537153 | {"id": 1, "ts": "2024-01-10 14:16:48.438-05", "count": 0} | math_8330
135824609030176769 | {"id": 1, "ts": "2024-01-10 14:18:30.292-05", "count": 0} | art_9447
(9 rows)

In the first column, you can see that the id assigned to each new row changes from a simple value to a more complex Snowflake sequence after the first seven rows - that change indicates the point at which the table was converted to use a Snowflake sequence for its primary key.

Since a Snowflake sequence is a composite value, it provides a bonus: you can use a Snowflake function to extract the embedded information from each unique id; for example:

acctg=# SELECT * FROM snowflake.format(136169504773242881);
                           format                           
------------------------------------------------------------
 {"id": 1, "ts": "2024-01-11 13:08:59.845-05", "count": 0}
(1 row)

Multi-Master Conflict Management

What are Conflicts and Why Do They Happen?

In a multi-master system, conflicts arise when two nodes make concurrent changes to the same data before they have a chance to synchronize.  Examples include:

  • UPDATE-UPDATE conflict: Both nodes modify the same row before either change has replicated; even writes a fraction of a second apart can collide.

  • INSERT-INSERT conflict: Both nodes add what each thinks is a “new” row using the same primary key.

  • DELETE-DELETE conflict: An incoming DELETE from another node arrives during synchronization after the target row has already been deleted locally.

In a single-master system, like a single sundial, the linear timeline is enforced by WRITE transactions happening on a single node in the order in which they occur - conflicts can't happen. In a multi-master system, however, like ships roaming the seas with chronometers that must stay in sync with Greenwich Mean Time, there are multiple, simultaneous writers that are geographically distributed. This means there is replication lag that must be dealt with.

Replication lag means that writes don’t always arrive at other nodes in the order they happened.

In an active-active system, you are not just dealing with "who wrote what", you are also dealing with “when” and “where”. Due to network latency and load, writes made earlier on one node can arrive later on another node compared to locally-generated writes.

This is why multi-master systems, including pgEdge, use timestamp-based conflict resolution, not just arrival order. However, remember that timestamps are written locally, and therefore rely on the system clock of the local node. If clocks are not synchronized across nodes, the timestamps themselves can be misleading.

The problem: Writes can arrive out-of-order due to network latency.

The impact: A node might apply its own write, then later receive an earlier (but lagged) write from another node.

The solution: pgEdge includes a Logical Clock to establish a consistent, monotonic timeline across all nodes — so the true order of writes is respected, even if they arrive late.

Timeline:

Node A:  --[ Write X @ T1 ]-----------(replication lag)-----------→
                                                                   ↓
Node B:  ------[ Write Y @ T2 ]--[ Apply Write Y ]--[ Write X arrives late ]

Without timestamps: Write X might incorrectly overwrite Write Y.

With timestamps: Write Y wins, as it happened later.

Approaches to Conflict Management

Resolution: Handling Conflicts After They Happen

The most common approach is to detect and resolve conflicts when synchronizing, through the use of accurate timestamps:

  • Last-Update-Wins: The change with the latest timestamp wins.

  • Insert-Insert conflict resolution: When two inserts collide, the one with the latest timestamp is converted into a full-row update, ensuring no data is lost.

This makes accurate clocks critical. pgEdge ensures this by maintaining a monotonically increasing logical clock on each node, preventing clock drift from causing inconsistent conflict resolution.

Avoidance: Designing to Prevent Conflicts

A more elegant approach is, wherever possible, to avoid conflicts entirely through smart data modeling. Two of the main approaches here are:

  • CRDTs (Conflict-Free Replicated Data Types): Data structures designed to automatically merge without conflict.

  • Immutable Data Patterns: Prefer insert-only or append-only models where possible (see the sketch below).
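
As a simple illustration of the append-only idea, an account balance can be modeled as a ledger that nodes only ever INSERT into, so there is no single row for concurrent UPDATEs to collide on (the table and column names below are illustrative, not part of pgEdge):

-- Append-only ledger: the balance is derived by summing entries, never updated in place.
CREATE TABLE account_ledger (
    entry_id   bigint PRIMARY KEY,        -- use a globally unique ID, e.g. a Snowflake sequence
    account_id bigint NOT NULL,
    amount     numeric NOT NULL,          -- +50.00 for a deposit, -30.00 for a withdrawal
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Current balance for an account:
SELECT sum(amount) FROM account_ledger WHERE account_id = 1;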

Summed-Value Fields Need Special Handling - The Delta_Apply CRDT 

In pgEdge, one of the most practical tools for multi-master conflict avoidance is the delta_apply mechanism. Instead of sending the final value, pgEdge can replicate the change itself (the delta), allowing each node to apply the adjustment rather than overwrite the value.

Think about values that are naturally summed over time:

  • Bank account balances

  • Inventory quantities

  • Game scores

If Node A and Node B both adjust the same field concurrently, which value should win? If you rely on simple "last write wins" logic, you'll lose data.

By replicating deltas instead of full values for numeric fields, you remove the possibility of overwriting concurrent increments or decrements. This is especially important for fields like balances, counters, and inventory levels.

-- Log the old value and merge concurrent changes as deltas instead of overwriting:
ALTER TABLE accounts ALTER COLUMN balance SET (log_old_value=true, delta_apply_function=spock.delta_apply);

This ensures that all concurrent adjustments are merged correctly, rather than just overwritten. This one function can handle all numeric column types, without requiring any additional schema changes.

Why delta_apply Matters: A Concrete Example

Imagine you are managing a simple bank account balance replicated across two nodes (Node A and Node B). The account starts with a balance of $100.

Without delta_apply:

  Action    Node      Operation       New Local Balance
  Step 1    Node A    Withdraw $30    $70
  Step 2    Node B    Deposit $50     $150

Each node independently updates its local copy of the balance based on concurrent transactions.

In a multi-master system, as replication occurs each node will try to overwrite the other node’s balance with its own final value:

  • Node A will send: balance = $70

  • Node B will send: balance = $150

Depending on conflict resolution (typically Last-Update-Wins), you might end up with either $70 or $150, but the correct value should have been: $100 - $30 + $50 = $120


With delta_apply enabled:

  Action    Node      Operation       Change (delta)
  Step 1    Node A    Withdraw $30    -30
  Step 2    Node B    Deposit $50     +50

When the system replicates the transactions, it no longer tries to send full values. Instead, it sends deltas:

  • Node A sends: delta = -30

  • Node B sends: delta = +50

The receiving nodes will now apply both changes, no matter the order:

$100 - $30 + $50 = $120

This is why delta_apply is essential for fields that store a summed value. It preserves correctness even when updates happen concurrently across nodes.
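
In SQL terms, the two concurrent transactions from the example above are just ordinary relative updates, using the same accounts table and balance column from the ALTER TABLE statement earlier (the account_id filter is illustrative):

-- On Node A (local result: 100 - 30 = 70)
UPDATE accounts SET balance = balance - 30 WHERE account_id = 1;

-- Concurrently on Node B (local result: 100 + 50 = 150)
UPDATE accounts SET balance = balance + 50 WHERE account_id = 1;

-- With delta_apply enabled on the balance column, each node applies the other
-- node's delta (-30 or +50) instead of its full value, and both converge on 120.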

Rethinking Backup & Restore in a Multi-Master World

In a single-master (or primary-replica) database, backup and restore are relatively straightforward:

  • You take a snapshot (base backup + WAL logs) from the primary node.

  • You can restore this snapshot to a new replica or a replacement primary.

But in a multi-master system, things are more complex, because:

  • There is no single "source of truth" — all nodes are simultaneously authoritative.

  • The replication state (including sequence numbers, logical replication positions, and timestamps) is part of the system’s integrity.

  • Restoring from an old backup can cause immediate replication conflicts or inconsistencies if not done carefully.

Why Standard Backups Can Go Wrong

Imagine restoring a node from a snapshot that is 1 hour old:

  • The node will start with stale data and outdated replication state.

  • Upon reconnecting, it may replicate old changes as if they were new, or it may incorrectly attempt to overwrite more recent updates from other nodes.

  • Worse, it can trigger primary key conflicts or timestamp regressions that violate system integrity.

Principles of Multi-Master Backup & Restore

Coordinated Backups

If you're using multi-master replication, your backups should be taken from all nodes (or at least a designated consistent set) at the same logical point in time, not just one node. This ensures you can rebuild the whole cluster without conflicting histories.

Consistent Restore

You cannot restore just one node in isolation and let it rejoin an existing cluster unless you are certain that:

  • the backup is recent enough to be safely replayed.

  • the logical replication state is reconciled with the other nodes.

Node Replacement Strategy

If you lose a single node, it’s generally safer to:

  • Remove the failed node from the cluster.

  • Deploy a fresh node from a recent backup.

  • Let the new node perform a sync from a healthy peer to catch up.

This avoids introducing stale data back into the cluster.
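
As a loose sketch of what this looks like with the Spock extension that powers pgEdge replication: the management functions below (spock.sub_drop, spock.node_drop, spock.node_create, spock.sub_create) are assumptions based on the Spock documentation, so verify the exact names, arguments, and order of operations for your version before relying on them. Node and subscription names are hypothetical.

-- 1. On each surviving node, drop the subscription to the failed node and remove it:
SELECT spock.sub_drop('sub_n1_n3');
SELECT spock.node_drop('n3');

-- 2. On the freshly provisioned replacement node, register it and subscribe to a
--    healthy peer so it can sync and catch up:
SELECT spock.node_create('n3', 'host=node3.example.com dbname=acctg user=replicator');
SELECT spock.sub_create('sub_n3_n1', 'host=node1.example.com dbname=acctg user=replicator');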

Rethinking Application Connectivity for Multi-Master

When using a traditional single-master replicated database, application connection patterns typically look like this:

  • READ transactions: May go to nearby read replicas for low-latency data access.

  • WRITE transactions: Always go to the single master node — even if that node is far away — introducing unavoidable latency.

In this model, developers often hard-code or configure their applications to direct all write traffic to a specific host (the master) and distribute read-only traffic across replicas.

What changes in Multi-Master?

In a multi-master system like pgEdge:

  • Every node is writable, not just readable.

  • Each node participates equally in accepting writes and replicating them globally.

This means you can now:

  • Write locally, significantly reducing round-trip latency.

  • Still read locally, as before.

Why This Matters

If you continue using a single-master connection strategy:

  • You may still be sending writes across the globe unnecessarily.

  • You might not fully benefit from the performance improvements of using local writes.

  • You are underutilizing the core feature of multi-master replication.

Practical Connection Changes You May Need

Topology-Aware Connection Strings

Configure applications to connect to the nearest node (e.g., via DNS, load balancer, or topology-aware connection string). By doing this, your applications gain the benefits of low-latency for both reads and writes.
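
For example, libpq-based clients (psql and most PostgreSQL drivers built on libpq) accept multi-host connection strings, so a simple form of topology awareness can be a per-region connection string that lists the nearest nodes first (hostnames below are hypothetical):

# Region A application: its co-located node first, a same-region peer as fallback
postgresql://node-eu1.example.com,node-eu2.example.com/acctg?connect_timeout=3

# Region B application: its own co-located nodes first
postgresql://node-us1.example.com,node-us2.example.com/acctg?connect_timeout=3

DNS-based routing or a regional load balancer in front of each node group achieves the same effect without baking hostnames into the application.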

Connection Pool Adjustments

In some frameworks, connection pools may be tuned under the assumption that writes are slow due to network latency introduced by remote write transactions. You may want to revisit timeouts, pool sizes, and retry logic now that writes can be local and fast.

Multi-Region Application Awareness

In multi-region deployments, you may want each regional deployment of your application to connect to its co-located pgEdge node:

  • Region A app → Region A database node

  • Region B app → Region B database node

Failover Considerations

Since all nodes are writable, application failover logic may also be simplified. Instead of failing over to a remote master, or being stuck waiting for a physical standby to be promoted, you may simply redirect the connection to another nearby writable node.

Example: Before vs After

  Single-Master                 Multi-Master
  Reads  → Local replica        Reads  → Local node
  Writes → Remote master        Writes → Local node
  Latency sensitive             Latency minimized

By adapting your application’s connection strategy, you unlock one of the biggest practical benefits of pgEdge and multi-master replication: low-latency writes anywhere.

Conclusion: Unlocking the Power of Multi-Master

Switching to a multi-master architecture isn’t just about changing how you replicate data, it’s about unlocking a fundamentally new capability for your applications. You’re moving from the limitations of a centralized system to the freedom of globally distributed, low-latency, highly available writes and reads.

Yes, it requires thoughtful design and a shift in mindset, but that’s exactly what makes it exciting. The payoff is not just theoretical; it’s practical and immediate:

  • Your applications can be faster.

  • Your systems can be more resilient.

  • Your users can have a better experience, wherever they are.

With the right patterns, such as globally unique keys, conflict management, smart use of delta_apply, coordinated backups, and updated application connections, you are well on your way to taking full advantage of what multi-master makes possible.

Multi-master isn’t just a technical upgrade; it is an architectural unlock. It opens the door to new products, better user experiences, and more robust infrastructure.

If you’re ready to see how pgEdge can help you harness the full potential of multi-master PostgreSQL, check out pgEdge.com to learn more, get started, and explore the possibilities. pgEdge is available as both a self-hosted Enterprise Platform and a fully managed Cloud Service.

On a personal note, if you’re a geo-nerd like me, you may enjoy the book “Longitude: The True Story of a Lone Genius Who Solved the Greatest Scientific Problem of His Time”, which inspired my opening analogy.