Zero-Downtime PostgreSQL Maintenance with pgEdge
PostgreSQL maintenance doesn't have to mean downtime anymore. With pgEdge's zero-downtime node addition, you can perform critical maintenance tasks like version upgrades, hardware replacements, and cluster expansions without interrupting production workloads. Your applications stay online. Your users stay connected. Your business keeps running.
This capability is available in both single-primary deployments (by integrating the open source Spock extension) and globally distributed deployments (where it is available by default), giving you the same operational advantages whether you run in a single region or across the globe.
For an incredibly quick and easy approach to zero-downtime node addition, you can use the pgEdge Postgres Control Plane (hosted on GitHub). It drastically simplifies the management and orchestration of Postgres databases through a declarative API, whether you're running a single-primary deployment or globally distributed PostgreSQL clusters. Spock and other high-availability components come built in alongside community PostgreSQL, so you can administer databases with simple commands.
And because pgEdge and all associated components are 100% open source under the PostgreSQL license, using 100% core community PostgreSQL, you get to leverage the high-availability components that enable zero-downtime maintenance without vendor lock-in or compatibility concerns.
What Is the Spock Extension?
Spock is pgEdge's advanced logical replication extension for PostgreSQL that enables active-active (multi-master) replication in clusters with row filtering, column projection, conflict handling, and more.
Spock originated from earlier projects like pglogical and BDR 1, but it has grown enormously since. Our dedicated team of PostgreSQL experts continues to push Spock forward, making it a high-performance, enterprise-grade replication system built for distributed environments.
Zero-Downtime Node Addition: Maintenance Without Interruption
Adding a new node to a live cluster used to force you to choose between downtime and complexity. With the latest versions of Spock, you no longer have to choose: you get seamless operation and simplicity at the same time.
This feature lets you add a new PostgreSQL node to an existing Spock cluster without any downtime on the origin or existing subscriber nodes. The process creates a temporary replication slot and subscription, clones the origin's state in parallel, and then promotes the new node to a fully active peer once synchronization completes.
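At a high level, the process can be pictured as a short sequence of phases. The sketch below is purely illustrative; the phase names are hypothetical labels for the steps described above, not identifiers from Spock itself.

```python
# Hypothetical phase progression of a zero-downtime node addition;
# none of these names come from the Spock API.

ADD_NODE_STATES = [
    "create_temp_slot",      # temporary replication slot on the origin
    "create_subscription",   # temporary subscription for the new node
    "parallel_clone",        # clone the origin's state in parallel
    "catch_up",              # apply transactions buffered during the clone
    "promote_to_peer",       # new node becomes a fully active peer
]

def next_state(state):
    """Advance to the next phase; the cluster stays live throughout."""
    i = ADD_NODE_STATES.index(state)
    return ADD_NODE_STATES[i + 1] if i + 1 < len(ADD_NODE_STATES) else None

assert next_state("parallel_clone") == "catch_up"
assert next_state("promote_to_peer") is None  # nothing left to do
```

The key property is that every phase runs while the origin and the existing subscribers continue serving traffic.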
Why Zero-Downtime Node Addition Changes Everything
No service interruption. Your cluster stays live throughout the entire process, with no replication pause and no application downtime.
Safe production scaling. Add capacity when you need it without scheduling maintenance windows or warning users about outages.
Minimal manual work. The process runs through standard Spock CLI commands or scripted workflows - no complex coordination needed.
Major PostgreSQL version upgrades with zero downtime. Perform in-place major PostgreSQL version upgrades across your entire cluster without taking anything offline.
How Zero-Downtime Upgrades Work
Whether you are running a single primary node or a multi-master deployment, the process is straightforward - even more so if you use the Control Plane to handle deployment and orchestration for you.
First, introduce a new node running the higher PostgreSQL version you want. That node joins the cluster using the zero-downtime addition workflow. Once it's synchronized and active, you remove or replace the older-version nodes one at a time.
The only difference for a single-primary deployment is that you first need to enable Spock before proceeding with the steps above; afterwards, you can either remove Spock or simply disable the subscriptions between your old node and the new one.
If you are using the Control Plane, the Spock extension is already included and enabled in your database server. You also don't need to disable the subscription between your old node and your new node, as you would in a single-primary deployment; once the database operation is complete, simply update your configuration to remove the old node.
This rolling upgrade approach means you maintain full read and write availability throughout the entire upgrade process.
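The rolling upgrade described above can be sketched as a small orchestration loop. This is an illustrative simulation under assumed names, not the Spock CLI: the cluster model and the `add`/`remove` step labels are hypothetical stand-ins for the real workflow.

```python
# Illustrative simulation of a zero-downtime rolling major-version
# upgrade. The data model and step names are hypothetical; the real
# workflow uses Spock's zero-downtime node addition.

def rolling_upgrade(cluster, new_version):
    """Return the ordered upgrade steps and the final cluster.

    cluster: list of (node_name, pg_major_version) tuples.
    """
    steps = []
    # 1. Introduce a new node on the target version; it joins the
    #    live cluster via the zero-downtime addition workflow.
    new_node = (f"node-pg{new_version}", new_version)
    steps.append(("add", new_node[0]))
    live = cluster + [new_node]
    # 2. Once it is synchronized and active, retire old-version
    #    nodes one at a time, so availability is never lost.
    for name, version in cluster:
        if version < new_version:
            steps.append(("remove", name))
            live = [n for n in live if n[0] != name]
            assert live, "cluster must stay online throughout"
    return steps, live

steps, final = rolling_upgrade([("n1", 16), ("n2", 16)], 17)
assert steps[0] == ("add", "node-pg17")   # new node joins first
assert final == [("node-pg17", 17)]       # old nodes removed one by one
```

The invariant worth noting is the ordering: the new-version node is added and fully synchronized before any old-version node is removed, which is what preserves read and write availability.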
The workflow involves a coordinated process to ensure clean, consistent cluster integration:
Initialize subscriptions (excluding the source). The new node configures a disabled subscription to each existing node except the designated source. Each existing node begins buffering new transactions for the new node through a dedicated replication slot.
Ensure transaction synchronization. Before initiating the data copy, the source node fully synchronizes with all in-flight transactions from other nodes. The spock.sync_event() and spock.wait_for_sync_event() functions guarantee this precondition is met.
Copy data from source to new node. The new node establishes a subscription to the source node, which synchronizes schema and data through a snapshot-based copy, bringing the new node current as of the point of the copy.
Enable remaining subscriptions. Once the copy completes, the new node activates its subscriptions to the other existing nodes. Spock calculates the last commit timestamp for each node as seen via the source and advances each node's replication slot to that timestamp. This prevents duplicate transactions and allows replication to begin at the right point, receiving only new transactions that occurred after the initial copy.
This approach runs with no interruption of activity in the existing cluster when you use the Control Plane or a distributed cluster with multi-master logical replication across regions. If you run just a single primary instance, as noted above, you will need to add Spock as a first step and remove it again at the end of the upgrade process.
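The slot-advance step (step 4 above) can be illustrated with a small simulation: after the initial copy, each peer subscription resumes strictly past the last commit already received via the source, so nothing is duplicated and nothing is skipped. The function and data model here are hypothetical; real Spock advances replication slots internally.

```python
# Hypothetical simulation of the slot-advance step: after the
# initial copy, each peer subscription starts just past the last
# commit timestamp already seen via the source node.

def advance_and_replay(copied, peer_streams):
    """Return transactions the new node applies after the copy.

    copied: {origin: last commit ts included in the initial copy}
    peer_streams: {origin: sorted list of (commit_ts, txn_id)}
    """
    applied = []
    for origin, stream in peer_streams.items():
        cutoff = copied.get(origin, 0)
        # Replication resumes strictly after the cutoff, so the copy
        # and the stream never overlap (no duplicate transactions)
        # and nothing between them is skipped (no gaps).
        applied += [txn for ts, txn in stream if ts > cutoff]
    return applied

copied = {"n1": 105, "n2": 203}
streams = {
    "n1": [(100, "t1"), (105, "t2"), (110, "t3")],
    "n2": [(200, "t4"), (203, "t5"), (207, "t6")],
}
# t1/t2 and t4/t5 arrived via the copy; only newer commits replay.
assert advance_and_replay(copied, streams) == ["t3", "t6"]
```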
Implementation Resources
The Spock documentation provides a complete step-by-step guide to the full process when deploying to a cluster that is already distributed.
A video tutorial on how to perform this operation from start to finish on a single-primary node is coming soon!
Additionally, if you deploy with the Control Plane, documentation is available on performing a major upgrade that leverages Spock's zero-downtime node addition feature, and on updating a database outside the context of an upgrade.
The Spock GitHub repository includes several working examples in the samples/Z0DAN directory:
A Python-based orchestration script that runs outside the database and coordinates node addition through external automation tools
A stored procedure version that performs the entire process within PostgreSQL using the dblink extension, offering a fully internal option for controlled or restricted environments
LSN Checkpointing: The Technical Foundation
Spock 5+ includes functions that make seamless node addition possible. LSN checkpointing using spock.sync_event() and spock.wait_for_sync_event() creates a logical checkpoint in the WAL stream on the source node. You can then monitor another node for the arrival of that checkpoint's LSN to ensure all transactions have completed.
When adding a node, you use this to guarantee schema or data changes have fully replicated to your source node before continuing. In the context of zero-downtime node addition, these functions are critical. Once you confirm all in-flight transactions from all nodes have arrived on the designated source node, you initiate the data copy to the new node. Without this precise synchronization, adding a node without interrupting cluster usage wouldn't be possible.
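The checkpoint-and-wait pattern can be sketched as follows. This is a simulation of the semantics only: in real Spock you call spock.sync_event() on the source and spock.wait_for_sync_event() on the monitoring side, while the class and attribute names below are illustrative assumptions, not the Spock API.

```python
# Illustrative model of LSN checkpointing: sync_event() marks a
# point in the source's WAL stream; wait_for_sync_event() succeeds
# once the watched node has applied at least up to that LSN.
# Names here are hypothetical, not the actual Spock interface.

class Node:
    def __init__(self):
        self.current_lsn = 0   # head of this node's WAL stream
        self.applied_lsn = 0   # how far replication has applied

    def commit(self, n=1):
        self.current_lsn += n  # new transactions advance the WAL

    def sync_event(self):
        # Place a logical checkpoint at the current WAL position.
        return self.current_lsn

    def replicate_to(self, other):
        other.applied_lsn = max(other.applied_lsn, self.current_lsn)

def wait_for_sync_event(node, checkpoint_lsn):
    # Real Spock blocks with a timeout; here we just test whether
    # the checkpoint has arrived on `node`.
    return node.applied_lsn >= checkpoint_lsn

source, peer = Node(), Node()
source.commit(5)
ckpt = source.sync_event()                  # checkpoint at LSN 5
assert not wait_for_sync_event(peer, ckpt)  # not yet replicated
source.replicate_to(peer)
assert wait_for_sync_event(peer, ckpt)      # safe to start the copy
```

Only once the wait succeeds does the node-addition workflow begin the data copy, which is what guarantees the new node's snapshot is consistent with every peer.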
pgEdge Enterprise Postgres and pgEdge Distributed Postgres
These zero-downtime capabilities are available across both pgEdge offerings.
pgEdge Enterprise Postgres gives you production-grade PostgreSQL with built-in support for high availability, advanced backup and restore, monitoring integration, connection pooling, and auditing. It ships with the Spock extension, making it distributed-ready from day one - just create the extension within the PostgreSQL server to enable it. You can start with a standard single-region deployment and seamlessly upgrade to multi-region when your needs evolve.
pgEdge Distributed Postgres takes this further with active-active, multi-master replication across geographic regions. It delivers low-latency access, data residency compliance, and ultra-high availability through its distributed architecture.
pgEdge Control Plane is a distributed application designed to simplify the management and orchestration of Postgres databases. It provides a declarative API for defining, deploying, and updating databases across multiple hosts. It seamlessly handles anything from a single primary database all the way up to a globally deployed active-active (multi-master) cluster with attached read-only replicas.
All three products are 100% open source under the PostgreSQL license. You get all the power of standard PostgreSQL with no proprietary forks, no compatibility issues, and no vendor lock-in. Use pgvector, PostGIS, JSONB, foreign data wrappers, and more - all out of the box.
Our Core Values
pgEdge's commitment to open source is absolute. Everything runs on standard PostgreSQL. The Spock extension is open source. The tools are open source. You're not buying into a proprietary system that traps you.
This matters when you're making infrastructure decisions. You can move between pgEdge Enterprise Postgres and pgEdge Distributed Postgres as your needs change. You can deploy on-premises, in the cloud, or in containers. You can switch hosting providers. You maintain complete control over your data and your deployment.
Spock 5+ embodies a core philosophy: eliminate avoidable disruption and make high-availability PostgreSQL truly hands-off at scale.
This removes much of the risk, allowing dev teams to upgrade PostgreSQL as soon as new releases are available, expand capacity in response to traffic spikes, or replace hardware without scheduling maintenance windows weeks in advance.
The result?
Reduced operational complexity
Lower risk of human error during maintenance
Improved cluster elasticity and resilience
Real-time distributed applications that stay online and synchronized
True zero downtime for reads and writes across all nodes in a cluster while expanding or upgrading
Getting Started
Looking for a visual how-to to help you hit the ground running? Our solutions engineer Paul Rothrock created a video on using the Zero Downtime Add Node feature in Spock to enable seamless cluster scaling and major version upgrades with no service interruption. You can walk through the process here.
It’s easy to start using pgEdge Enterprise Postgres and pgEdge Distributed Postgres no matter how you choose to deploy. You can get started as a fully managed SaaS offering through pgEdge Cloud, run on your own infrastructure with VM installations, or deploy to containers in Kubernetes environments.
Both products include 24x7x365 support from PostgreSQL experts who are core contributors to the community. Whether you need standard support or dedicated Forward Deployed Engineer services, you get access to people who know PostgreSQL inside and out.
To learn more about pgEdge's distributed multi-master replication technology and zero-downtime maintenance capabilities, visit the pgEdge website or explore the pgEdge documentation.



