Always Online or Bust - Zero Downtime Major Version Postgres Upgrades
The Postgres Conference committee asked me to submit a talk for the 2025 Orlando Event back in November of 2024. Of the two submitted presentations, the committee ultimately selected my talk on a very important topic: major version Postgres upgrades. Major version upgrades are a topic experts revisit fairly frequently for a few reasons, and it makes a good conference staple.
Not everyone can attend these conferences, and upgrading Postgres without sacrificing high availability is a critical and underused technique. In that spirit, let’s talk about major version Postgres upgrades and some of the Tricks of the Trade I shared at the Postgres Conference this year.
The Ways of the Ancients
Before we get too far into this, let’s consider the existing options most users already know about. The first is the one a Postgres novice is most likely to reach for when the time comes to upgrade. It’s a relatively simple six-step process that goes a little something like this:
Create an instance or node with the new version
Extract global objects with
pg_dumpall --globals-only
For each database, invoke:
pg_dump --format=directory --file=/some/db/dir --jobs=8 somedb
Import global objects into the new instance using psql
Restore each database:
pg_restore --create --dbname=postgres --jobs=8 /some/db/dir
Profit
From beginning to end, your cluster migration / upgrade might look something like this:
There are a ton of benefits to using this approach. It’s beginner friendly and fairly difficult to get wrong. It works across practically any pair of Postgres versions, and the dump is in a human-readable format you can inspect if something goes awry. You can set up the new instance ahead of time, or even a whole cluster with multiple replicas as seen in our diagram. You end up with fresh new tables with no bloat. Best of all, both the dump and restore steps can leverage parallel jobs! What’s not to love?
Well, there are a few things that could be considered downsides. Dump / restore is based on the idea of a consistent snapshot of the data, so to ensure all of the data from the old system makes it to the new one, you really need to enforce downtime for the entire procedure. Not good! This makes it a fairly bad match for big databases, because they’re exactly the databases that take the longest to dump and restore.
Why? After exporting and importing the data itself, you'll need to rebuild indexes on the new system. As a result, particularly large tables act as an impassable barrier that defines the final duration of the whole upgrade. That much-lauded parallel dump and restore option only addresses each individual action: dumping a single table, importing a single table, creating a single index. Postgres doesn’t launch multiple parallel workers for “chunks” of larger tables, so the only answer is to wait for these to complete. If our tables aren’t aggressively partitioned, we could be in for a slow and painful upgrade.
Lastly, we'll need to do all of this for each individual database in the Postgres instance. Many installations simply use one database, often the default “postgres” database, but multi-tenant systems or clusters which leverage multiple databases don’t have that luxury. These larger and more complicated instances tend to demand a more suitable procedure.
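Since the whole dump / restore cycle repeats for every database, it usually ends up wrapped in a small script. Here’s a rough sketch of what that loop might look like, assuming passwordless access via .pgpass or similar; the host names and paths are placeholders rather than part of any canonical procedure:
# Dump every non-template database from the old cluster and restore it
# into the new one; global objects are assumed to be restored already.
for db in $(psql -h old-host -U postgres -AtXc \
    "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
  pg_dump -h old-host -U postgres --format=directory \
      --file="/some/db/dir-$db" --jobs=8 "$db"
  pg_restore -h new-host -U postgres --create --dbname=postgres \
      --jobs=8 "/some/db/dir-$db"
done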
A New Challenger Appears
Postgres 9.0 introduced the pg_upgrade command to address the exact concern for an easier, faster, and more convenient upgrade path. DBAs and users of the era rejoiced, for our prayers were finally answered: no more dump / restore!
The new six-step process went more like this:
Create an instance with the new version
Stop the old database
Execute pg_upgrade with old and new binaries, and both data directories (see the sample invocation after this list)
Start the new database
Run ANALYZE or vacuumdb --analyze-in-stages to generate new statistics
Profit
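For reference, the pg_upgrade invocation itself tends to look something like the sketch below. The binary and data directory paths are placeholders that will differ per installation, and running the same command with --check first performs a dry run without touching any data:
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/16/bin \
  --new-bindir=/usr/lib/postgresql/17/bin \
  --old-datadir=/db/oldpgdata \
  --new-datadir=/db/pgdata \
  --link --jobs=8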
Nice! The whole process introduces very little downtime—usually less than five minutes total. There’s no more waiting for dump / restore, so it’s a great match for huge multi-terabyte databases. I’ve personally used it on a 56TB database; size hardly affects the process at all. Why? Because it upgrades the entire cluster as-is, about as fast as the filesystem itself can copy files, or even faster if using the --link option which takes advantage of hard links. It even works on replicas, though that step admittedly relies on a relatively obtuse rsync incantation.
Unfortunately, not even pg_upgrade is all sunshine and rainbows. It works by calling on two Postgres binary installations, one for each Postgres version involved in the upgrade. This may not seem like a problem, but some companies build Postgres from source, and if they haven’t considered the need to install multiple versions, they may have no procedure in place to accommodate this requirement.
It’s also an extremely bad match for Postgres running in containers for a similar reason. Most containers are based on images which are stripped down to the bare minimum, and make no accommodation at all for multiple versions. Worse yet, containers are frequently tied to the primary running service, so it’s not even possible to shut down Postgres without also stopping the container. There are workarounds for this, and it’s possible to build specific images to facilitate invoking pg_upgrade, but it’s usually a bespoke and piecemeal process.
We also can’t forget that rsync is likely involved if the cluster has any replicas. The Postgres documentation provides a sample for this command, but rsync is infamously tricky and one errant slash could spell disaster. Multiple dry-runs and validation tests are the only safe way to do this, along with a healthy dose of automation to avoid mistakes.
There’s also the underlying concern that pg_upgrade is ultimately a binary upgrade, so the data and catalog entries must maintain a minimal level of compatibility. The pg_upgrade command will fail if there are any extensions in the old cluster that haven’t been updated for the new one. If the Postgres catalog adds, removes, or changes columns between versions, the changes will also derail the upgrade. There are lots of little gotchas that are difficult to predict, so once again, dry runs are necessary to identify complications beforehand. Abandoned extensions should be dropped prior to the upgrade, views rewritten, functions modified, until the entire procedure succeeds.
And finally, once we’ve completed the upgrade itself and started the new instance, many queries will probably be slow until we rebuild the table statistics. The query planner won’t have an accurate picture of the data so it will often make wildly incorrect assumptions and produce terrible execution plans. Thankfully this particular wart is slated to be fixed in Postgres 18.
Despite this relatively lengthy list of caveats, pg_upgrade is still the fastest and most frequently recommended way to upgrade Postgres these days. Using pg_upgrade takes a bit more expertise, but it’s preferred by most DBAs who want to minimize downtime.
But what if there was actually a slightly better technique? Is there a solution similar to dump / restore in simplicity, but with less downtime than even pg_upgrade can deliver?
I’m glad you asked!
It’s Perfectly Logical
The introduction of Postgres 10 also brought logical replication functionality. As a result, we can consider upgrade paths like this:
Create an instance or node with the new version
Extract global objects with
pg_dumpall --globals-only
Extract database schemas with
pg_dumpall --schema-only
Create a publication for each database
Replay globals and schema definitions in the new instance
Subscribe to each publication in the new instance
Wait for synchronization to complete
Switch to the new instance
Profit
Note the initial similarity between a logical upgrade using these steps and a dump / restore. In fact, if we diagram the procedure, it looks almost identical:
The only real difference is that we’ve replaced the output artifact from pg_dump with a direct logical replication link. Neat, eh?
Preparing the Origin
So how to perform a logical Postgres upgrade? There are a few things we need to do to the origin cluster to facilitate the process. The easiest place to begin is changing a few configuration parameters related to logical replication. Here’s the full list:
wal_level = logical
max_replication_slots = 16
max_worker_processes = 64
max_sync_workers_per_subscription = 8
max_logical_replication_workers = 32
The most important update is to set wal_level to logical so logical replication is actually enabled. Beyond that, it’s often a good idea to increase the number of parallel workers assigned to subscriptions. The default settings are fairly conservative, and we want to get the most out of available bandwidth and CPU cores. Don’t go overboard, and don’t take the above recommendations as gospel; they’re just a good starting point that will help speed up the process.
Once we’ve restarted Postgres to enable the logical WAL level, it’s time to move on to authentication. For the purpose of a major version Postgres upgrade, it’s always a good idea to create a user specifically associated with the migration. Once the process completes, we can remove the user from both clusters without affecting any ongoing system operations.
If the original cluster is version 14 or higher, we can even take advantage of a predefined read-only role and greatly reduce this user’s potential attack surface, like this:
CREATE USER migrator WITH PASSWORD 'secret' REPLICATION;
GRANT pg_read_all_data TO migrator;
Remember, logical replication is a user-level operation on the publication side, so the user must be able to read any table data being transmitted. For Postgres releases prior to 14, this is most easily accomplished with a superuser.
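On those older releases the fallback is blunter but simple; a minimal sketch:
-- Pre-14 clusters have no pg_read_all_data role, so fall back to a superuser.
CREATE USER migrator WITH PASSWORD 'secret' SUPERUSER REPLICATION;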
Next it’s important to modify the origin host-based access file (pg_hba.conf) to allow connections from the upgrade target node. Usually it just takes a line like this:
# TYPE  DATABASE  USER      ADDRESS     METHOD
host    all       migrator  new-host    scram-sha-256
The “new-host” portion can be a FQDN or an IP-address with netmask, just so long as connections are properly restricted. It’s always good to follow the principle of least privilege when allowing connections to Postgres. Don’t forget to send Postgres a reload signal to activate this change!
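Either of the following will do the trick, depending on your preferred access path; the data directory shown here is a placeholder:
# From the shell on the origin node:
pg_ctl reload -D /db/oldpgdata

# Or via SQL from any superuser session:
psql -h old-host -U postgres -c "SELECT pg_reload_conf();"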
The final step is to simply create a publication for each database we intend to be part of the upgrade. The easiest way to do that is by executing this SQL in each database:
CREATE PUBLICATION upgrade_17_pub FOR ALL TABLES;
One publication, all tables. Easy peasy. Keep in mind that this means all tables, including any new ones created after the publication exists. In ideal circumstances, the full upgrade process won’t take more than a day or two, but even then, DDL may slip in a new table unexpectedly. If that happens while a subscriber is consuming the publication, it could cause replication to stop if the table is missing there. Watch those logs!
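It’s also worth double-checking exactly which tables the publication actually covers, both now and any time you suspect DDL has slipped in; one quick query per database does it:
SELECT * FROM pg_publication_tables
 WHERE pubname = 'upgrade_17_pub';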
Preparing the Target
The upgrade needs somewhere to go, doesn’t it? Even if you’re already familiar with creating a new Postgres cluster, we’d like to point out a couple of useful features before you build this one. First, it’s probably a good idea to enable data checksums while the cluster is brand new, saving the downtime that converting an existing cluster would require later. Like so:
initdb --data-checksums -D /db/pgdata/
Now Postgres page-level data checksums are enabled for the entire cluster. Once data is synchronized from the origin cluster, it will automatically inherit this protection layer.
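If you want to confirm the setting took effect once the new instance is running, it’s a single statement:
SHOW data_checksums;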
Then we should copy over any relevant configuration parameters from the origin cluster. This could be something as simple as using rsync to copy the entire existing configuration:
rsync old-host:/db/oldpgdata/*.conf /db/pgdata/
Or you could take a more judicious approach: review every modified parameter in the original cluster, curate the new configuration from that list, and fold in any new parameters introduced by the versions in between.
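If you go the curated route, the origin cluster can tell you what was changed. A query like this is a reasonable starting point, listing every setting that differs from its default:
SELECT name, setting, source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override');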
All that remains is to start the Postgres service, and maybe deploy the same node architecture used in the origin cluster. Our diagram demonstrated three nodes in each cluster—one primary and two replicas. Remember, this new cluster will be a complete replacement for the old one!
Performing the Upgrade
The upgrade itself is actually fairly simple. We need only perform three actions:
Extract the database structure and roles from the origin cluster
Reproduce the roles and DDL on the new cluster
Initiate replication
To that end, here are a couple of simple commands to extract those roles and database DDL:
pg_dumpall --globals-only -h old-host -f roles.sql
pg_dumpall --schema-only -h old-host -f schema.sql
Using pg_dumpall for roles is a pretty old trick, and aligns with the old dump / restore upgrade method. What isn’t so common is using pg_dumpall to extract database schemas for all databases. This kind of export is pure SQL and uses CREATE DATABASE statements to represent the entire origin instance. This DDL export will contain all tables, views, functions, triggers, indexes, and any other database objects. We want this to be an upgrade in all possible ways.
Next we need to replay these two files on the new cluster:
psql -h new-host -U postgres -f roles.sql
psql -h new-host -U postgres -f schema.sql
By provisioning the roles first, all of the objects from the schema file will end up with the same ownership and privileges. While it’s possible to make revisions as part of the migration, it’s best to leave this for a later time to avoid accidentally disrupting the upgrade process.
The last step is to initiate the synchronization between the clusters. Simply execute this statement for each database on the subscribing system:
CREATE SUBSCRIPTION upgrade_17_sub_postgres
CONNECTION 'host=old-host user=migrator dbname=postgres'
PUBLICATION upgrade_17_pub WITH (disable_on_error = true);
By default, the subscription name is used to create a replication slot on the original Postgres node, so it’s important to include the database name or some other kind of unique qualifier. In this example, we simply appended postgres to the upgrade_17_sub name to denote this subscription is for the postgres database.
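For instance, if the origin also hosted a hypothetical sales database, the subscription created while connected to sales on the new cluster might look like this:
CREATE SUBSCRIPTION upgrade_17_sub_sales
       CONNECTION 'host=old-host user=migrator dbname=sales'
       PUBLICATION upgrade_17_pub WITH (disable_on_error = true);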
Note that these subscription commands include a parameter that will cause replication to stop if there are any errors. This is to avoid possible infinite loops during the subscription process, as Postgres will otherwise keep retrying the initial synchronization until it succeeds. We don’t want to wake up the next day with an outrageous bandwidth bill from Postgres attempting to copy the entire origin system over and over again, right?
Validating the Upgrade
Postgres does three things when a new subscription is established:
Create a replication slot on the origin and establish a consistent snapshot.
Initiate a COPY statement for each table in the publication, up to that snapshot.
Transition to logical streaming replication once all initial copies are complete.
Depending on the size and number of tables in our databases, this may take a lot of time. While we’re waiting, it’s a good opportunity to monitor the progress to ensure that replication is continuing smoothly. Keep an eye on the Postgres logs on the new cluster, as they will tend to be very active during the subscription bootstrap phase.
For example, this subscription was temporarily disabled because the new system was missing a table:
2025-03-13 01:28:35.675 UTC [2379528] ERROR: logical replication target relation "public.foo" does not exist
2025-03-13 01:28:35.675 UTC [2379528] CONTEXT: processing remote data for replication origin "pg_16420" during message type "INSERT" in transaction 2054187, finished at 13/8462EB98
2025-03-13 01:28:35.675 UTC [2379528] LOG: subscription "upgrade_17_sub_postgres" has been disabled because of an error
This behavior is exactly what we want. Once alerted, we can fix the issue that stopped replication and then re-enable the subscription:
ALTER SUBSCRIPTION upgrade_17_sub_postgres ENABLE;
We can also check to see how many of those initial COPY statements are still in progress:
SELECT datname, relid::REGCLASS AS table_name,
       round(100.0 * bytes_processed / bytes_total) AS pct_copied
  FROM pg_stat_progress_copy;
Results here may be empty because the initial COPY phase is done, or because there was an error. There are also two system catalogs which directly address all subscriptions. The first of these is the pg_stat_subscription view:
SELECT * FROM pg_stat_subscription;
-[ RECORD 1 ]---------+------------------------------
subid | 16420
subname | upgrade_17_sub_postgres
worker_type | apply
pid | 2383733
leader_pid |
relid |
received_lsn | 13/84790140
last_msg_send_time | 2025-03-13 01:49:41.469987+00
last_msg_receipt_time | 2025-03-13 01:49:41.470027+00
latest_end_lsn | 13/84790140
latest_end_time | 2025-03-13 01:49:41.469987+00
If there is no active PID, replication has been stopped because of some error. This view and example output tell us whether there is an active subscription, and what it’s currently doing. There’s also a statistical summary of the subscriptions in the pg_stat_subscription_stats view:
SELECT * FROM pg_stat_subscription_stats;
-[ RECORD 1 ]-----+------------------------
subid | 16420
subname | upgrade_17_sub_postgres
apply_error_count | 0
sync_error_count | 0
stats_reset |
The two error columns are cumulative, so they may not read as zero if we encountered and corrected issues along the way. It's a good sign if the counts aren’t increasing as we near the end of the upgrade process.
Lastly, we can check the upgrade from the origin node as well. The pg_replication_slots view tells us whether or not the subscriber is lagging behind based on current write activity. For example, this query will reveal any lag for each replication slot associated with the upgrade:
SELECT database, slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag
FROM pg_replication_slots
WHERE slot_name LIKE 'upgrade_17_sub%';
It’s a good idea to use all of these tools during the initial synchronization and streaming phases of the upgrade. We want to know the replication channel is completely healthy before switching to the new cluster, or if there’s a risk some of our data is missing!
Copy Sequence Values
One complication that is frequently overlooked with Postgres logical replication is sequence handling. Almost every Postgres database has sequences, even if every known primary key is a UUID or some other kind of non-sequential value. Postgres logical replication does not transfer sequence values, so all sequences are effectively reset to 1 on the upgrade target. This is a recipe for disaster if we don’t address it as part of the upgrade procedure.
To avoid that end result, use this SQL on the origin cluster for each database to extract existing sequence values:
COPY (
  -- Emit one setval() statement per sequence, padded by 1000 values.
  SELECT format('SELECT setval(''%I.%I'', %s);',
           schemaname, sequencename, COALESCE(last_value, 0) + 1000
         )
    FROM pg_sequences
) TO '/tmp/sequences-postgres.sql';
Execute the resulting SQL files in the corresponding databases in the new cluster to properly set all of the sequences. Note that this example uses a “buffer” of 1000 additional values to account for any stray writes that may bump the sequence value after the extract. Busier systems may need to use a larger value or perform the sequence sync closer to the cluster switchover itself.
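Applying the extract is just a psql call per database on the new cluster; a quick sketch, assuming the file has been copied somewhere the new host can reach:
psql -h new-host -U postgres -d postgres -f /tmp/sequences-postgres.sql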
We list this step so late in the upgrade process because it should take place toward the end. Preferably, it should be the last thing we do before flipping the switch and moving traffic to the new cluster. The extract is essentially stateless, so there’s no harm in running it multiple times, but there’s also no need to do it early if we just wait.
Switch Write Targets
The last and most notable step in the upgrade is to switch write targets from the origin cluster to the upgraded cluster. This is effectively where the Rubber Meets the Road and all traffic is redirected to the new cluster.
This is easier said than done, because enterprise code varies wildly. Some systems may require a code deployment because connection endpoints are hard-coded. Bad practice or not, legacy systems are always a concern. The switch could mean moving DNS from the old cluster to the new one, or reconfiguring a load balancer between the two systems. In many cases it’s better to re-assign the old server name to the new cluster entirely, rather than try to track down everything still using it.
As you upgrade, don’t forget automated jobs! These can be launched from almost anywhere, and they’re easily lost in the tech debt accumulated over the long life of the original cluster.
Drop or Disable Subscriptions
Once our upgrade is considered “complete”, we can remove the subscription that did all of the hard work. There are two choices here, depending on the level of paranoia involved. One choice is to disable the subscription, keeping the option of reactivating it in case a forgotten source pushed a few stray writes to the old cluster after we switched write targets. In that case, we’d use a statement like this:
ALTER SUBSCRIPTION upgrade_17_sub_postgres DISABLE;
For the more stable systems where it’s possible to be 100% certain the old cluster is completely inert, we can drop the subscription instead:
DROP SUBSCRIPTION upgrade_17_sub_postgres;
With this last step, we’re really done. The upgrade is complete and, for all practical purposes, there’s no going back. The real beauty of this process, though, is that we technically could go back by replaying the upgrade in reverse, in effect downgrading from 17 to some older version, perhaps due to some crazy and unforeseen edge case that requires dramatic and invasive application-level changes.
Tricks of the Trade - Advanced Techniques
Adept and experienced readers may have noticed there were a couple of notable techniques omitted from this overview. You’re completely right, and this was entirely intentional. There are at least two other advanced use-cases that could help ease a transition like this.
Easing the Transition
The first of these advanced techniques is to involve PgBouncer from the beginning. PgBouncer is a well-known Postgres pooling proxy that helps to solve the problem of switching write targets. It may be that redirecting all of your write traffic through PgBouncer before starting the migration is ultimately a time-saver.
PgBouncer resolves much of the uncertainty associated with the final upgrade step of redirecting writes from the old cluster to the new one. The secret is the PAUSE and RESUME feature. Consider this diagram:
With PgBouncer at the top of our routing stack acting as an absolute proxy, the final upgrade steps look more like this:
Connect to PgBouncer and tell it to PAUSE traffic. PgBouncer will allow transactions to finish their work and stop assigning client sessions to Postgres backends.
Reconfigure PgBouncer to direct traffic to the new cluster.
Connect to PgBouncer and tell it to RESUME traffic. PgBouncer will acquire backends from the new cluster and assign them to client sessions.
As part of an automated upgrade, this could be done in less than 10 seconds, which is about as close to a zero-downtime upgrade as it’s possible to get when using traditional Postgres tooling.
It’s also pretty easy to do. The top portion of a PgBouncer configuration file usually contains a preamble like this:
[databases]
* = host=old-host port=5432
Normally PgBouncer lets you define aliases which it directs to certain host, database, and port combinations. In the above example, we’re just directing all database names to the old host on the default port. Consider what this implies.
Following our steps, we start by pausing traffic:
psql -U pgbouncer -h old-host -p 6432 -c "PAUSE" pgbouncer
Then we modify the PgBouncer configuration to direct traffic to the new cluster host:
[databases]
* = host=new-host port=5432
This step also likely involves reloading the PgBouncer service, such as with systemd:
sudo systemctl reload pgbouncer
Then we connect to PgBouncer and tell it to resume:
psql -U pgbouncer -h old-host -p 6432 -c "RESUME" pgbouncer
The final command would complete the transition, and thus the upgrade.
It’s not always quite that simple, of course; the hard part of using PgBouncer is integrating it properly. Since it acts as a stateful proxy, it needs to introspect the connection and thus the authentication stack. It’s not trivial to set up the passthrough functions to enable this, though there are guides online and experienced Postgres consultants who can help do it right.
It’s also important to use transaction mode, which is part of how this whole process works. One potential reason companies and developers have avoided PgBouncer is due to prepared statements being unsupported in older versions. The good news is that PgBouncer now supports prepared statements as of version 1.21 (released in October of 2023).
For now it’s one of the best tools for this kind of transition thanks to the PAUSE / RESUME system. For application stacks which are resilient to disconnections, even a simple load balancer would be a suitable replacement. Stop traffic, change configuration, start traffic. Those three steps can make any transition possible so long as there is a single configurable proxy layer.
Taking Smaller Gulps
The second of our advanced techniques is to make the upgrade process a bit more “active”. Relying on a single giant publication containing every table in the database isn't a good idea for users with large databases, databases with multi-TB tables, databases with thousands of tables, or systems on unreliable networks.
We can alleviate almost all of these issues by starting with an “empty” publication containing no tables. Like this:
CREATE PUBLICATION upgrade_17_pub;
Subscribing to the publication now does nothing but establish the replication slot. There is no initial COPY phase, so the subscription proceeds straight to streaming mode. Once we’ve verified that, we can add tables to the publication any number of ways.
ALTER PUBLICATION upgrade_17_pub ADD TABLE foo;
ALTER PUBLICATION upgrade_17_pub ADD TABLE foo, bar;
ALTER PUBLICATION upgrade_17_pub ADD TABLES IN SCHEMA baz;
We can upgrade one table, many tables, or whole schemas at a time. This curated approach is a bit more hands-on, but it allows us to complete the upgrade at our own pace. We can copy a few tables and perform some validation checks before adding a single huge table and waiting to make sure everything turns out alright. We can continue this process until every table is accounted for, without a single all-or-nothing step for the sake of automation.
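One caveat with this incremental style: the subscriber doesn’t automatically notice tables added to an existing publication. After each ALTER PUBLICATION, refresh the subscription on the new cluster so the newly added tables begin their initial copy; using the subscription name from earlier:
ALTER SUBSCRIPTION upgrade_17_sub_postgres REFRESH PUBLICATION;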
Regarding Patroni Clusters
Patroni is currently the most recommended Postgres High Availability system. There are good reasons for this: it’s incredibly aggressive at keeping clusters online, and its quorum-dependent design and built-in node fencing model are great protections against Split-brain scenarios. This direct intervention comes at a cost, however.
Patroni works so well because it takes control over nearly every aspect of a Postgres node cluster. Two of these concern the upgrade process described above: cluster configuration and replication slots.
There are several ways that Patroni can manage Postgres configuration parameters, but the most reliable is through the Distributed Configuration Store (DCS). The easiest (and best) way to interact with this is through the patronictl command-line tool.
Want to view the current configuration?
patronictl show-config
Want to modify it instead?
patronictl edit-config
The second command is the one we’ll need to use in order to integrate the changes necessary for Patroni not to disrupt our upgrade. Remember the Postgres configuration parameters we recommended in the beginning of this guide? Add them to the Patroni DCS in a block like this:
postgresql:
  use_slots: true
  parameters:
    wal_level: logical
    max_replication_slots: 16
    max_worker_processes: 64
    max_sync_workers_per_subscription: 8
    max_logical_replication_workers: 32
This way the settings are applied to every node, and Patroni will restart nodes as necessary to propagate the new values while maintaining high availability in the cluster.
The other thing we need to override is how Patroni handles replication slots. By specifying use_slots in the Postgres configuration section above, Patroni ensures all replicas have a corresponding replication slot attached to the primary node. At this point, Patroni doesn’t know anything about the logical replication slot associated with our logical subscriptions. In the event of a failover, the new primary will not have the necessary slot, and suddenly the subscription must be rebuilt from scratch.
To prevent that from happening, we configure Patroni to manage the logical slot as well; just include a block like this in the DCS configuration for each subscription:
slots:
  upgrade_17_sub_postgres:
    type: logical
    database: postgres
    plugin: pgoutput
Now Patroni will occasionally copy the slot and its current position to each replica node. In the event of a failover, the subscriber will connect to the replication slot as it normally would, and logical replication will resume. This works because logical replication keeps track of the last replayed transaction from the upstream system. If the slot is somehow reverted to an earlier state, it will fast-forward to the expected location. This avoids replaying previously handled transactions.
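To verify Patroni is really maintaining the slot, the same catalog we used earlier works on any replica node; a quick sanity check:
-- Run on a replica: the logical slot should appear once Patroni copies it over.
SELECT slot_name, slot_type, database, confirmed_flush_lsn
  FROM pg_replication_slots
 WHERE slot_name = 'upgrade_17_sub_postgres';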
With these two modifications, Patroni becomes an asset to the logical upgrade procedure. Now even a failover in the middle of the upgrade won’t prevent it from succeeding. How’s that for resilient?
The Search for Spock
Consider the pgEdge Distributed Postgres platform for a moment. It consists of several components meant to enable Active-Active distributed clusters, including the Spock, Snowflake, and LOLOR extensions, minor Delta Apply patches, a CLI management tool, and more. The goal of all this is to produce self-contained Postgres nodes capable of accepting writes from wherever they originate, yet maintaining a cohesive dataset common to every node.
The Spock extension in particular provides full bi-directional Active-Active write support to Postgres clusters from versions 15 to 17. As a result, any of these Postgres versions can coexist in the same pgEdge cluster. In addition to bi-directional replication, Postgres DDL statements are also replayed on all participating nodes. This means no more missing users, tables, or other objects derailing replication; even large object replication is possible within pgEdge clusters.
Consider the implications of this and the upgrade process we outlined here. What does an upgrade look like in a cluster consisting of three Spock-enabled nodes? Say we start with a cluster where all nodes are running Postgres 15 and we want to upgrade to 17.
Add a new Postgres 17 node to the cluster.
Take an old node out of the write pool and remove it from the cluster.
Repeat.
This technique is the only truly zero-downtime upgrade procedure available for Postgres, and it requires bi-directional Active-Active replication.
To be fully transparent, running an Active-Active cluster has its own complications when it comes to managing replication conflicts and other higher-level concepts. We at pgEdge have white papers and presentations on properly adapting to a native Active-Active database environment. It’s not an operational panacea, but it facilitates the highest uptime guarantee possible in the Postgres ecosystem.
If that’s enough to whet your appetite, give us a call and let us show you the way to Ultra HA Postgres clusters and benefits above and beyond major version upgrades.