Multi-Master Replication: Using pgEdge Enterprise Postgres with Spock and CloudNativePG

Tue, 11 Nov 2025 04:52:59 GMT

When I started exploring CloudNativePG (CNPG) with pgEdge Enterprise Postgres and the Spock extension, I realized there were a few gotchas that weren't obvious at first. In this blog post, I'll share my experiences running a pgEdge Docker image with Spock inside CloudNativePG, along with the lessons I learned about superuser access, configuration management, and initialization scripts.

In our setup, each CNPG cluster contains three Postgres pods. By default, CNPG enables physical replication inside each cluster, with one read/write primary and two read-only replicas. Our two CNPG clusters use Spock for bi-directional logical replication between the clusters, while each cluster also replicates internally; this lets us enable connection pooling so that resources are available for queries when needed.

The steps that follow demonstrate running a pgEdge Enterprise Postgres Docker image with Spock inside CNPG, setting up nodes, and configuring bi-directional replication.

1. Installing the CloudNativePG Operator

The CNPG operator manages PostgreSQL clusters in a Kubernetes environment. Install it with Helm: the helm install command deploys the operator in the cnpg-system namespace, and kubectl get pods confirms that the operator pods are running.

2. Deploying PostgreSQL Clusters

In this step, we deploy two clusters (A and B) that we'll configure for bi-directional replication with the Spock extension. Cluster A is defined in cluster-a.yaml and Cluster B in cluster-b.yaml.

Next, apply both manifests to create the clusters. Wait until both clusters are healthy before moving on to step 3.
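As a rough illustration of what a manifest like cluster-a.yaml might contain, the sketch below builds a CNPG Cluster resource as a Python dict and serializes it to JSON, which kubectl apply accepts alongside YAML. The field names (enableSuperuserAccess, postgresql.shared_preload_libraries, bootstrap.initdb.postInitApplicationSQL) are CNPG's; the image name, database name, and storage size are placeholders, and this is not the manifest used in this post.

```python
import copy
import json


def build_cluster_manifest(name: str, image: str, database: str = "app") -> dict:
    """Return a minimal CNPG Cluster manifest as a plain dict.

    The image, database, and storage size are placeholder assumptions.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": 3,  # one primary plus two replicas
            "imageName": image,
            # Needed while Spock nodes and subscriptions are being created.
            "enableSuperuserAccess": True,
            "postgresql": {
                # Set here, not in postgresql.conf, which CNPG manages.
                "shared_preload_libraries": ["spock"],
            },
            "bootstrap": {
                "initdb": {
                    "database": database,
                    "owner": database,
                    # Runs once in the application database after initdb.
                    "postInitApplicationSQL": [
                        "CREATE EXTENSION IF NOT EXISTS spock;"
                    ],
                }
            },
            "storage": {"size": "1Gi"},
        },
    }


def set_superuser_access(manifest: dict, enabled: bool) -> dict:
    """Return a copy of the manifest with enableSuperuserAccess toggled."""
    updated = copy.deepcopy(manifest)
    updated["spec"]["enableSuperuserAccess"] = enabled
    return updated


if __name__ == "__main__":
    cluster_a = build_cluster_manifest(
        "cluster-a", "registry.example/pgedge-postgres:placeholder"
    )
    print(json.dumps(cluster_a, indent=2))
```

Calling set_superuser_access(cluster_a, False) and re-applying the result corresponds to turning superuser access off again once the Spock setup is finished.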

3. Establishing Superuser Access

Spock requires superuser privileges to create nodes and replication sets. In our .yaml file, we created a user named postgres with superuser privileges (enableSuperuserAccess: true).

Security Note: Be sure to disable superuser access after setup by re-applying the .yaml file with enableSuperuserAccess: false.

4. Creating Spock Nodes

Nodes must exist before you configure replication; Spock will verify node existence before connecting. The node for Cluster A is defined in spock-node-a-job.yaml, and the node for Cluster B in spock-node-b-job.yaml.

After creating the .yaml files, apply them to create the nodes.
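The jobs above ultimately need to run Spock's node-creation function on each cluster. A minimal sketch of the SQL such a job might execute is shown below: spock.node_create and its node_name and dsn parameters come from Spock, while the node names, DSN values, and helper functions are hypothetical. The host names assume CNPG's convention of exposing the primary through a service named after the cluster with an -rw suffix.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def node_create_sql(node_name: str, dsn: str) -> str:
    """Build the statement that registers a database as a Spock node."""
    return (
        "SELECT spock.node_create("
        f"node_name := {quote_literal(node_name)}, "
        f"dsn := {quote_literal(dsn)});"
    )


# Hypothetical node names and DSNs for the two clusters.
NODES = {
    "node_a": "host=cluster-a-rw port=5432 dbname=app user=postgres",
    "node_b": "host=cluster-b-rw port=5432 dbname=app user=postgres",
}

if __name__ == "__main__":
    for name, dsn in NODES.items():
        print(node_create_sql(name, dsn))
```

Each statement runs on the cluster it describes, so node_a is created on Cluster A and node_b on Cluster B.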

5. Defining Bi-Directional Replication

Spock's bi-directional replication allows each node to act as both a publisher and a subscriber; two .yaml files create the bi-directional subscriptions between the clusters: spock-repl-a-job.yaml for Cluster A → B and spock-repl-b-job.yaml for Cluster B → A.

Apply both .yaml files to establish bi-directional replication between the two nodes.
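Bi-directional replication comes down to each node subscribing to every other node. The sketch below, offered as an illustration rather than the contents of the job files, generates one spock.sub_create call per direction. The subscription_name and provider_dsn parameters are Spock's; the subscription naming scheme, node names, and DSNs are made up for the example.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sub_create_sql(subscription_name: str, provider_dsn: str) -> str:
    """Build the statement that subscribes the local node to a provider."""
    return (
        "SELECT spock.sub_create("
        f"subscription_name := {quote_literal(subscription_name)}, "
        f"provider_dsn := {quote_literal(provider_dsn)});"
    )


def bidirectional_subscriptions(nodes: dict[str, str]) -> dict[str, list[str]]:
    """Map each node to the statements it runs to subscribe to all others."""
    plan = {}
    for subscriber in nodes:
        plan[subscriber] = [
            # Hypothetical naming scheme: sub_<subscriber>_<provider>.
            sub_create_sql(f"sub_{subscriber}_{provider}", provider_dsn)
            for provider, provider_dsn in nodes.items()
            if provider != subscriber
        ]
    return plan


if __name__ == "__main__":
    nodes = {
        "node_a": "host=cluster-a-rw port=5432 dbname=app user=postgres",
        "node_b": "host=cluster-b-rw port=5432 dbname=app user=postgres",
    }
    for subscriber, statements in bidirectional_subscriptions(nodes).items():
        print(f"-- run on {subscriber}")
        for statement in statements:
            print(statement)
```

With two nodes this yields one subscription on each side; the same function produces a full mesh for three or more nodes.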

6. Testing Bi-Directional Replication

To test, connect to each node with psql and insert rows: rows added on node 1 should be replicated to node 2, and rows added on node 2 should be replicated to node 1. In both cases, rows inserted on one node appear on the other, confirming active-active replication.

Key Takeaways

Superuser Access:
Required for node creation and replication setup; remove it after setup for security.

Declarative Configuration:
CNPG manages PostgreSQL settings declaratively, so settings defined in the cluster specification persist. Manual changes to postgresql.conf do not: if you change postgresql.conf inside the entrypoint script of the Docker image, CNPG will override the configuration.

Separate shared_preload_libraries parameter:
CNPG has a separate parameter, postgresql.shared_preload_libraries, for preloaded libraries, so set it there rather than in the postgresql.conf file.

Initialization Scripts:
Use postInitApplicationSQL for database-specific extensions.

Node Creation:
Nodes must exist before replication; Spock helps by validating node availability.

Seamless PostgreSQL Major Version Upgrades with CloudNativePG and Spock Logical Replication

Thu, 06 Nov 2025 05:20:21 GMT

One of the persistent challenges with PostgreSQL major version upgrades is maintaining logical replication during the process. The standard pg_upgrade utility doesn't preserve logical replication slots, which typically means tearing down and rebuilding replication configurations. For production environments running multi-cluster topologies, this has always been a significant operational hurdle.

I recently conducted an experiment to test whether Spock logical replication could survive a CloudNativePG (CNPG) major version upgrade without manual intervention. While the conventional wisdom holds that pg_upgrade doesn't preserve logical replication slots, two components work in tandem to solve this elegantly. First, Spock's architecture stores all replication metadata (nodes, subscriptions, and replication sets) in dedicated tables within the spock schema, which survive the upgrade intact as user data. Second, the pgEdge Helm chart's init-spock-job.yaml reads this preserved metadata and automatically recreates the necessary logical replication slots after the upgrade completes. This combination of persistent metadata and intelligent automation is what makes the entire process seamless.

The Experiment

The test environment consisted of three PostgreSQL clusters running version 16, configured with Spock logical replication between them. The goal was straightforward: upgrade all three clusters to PostgreSQL 17 and verify that logical replication continued functioning without rebuilding subscriptions or replication slots.

Test Parameters:

Three CNPG clusters (pgedge-n1, pgedge-n2, pgedge-n3)

Initial version: PostgreSQL 16.10

Target version: PostgreSQL 17.6

Spock logical replication configured between all clusters

One cluster (pgedge-n1) configured with three instances for high availability by default

No manual scaling operations during upgrade

The Helm chart used for this demonstration is available at https://github.com/pgEdge/pgedge-helm.git

Initial State: Three Clusters with Active Replication

Starting with three single-node PostgreSQL 16 clusters, Spock replication was already established. Each cluster could both publish and subscribe to changes from the others.

After initialization completed, the pgedge-n1 cluster was already running with three instances (the chart's default configuration for high availability).

A test table was created on pgedge-n1 to verify replication. Within moments, the change appeared on pgedge-n2. Verification back on pgedge-n2 confirmed that multi-directional replication was working.

At this stage, all three clusters were synchronizing changes bidirectionally. The real test would come next.

Must Check Before Upgrade: WAL Lag Verification

Before starting any major version upgrade with CNPG and Spock, especially when upgrading to PostgreSQL 18, always ensure all Spock replication slots are fully caught up. PostgreSQL 18 introduces stricter pg_upgrade verification: any negative or high WAL lag in logical replication slots can cause the upgrade to fail.

Run the verification check on each cluster. The wal_lag column should show 0 bytes for all slots before you proceed with the upgrade. If you observe negative or high WAL lag values, address them first, either by resyncing or by repairing the affected slots.

Important considerations:

PostgreSQL 18 Requirement: This verification step is particularly critical for PostgreSQL 18 upgrades, as pg_upgrade now performs stricter checks on replication slot synchronization. Slots that aren't fully caught up will block the upgrade process.

Backup First: Always take a physical backup before initiating the upgrade. This provides a rollback path if issues are discovered during or after the upgrade process.

Zero Tolerance: Don't proceed with any non-zero WAL lag. Even small amounts of lag can indicate synchronization issues that should be resolved in a controlled manner before the upgrade.
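To make the wal_lag figure concrete, here is a small sketch of how it can be derived. The query string is one plausible form of the check, built from standard PostgreSQL pieces (pg_replication_slots, pg_current_wal_lsn, pg_wal_lsn_diff); it is an assumption and not necessarily the query used in this post. The Python helpers reproduce the same arithmetic on LSN strings so that the zero-lag rule can be applied to collected values; the slot names in the example are invented.

```python
# One plausible form of the check, to be run on each cluster's primary.
WAL_LAG_QUERY = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS wal_lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000100' to its 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the slot's consumer has not yet confirmed."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)


def slots_blocking_upgrade(current_lsn: str, slots: dict[str, str]) -> dict[str, int]:
    """Return every slot whose lag is non-zero (positive or negative)."""
    lags = {
        name: wal_lag_bytes(current_lsn, flushed)
        for name, flushed in slots.items()
    }
    return {name: lag for name, lag in lags.items() if lag != 0}


if __name__ == "__main__":
    # Hypothetical slot names and LSNs.
    slots = {"slot_to_n2": "0/3000100", "slot_to_n3": "0/3000000"}
    print(slots_blocking_upgrade("0/3000100", slots))
```

An empty result from slots_blocking_upgrade corresponds to the state in which every slot reports a wal_lag of 0 bytes.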

The Upgrade: PostgreSQL 16 to 17

The upgrade was triggered by updating the container image in the CNPG cluster specification. CloudNativePG handles the rest: orchestrating the upgrade process, managing temporary upgrade pods, and ensuring minimal downtime.

During the upgrade, the pgedge-init-spock job runs alongside the upgrade pods. This initialization job is crucial: it recreates Spock replication slots after the upgrade completes, ensuring logical replication can resume immediately.

The pod status was checked as the upgrade progressed and again once it had finished.
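The change that triggers the upgrade is small: only the image reference in the cluster spec moves to the new major version. The sketch below builds that change as a JSON merge patch against spec.imageName and wraps it in a kubectl patch command. In this experiment the clusters came from the Helm chart, so the image was more likely changed through chart values; the image tag and namespace here are placeholders.

```python
import json
import shlex


def image_patch(image: str) -> dict:
    """JSON merge patch that changes only the cluster's container image."""
    return {"spec": {"imageName": image}}


def kubectl_patch_command(cluster: str, image: str, namespace: str = "default") -> list[str]:
    """Build (but do not run) a kubectl command applying the image patch."""
    return [
        "kubectl", "patch", "clusters.postgresql.cnpg.io", cluster,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", json.dumps(image_patch(image)),
    ]


if __name__ == "__main__":
    # Placeholder image reference for the PostgreSQL 17 build.
    new_image = "registry.example/pgedge-postgres:17-placeholder"
    for cluster in ("pgedge-n1", "pgedge-n2", "pgedge-n3"):
        print(shlex.join(kubectl_patch_command(cluster, new_image)))
```

Printing the commands rather than running them keeps the sketch safe to try; the output can be reviewed and executed by hand.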

Post-Upgrade Verification

After the upgrade was completed, all three clusters were running PostgreSQL 17.6. The critical question: did logical replication survive?

The test ran on pgedge-n1, pgedge-n2, and pgedge-n3 in turn, and replication was then confirmed on pgedge-n2 and pgedge-n1.

Logical replication was fully operational. No manual intervention was required.

How Spock Survives the Upgrade

The key to understanding why this works lies in how Spock manages replication metadata. Unlike native PostgreSQL logical replication, which relies entirely on system catalogs and replication slots, Spock stores its configuration in dedicated tables within the spock schema:

spock.node — cluster definitions

spock.subscription — replication subscriptions

spock.replication_set — publication configurations

Additional metadata tables for conflict resolution, progress tracking, and state management

During a pg_upgrade, PostgreSQL preserves user schemas and their data while replacing system binaries and catalogs. Since Spock's metadata lives in user tables, it survives the upgrade intact. The init-spock job that runs after the upgrade reads this metadata and recreates the necessary logical replication slots, allowing replication to resume immediately.

This is fundamentally different from trying to preserve native PostgreSQL logical replication through an upgrade, where the replication slot configuration itself is lost.
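The mechanism can be pictured with a toy model. It is only a mental model, not how pg_upgrade or the init-spock job is implemented: each node keeps subscription rows (standing in for spock.subscription) and a set of replication slots, the simulated upgrade carries the rows over but drops the slots, and a recreation step rebuilds each slot on its provider from the surviving rows. All names and the row layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy node: subscription rows are user data, slots are server state."""
    subscriptions: list[dict]
    slots: set[str]


def simulate_pg_upgrade(node: Node) -> Node:
    """User tables are carried over; replication slots are not."""
    return Node(subscriptions=[dict(row) for row in node.subscriptions], slots=set())


def recreate_slots(nodes: dict[str, Node]) -> None:
    """Rebuild each slot on its provider from the surviving subscription rows."""
    for node in nodes.values():
        for row in node.subscriptions:
            nodes[row["provider"]].slots.add(row["slot_name"])


if __name__ == "__main__":
    before = {
        "n1": Node([{"provider": "n2", "slot_name": "slot_n2_to_n1"}], {"slot_n1_to_n2"}),
        "n2": Node([{"provider": "n1", "slot_name": "slot_n1_to_n2"}], {"slot_n2_to_n1"}),
    }
    after = {name: simulate_pg_upgrade(node) for name, node in before.items()}
    print({name: node.slots for name, node in after.items()})  # slots are gone
    recreate_slots(after)
    print({name: node.slots for name, node in after.items()})  # slots are back
```

The point of the model is that nothing outside the surviving rows is needed to rebuild the slots, which is why no separate record of the subscription configuration has to be kept.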

Practical Implications

This capability has significant implications for production PostgreSQL deployments:

Simplified Upgrade Workflows: The init-spock-job.yaml eliminates the need to manually disable replication, perform upgrades in isolation, and rebuild replication configurations afterward. The automation handles slot recreation transparently.

Zero-Configuration Slot Recreation: The automated initialization job handles slot recreation based on Spock's stored metadata. Operators don't need to track subscription configurations separately or rebuild them manually post-upgrade.

Operational Confidence: Knowing that replication configuration survives version upgrades reduces the risk profile of major version upgrades in complex multi-cluster environments.

Considerations

While this approach works reliably, there are important operational factors to consider:

Upgrade Downtime: CNPG's in-place upgrade using pg_upgrade requires downtime. Plan your maintenance windows accordingly and ensure your application can tolerate the interruption.

Backup Strategy: It's strongly recommended to take physical backups both before and after the upgrade. This provides a rollback path if issues are discovered post-upgrade.

Docker Image Compatibility: Make sure the base Docker images of the current version and the target version are compatible. Per the CNPG documentation, you can't upgrade from a bullseye-based image to a bookworm-based image.

Spock Extension Compatibility: The Spock extension itself must be compatible with both the source and target PostgreSQL versions. For 16 to 17 upgrades, this is well-supported.

Initialization Job Dependency: The init-spock job must run successfully after the upgrade. Monitor this job to ensure slot recreation completes as expected.

Physical Replication Independence: CNPG's physical replication (for replicas within a cluster) operates independently of Spock's logical replication. Both can coexist without interference.

Testing Recommended: As with any upgrade strategy, thorough testing in a non-production environment remains essential.

pgEdge Posts from Muhammad Aqeel