pgEdge Distributed PostgreSQL and logical replication provide the power and organization behind a pgEdge replication cluster, allowing you to replicate tables selectively and, on a more granular level, the changes in those tables. Whether you're using pgEdge Distributed PostgreSQL replication for real-time analytics, low latency, or high availability, optimizing replication configuration and query use allows you to optimize for performance, consistency, and reliability.

Postgres replication is a powerful tool for replicating data between databases; unlike physical replication, logical replication gives you more control and flexibility over what data is replicated and how it's used.

This blog explores queries that make it easier to manage logical replication for your PostgreSQL database.

Monitoring Postgres Logical Replication Status

Monitoring the status of your logical replication setup is critical to ensure that your replication is running smoothly. Querying the pg_stat_subscription view can help you monitor the status of all of the subscriptions in your database:

SELECT 
    subname AS subscription_name,
    pid AS process_id,
    usename AS user_name,
    application_name,
    client_addr AS client_address,
    state,
    sync_state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    clock_timestamp() - write_lsn_timestamp AS replication_delay
FROM 
    pg_stat_subscription
ORDER BY 
    subscription_name;


subscription_name | process_id | user_name | application_name | client_address |    state    | sync_state |   sent_lsn   |  write_lsn   |  flush_lsn   |  replay_lsn  | replication_delay
-------------------+------------+-----------+------------------+----------------+-------------+------------+--------------+--------------+--------------+--------------+-------------------
 sub1              |      23456 | postgres  | logical_rep_sub   | 192.168.1.10   | streaming   | synced     | 0/3000128    | 0/3000128    | 0/3000128    | 0/3000128    | 00:00:00.12345
 sub2              |      23478 | postgres  | logical_rep_sub   | 192.168.1.11   | catchup     | async      | 0/4000238    | 0/4000200    | 0/40001F8    | 0/40001E0    | 00:00:02.67890
  • subname: The name of the subscription.

  • state: The state of the subscription process (e.g., streaming, catchup, initializing).

  • sync_state: The synchronization state of the subscription.

  • sent_lsn, write_lsn, flush_lsn, replay_lsn: These columns represent various Log Sequence Numbers (LSNs) that indicate replication progress.

  • replication_delay: The delay between the LSN being written and its application on the subscriber is crucial for identifying lag in replication.


    This query provides a comprehensive overview of the logical replication status, allowing you to identify issues such as replication lag or disconnected subscribers quickly.

Analyzing Postgres Replication Lag

Understanding replication lag is essential in maintaining the consistency and freshness of data across your replicated databases. The pg_replication_slots system view can help you calculate the replication lag between the publisher and subscriber:

SELECT 
    s.slot_name,
    s.active,
    s.restart_lsn,
    pg_wal_lsn_diff(pg_current_wal_lsn(), s.restart_lsn) AS replication_lag_bytes,
    clock_timestamp() - pg_last_xact_replay_timestamp() AS replication_lag_time
FROM 
    pg_replication_slots s
WHERE 
    s.active = true
AND 
    s.plugin = 'pgoutput';


slot_name | active | restart_lsn | replication_lag_bytes | replication_lag_time 
-----------+--------+-------------+-----------------------+-----------------------
 slot1     | t      | 0/3000128   |                 65536 | 00:00:00.12345
 slot2     | t      | 0/4000238   |                131072 | 00:00:02.67890
  • slot_name: The name of the replication slot being used.

  • replication_lag_bytes: The difference in bytes between the current WAL position on the publisher and the last WAL position acknowledged by the subscriber.

  • replication_lag_time: The time difference between the last transaction replayed on the subscriber and the current time.

    This query helps you assess the size and time-based lag in your logical replication, enabling you to take proactive measures if the lag exceeds acceptable thresholds.

Monitoring Replication Slot Usage

Replication slots are critical in logical replication, ensuring that WAL segments are retained until all subscribers process them. You can query the pg_replication_slots view to monitor the use of replication slots:

SELECT
    slot_name,
    plugin,
    slot_type,
    active,
    confirmed_flush_lsn,
    pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS slot_lag_bytes
FROM 
    pg_replication_slots
WHERE 
    slot_type = 'logical';


slot_name | plugin  | slot_type | active | confirmed_flush_lsn | slot_lag_bytes 
-----------+---------+-----------+--------+---------------------+----------------
 slot1     | pgoutput| logical   | t      | 0/3000128           |           65536
 slot2     | pgoutput| logical   | t      | 0/4000238           |          131072
  • slot_name: The name of the replication slot.

  • slot_lag_bytes: The lag in bytes between the current WAL position and the last position is confirmed as flushed by the slot.

Monitoring replication slot usage is crucial for preventing issues related to WAL segment retention, which could potentially lead to disk space exhaustion on the publisher.

Dropping Unused Replication Slots

Over time, you may accumulate unused replication slots, especially after removing subscribers or changing replication configurations. These unused slots can cause unnecessary retention of WAL files, leading to wasted disk space. The following query identifies and drops unused replication slots:

DO $$
DECLARE
    slot_record RECORD;
BEGIN
    FOR slot_record IN
        SELECT slot_name FROM pg_replication_slots WHERE active = false
    LOOP
        EXECUTE format('SELECT pg_drop_replication_slot(%L)', slot_record.slot_name);
    END LOOP;
END $$;

This query iterates over your inactive replication slots and uses the  pg_drop_replication_slot management function to drop them. Regularly cleaning up unused replication slots will ensure that your database remains efficient and prevent potential issues with WAL file retention.

Creating Replication Slots

If you need to create a new logical replication slot, the following query is useful:

SELECT * FROM pg_create_logical_replication_slot('my_slot', 'pgoutput');


slot_name | xlog_position 
-----------+---------------
 my_slot   | 0/3000128

This query uses the pg_create_logical_replication_slot function to create a new logical replication slot with the specified name and output plugin (pgoutput in our example). The query is useful when setting up new logical replication configurations; use it to confirm that the subscriber can start receiving changes from the correct point in the WAL records.

Optimizing Logical Replication with pglogical

If you’re using the pglogical extension for more advanced logical replication capabilities, the following query can help you check the status of all pglogical subscriptions:

SELECT 
    subscription_name,
    status,
    received_lsn,
    replay_lag,
    last_received_change,
    pending_changes
FROM 
    pglogical.show_subscription_status();


subscription_name | status   | received_lsn | replay_lag | last_received_change | pending_changes 
-------------------+----------+--------------+------------+---------------------+-----------------
 sub_pglogical1    | replicating | 0/3000128 | 00:00:01.234 | 2024-08-22 10:30:00 | 5
 sub_pglogical2    | idle      | 0/4000238 | 00:00:00.000 | 2024-08-22 10:29:30 | 0
  • subscription_name: The name of the pglogical subscription.

  • replay_lag: The lag between the last received change and the current time.

  • pending_changes: The number of changes pending to be applied on the subscriber.

This query provides a detailed overview of your pglogical subscriptions, helping you fine-tune replication settings and troubleshoot issues.

Conclusion

pgEdge Distributed PostgreSQL uses logical replication across your cluster, providing greater control and flexibility over precisely what data is replicated and how that data is stored. pgEdge continues to develop versatile tooling that offers fine-grained control over data replication processes. The queries outlined in this blog can help you effectively monitor, manage, and optimize your logical replication clusters. These queries help ensure data consistency, minimize replication lag, and prevent conflicts, all essential for maintaining a robust and reliable database environment.

As you continue to work with logical replication, consider incorporating these queries into your regular monitoring and maintenance routines to ensure your PostgreSQL databases and pgEdge clusters perform optimally.