<?xml version="1.0" encoding="UTF-8" ?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
        <channel>
            <title>pgEdge Posts from Ahsan Hadi</title>
            <link>https://www.pgedge.com/blog</link>
            <description>The latest pgEdge Posts from Ahsan Hadi</description>
            <atom:link href="https://www.pgedge.com/feeds/rss/user/ahsan-hadi/postgresql.xml" rel="self" type="application/rss+xml" />
            <language>en-us</language>         
            
            <item>
            <category>PostgreSQL,pgEdge,postgres,PostgreSQL</category>
            <title><![CDATA[pgEdge Vectorizer and RAG Server: Bringing Semantic Search to PostgreSQL (Part 2)]]></title>
            <link>https://www.pgedge.com/blog/pgedge-vectorizer-and-rag-server-bringing-semantic-search-to-postgresql-part-2</link>
            <pubDate>Wed, 15 Apr 2026 06:29:33 GMT</pubDate>
            <description><![CDATA[ <p>In my previous blog, I walked through setting up the pgEdge MCP Server with a distributed PostgreSQL cluster and connecting Claude to live database data through natural language. In this blog I want to look at a different problem: how do you build AI-powered search over your own content without adding a separate vector database to your infrastructure?</p><p>This is where the <a href="https://github.com/pgEdge/pgedge-vectorizer">pgEdge Vectorizer</a> and <a href="https://github.com/pgEdge/pgedge-rag-server">RAG Server</a> come in. Together, they give you a complete open-source Retrieval-Augmented Generation (RAG) pipeline that runs entirely inside PostgreSQL. In this blog, I'll explain what each component does and how they work together, then walk through working examples that you can follow on your own PostgreSQL instance.</p><p>I am following the same pattern as in my other blogs: explain each component, then provide real-world working examples to help the reader better understand the concepts.</p><p>Please note: I am using my Rocky Linux VM for this installation and testing, with the Ollama embedding provider (installed on the same VM) generating the embeddings.</p><h2>Background: The Problem With Keeping Vector Search In Sync</h2><p>Most teams building AI-powered search hit the same wall. You set up a vector search pipeline, load your documents, generate embeddings, and everything works. Then someone updates a document or adds a new one, and suddenly you need a process to detect the change, re-chunk the content, regenerate the embeddings, and update the index. Teams typically solve this with custom scripts, message queues, or external orchestration tools - all of which need to be built, maintained, and monitored separately from the database.</p><p>The pgEdge Vectorizer eliminates that problem entirely. It runs as a PostgreSQL background worker. 
Once you enable the Vectorizer on a table, it monitors the source data through triggers, chunks and embeds new or modified rows automatically, and keeps the search index in sync without any external orchestration. The same transactional guarantees that PostgreSQL gives you for regular data apply here too.</p><p>The pgEdge RAG Server sits in front of that data, exposing a simple HTTP API. When a query comes in, it performs a hybrid search that combines vector similarity with BM25 keyword matching to retrieve the most relevant chunks, then passes them to an LLM to generate a grounded answer. The result is accurate, context-aware responses based on your actual data rather than only on the model's training set.</p><h2>How the Pipeline Works</h2><p>Before getting into setup, it helps to understand how the three components connect:</p><ul><li>pgEdge Vectorizer: a PostgreSQL background worker extension that monitors source tables, chunks text content, calls your embedding provider (OpenAI, Voyage AI, or local Ollama), and stores the results in an automatically created chunk table. Triggers keep this in sync as data changes.</li><li>pgvector: the open source PostgreSQL extension that adds the vector data type, HNSW and IVFFlat indexes, and cosine/Euclidean/dot-product similarity operators. The Vectorizer uses this to store and index embeddings. It is the foundation that everything else builds on.</li><li>pgEdge RAG Server: a Go-based API server that handles the retrieval and generation side. It receives a natural language query, embeds it, runs hybrid search against the chunk table, applies a token budget to fit the results into the LLM context window, and calls the LLM to generate a response.</li></ul><p>Note that you enable vectorization on a table with SQL functions and let the background worker handle the rest. There is no need to write embedding logic or manually keep the embeddings in sync with data changes. 
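</p><p>To make the pgvector layer described above concrete, here is a minimal sketch of the pieces the Vectorizer builds on: the vector column type, an HNSW index, and the cosine distance operator. The demo_chunks table, its columns, and the 3-dimension vectors are hypothetical and chosen only for illustration. This is not the chunk table the Vectorizer creates, and a real embedding model produces much wider vectors.</p><pre><code class="language-sql">-- Illustrative only: table, column names, and vector values are hypothetical.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE demo_chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(3)   -- tiny on purpose; real embeddings are much wider
);

INSERT INTO demo_chunks (content, embedding) VALUES
    ('account recovery',     '[0.9, 0.1, 0.0]'),
    ('billing and invoices', '[0.0, 0.2, 0.9]');

-- HNSW index built for cosine distance.
CREATE INDEX ON demo_chunks USING hnsw (embedding vector_cosine_ops);

-- Rank chunks by cosine distance to a query vector (lower is closer).
SELECT content,
       embedding &lt;=&gt; '[1, 0, 0]' AS distance
FROM demo_chunks
ORDER BY distance
LIMIT 5;
</code></pre><p>Ordering by the distance returned by &lt;=&gt; is the same pattern a semantic search over real embeddings relies on; only the vector width and the source of the query vector change. 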
</p><h2>Part 1: The pgEdge Vectorizer</h2><h3>What the Vectorizer Does Under the Hood</h3><p>One of the recurring challenges with AI-powered applications is keeping your vector search index in sync with your source data. Most pipelines require custom scripts or external orchestration tools to detect changes, re-chunk documents, and regenerate embeddings. The pgEdge Vectorizer eliminates that entirely.</p><p>The Vectorizer runs as a PostgreSQL background worker process. When you call enable_vectorization() on a table column, the extension does three things: it creates a companion chunk table to store the generated embeddings, installs triggers on the source table to detect inserts and updates, and enqueues any existing rows for processing. The background workers then pick up items from the queue, split the text into overlapping chunks, call your configured embedding provider for each chunk, and insert the results into the chunk table. When source data changes, only the affected rows are re-processed rather than the entire table.</p><p>This trigger-based approach is what makes it practical for production use. You don't need a separate change data capture system or a scheduled job; the triggers capture every change as it happens.</p><h3>Installation</h3><p>Before installing the vectorizer, make sure pgvector is installed on your PostgreSQL instance, because the vectorizer depends on it to store and index the embeddings it generates. If you are running pgEdge Enterprise Postgres, pgvector is already included. For community PostgreSQL, install it from the pgvector GitHub repository or your package manager.</p><p>You will also need the PostgreSQL server development headers and libcurl. On Rocky Linux / RHEL / Fedora:</p><p>Clone and build the vectorizer:</p><p>Make sure PG_CONFIG is set before installing the extension.</p><p>This installation needs to be performed on every pgEdge node in the cluster. 
I am running two nodes on localhost on my VM, so I am doing the installation on both nodes.</p><h3>Configuration</h3><p>The vectorizer runs as a background worker, so it must be added to  and configured in  before starting PostgreSQL:</p><p>For this blog I am using Ollama to generate embeddings, since everything is running locally on my VM. Ollama is a great option for local development and testing: it runs entirely on your machine with no API keys or external calls required. For production deployments, you would typically switch to OpenAI's text-embedding-3-small or Voyage AI's voyage-3 model, both of which offer higher quality embeddings and better performance at scale. The vectorizer and RAG Server support all three providers, so switching is a simple configuration change.</p><p>The  parameter tells the vectorizer which embedding model to use when generating vectors. In this case we are using  - a lightweight 137MB embedding model from Nomic AI that runs efficiently on CPU without requiring a GPU. The  tag ensures Ollama uses the most recent version of the model. The same model name must be used in both the vectorizer configuration and the RAG Server pipeline configuration. If the models differ, the query embeddings and the stored document embeddings will be generated by different models, making similarity search unreliable.</p><p>If you haven't already installed Ollama, run the following. Then restart PostgreSQL to load the background workers:</p><p>Once PostgreSQL is running, create the extension in your database:</p><h3>A Practical Example: Product Support Knowledge Base</h3><p>Let me walk through a realistic example. Say you have a product support knowledge base: a table of articles that your support team maintains. You want users to be able to search it semantically, so that a question like "how do I reset my password?" 
finds the right article even if the article uses the phrase "account recovery" rather than "reset password."</p><p>First, create the source table:</p><p>Enable vectorization on the content column. This single call sets up the chunk table, installs the triggers, and enqueues any existing rows:</p><p>The vectorizer automatically creates . You can inspect its structure:</p><p>Insert a few articles and watch the vectorizer process them:</p><p>Check how many embeddings are still pending:</p><p>Wait a few seconds for the background workers to process the queue, then check again:</p><h3>Semantic Search Using generate_embedding()</h3><p>With embeddings in place, you can run a semantic search directly in SQL. The vectorizer provides a generate_embedding() function that embeds your search query on the fly, so you do not need to pre-compute it in your application.</p><p>The distance value is a cosine distance: 0 means the two vectors point in the same direction (the closest possible match), values near 1 mean the texts are unrelated, and 2 means the vectors point in opposite directions. The Account Recovery article scores 0.29, a strong semantic match, even though the user asked "how do I get back into my account?" using entirely different wording from the article's title. That is exactly what semantic search is supposed to do. Another example:</p><p><i>The query uses the &lt;=&gt; cosine distance operator from the pgvector extension to find which stored chunks are semantically closest to your question. A few things worth understanding here:</i></p><ul><li> - the stored 768-dimension vector for each chunk in the chunk table</li><li> - pgvector's cosine distance operator. Returns a value between 0 (same direction) and 2 (opposite direction). Lower is better</li><li> - converts your search query into a vector on the fly by calling the same Ollama embedding model used to generate the stored embeddings</li><li> - labels the score so you can ORDER BY it to get the most relevant results first</li></ul><p><i>The key point is that the model compares meaning, not keywords. A question like "where is the billing section?" 
will match the Billing and Invoices article even though it shares almost no words with the article content, because both are semantically about the same topic.</i></p><h3>Automatic Re-Embedding on Update</h3><p>One of the most practical aspects of the vectorizer is how it handles content changes. When you update an article, a trigger fires automatically, the old chunks are replaced, and new embeddings are generated without any manual intervention:</p><h2>Part 2: The pgEdge RAG Server</h2><h3>What the RAG Server Does</h3><p>Direct SQL search with generate_embedding() is useful for development and debugging, but it is not how you expose semantic search to an application. The pgEdge RAG Server is the production-ready layer on top. It exposes a simple HTTP API, handles the embedding of incoming queries, runs hybrid search against the chunk table, manages the token budget for the LLM context window, and returns a generated answer alongside the source chunks used.</p><p>The hybrid search approach is worth explaining. Pure vector search finds semantically similar content but can miss exact keyword matches. Pure BM25 (keyword) search finds exact matches but misses paraphrases. The RAG Server combines both using Reciprocal Rank Fusion: it retrieves candidates from each method and merges the ranked lists, giving you the benefits of both approaches in a single query.</p><h3>Prerequisites</h3><p>The RAG Server is written in Go. You need Go version 1.23 or later:</p><h3>Installation</h3><h3>Setting Up API Keys</h3><p>The RAG Server reads API keys from files rather than environment variables, which is safer for production deployments. Create the key files with restrictive permissions:</p><p>To use Claude as the LLM for response generation in the RAG Server, you need an Anthropic API key. This is separate from your claude.ai subscription: the RAG Server calls the Anthropic API directly, which is billed by token usage. 
Head over to <a href="http://console.anthropic.com/">console.anthropic.com</a>, create an account if you don't have one, and generate an API key under Settings &gt; API Keys. You will also need to add a small credit balance under Plans &amp; Billing; $5 is more than enough for testing and development. The RAG queries we are running here cost fractions of a cent each, so your credit will go a long way. Once you have the key, store it in a file and set the correct permissions:</p><p>It is worth noting that Claude is not the only option here. The RAG Server supports multiple LLM providers for response generation: you can use OpenAI models like gpt-4o, or run a local model entirely through Ollama, which requires no API key at all.</p><h3>Configuration</h3><p>The RAG Server uses a YAML configuration file. A pipeline defines the complete RAG setup, including which database to query, which chunk table to search, and which LLM providers to use for embedding and generation:</p><h3>Starting and Verifying the Server</h3><p>Use the following commands to start the server and confirm that it is healthy:</p><h3>Querying the RAG API</h3><p>The main query endpoint is POST . The following command uses the support knowledge base we set up above:</p><p>The server retrieves the most relevant chunks from the database, applies the token budget, and sends them to the LLM. The response comes back as JSON:</p><p>Notice that the answer is grounded in your actual knowledge base content, not a generic response from the model's training data. If you add "include_sources": true to the request body, the response will also include the specific chunks that were retrieved, which is useful for building citation-aware interfaces or debugging retrieval quality.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/pgedge-vectorizer-and-rag-server-bringing-semantic-search-to-postgresql-part-2</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>pgEdge,PostgreSQL,pgEdge,postgres,PostgreSQL</category>
            <title><![CDATA[Using the pgEdge MCP Server with a Distributed PostgreSQL Cluster]]></title>
            <link>https://www.pgedge.com/blog/using-the-pgedge-mcp-server-with-a-distributed-postgresql-cluster</link>
            <pubDate>Tue, 07 Apr 2026 06:36:41 GMT</pubDate>
            <description><![CDATA[ <p>I recently wrapped up my blog series covering the exciting new features in PostgreSQL 18, from Asynchronous I/O and Skip Scan to the powerful RETURNING clause enhancements. If you haven't had a chance to read them yet, head over to <a href="https://www.pgedge.com/blog"><u>pgedge.com/blog</u></a>, where you'll also find some great content from my colleagues on how PostgreSQL is embracing the AI revolution.</p><p>Speaking of the AI revolution: in this blog I want to shift gears and explore one of those AI tools firsthand, the <a href="https://docs.pgedge.com/pgedge-postgres-mcp-server/v1-0-0/"><u>pgEdge MCP Server</u></a>, and specifically what it looks like to connect it to a true distributed PostgreSQL cluster.</p><p>The Model Context Protocol (MCP) has quickly become a standard way to connect Large Language Models (LLMs) to external data sources and tools. With the release of the pgEdge Agentic AI Toolkit, PostgreSQL developers and DBAs can now connect AI assistants like Claude directly to their databases through the pgEdge Postgres MCP Server.</p><p>In this blog, I'll focus specifically on what makes using the MCP Server with a <a href="https://docs.pgedge.com/"><u>pgEdge Distributed PostgreSQL</u></a> cluster interesting and different from a single-node setup. I'll walk through the setup and demonstrate practical examples where the MCP Server combined with a distributed cluster becomes a powerful tool for DBAs and developers alike.</p><h2>A Quick Overview of the pgEdge MCP Server</h2><p>The pgEdge Postgres MCP Server is part of the <a href="https://docs.pgedge.com/ai-toolkit/"><u>pgEdge Agentic AI Toolkit</u></a>. 
It gives AI assistants secure, structured access to your PostgreSQL database: not just raw query execution, but deep schema introspection, performance metrics, and the ability to reason about your data model. Once connected, Claude (or other LLMs) can understand your schema, identify slow queries, inspect index usage, and help you write optimized SQL, all through natural language.</p><p>The following functionality sets the pgEdge MCP Server apart from other Postgres MCP servers:</p><ul><li>Full schema introspection: primary keys, foreign keys, indexes, constraints, and more.</li><li>Performance metrics: exposes pg_stat_statements, index usage, and query performance data.</li><li>Works with any Postgres: community Postgres (v14+), pgEdge Distributed Postgres, and more.</li><li>Security built-in: TLS support, token auth, HTTP/HTTPS mode. Read-only by default; write access (DDL + DML) can be enabled via the allow_writes option when needed.</li></ul><h2>Why Distributed Postgres + MCP Is Interesting</h2><p>A pgEdge Distributed PostgreSQL cluster runs multiple active nodes (multi-master) across regions using the Spock replication extension. 
Each node can accept both reads and writes simultaneously, making it ideal for globally distributed applications with data sovereignty or low-latency requirements.</p><p>Connecting the MCP Server to a distributed cluster opens up some compelling use cases that go well beyond what you would do with a single-node setup:</p><ul><li>Ask Claude to check replication health across nodes in natural language, without having to remember Spock-specific system catalog queries.</li><li>Investigate node divergence by connecting MCP instances to each node and letting Claude compare schemas and row counts.</li><li>Query optimization across regions: Claude can analyze performance on individual nodes and suggest improvements.</li></ul><h2>Setting Up the MCP Server for a Distributed Cluster</h2><p>The following example demonstrates deploying the MCP Server in a distributed cluster with Claude Desktop.</p><h3>Step 1: Download and Install the MCP Server</h3><p>Download the MCP Server archive from the <a href="https://github.com/pgEdge/pgedge-postgres-mcp/releases/"><u>pgEdge GitHub</u></a> releases page, selecting the binary that matches your OS and architecture. If you are running pgEdge Enterprise Postgres, the MCP Server is also available directly from the pgEdge Enterprise repositories, with no manual download required. For this example, I am setting up the MCP Server on my Mac laptop, using the following command to download and connect the MCP Server:</p><p>I have a two-node pgEdge cluster running on my VM and will use Claude Desktop to connect to both nodes and run natural language queries. Please make sure the firewall and  entries in the pgEdge cluster are updated to allow access from the MCP Server.</p><p>We don't need to start the MCP Server manually. Claude Desktop automatically launches and manages the MCP Server process itself, using the command and args in your  file. 
That's why the config file path is specified there.</p><p>The only thing you need to keep running manually is the SSH tunnel, because that's what gives Claude Desktop's child processes access to your VM.</p><h3>Step 2: Configure Claude Desktop for Multiple Nodes</h3><p>This is where things get interesting with a distributed cluster: we'll configure the MCP Server with a separate entry for each node in your cluster. This allows Claude to connect to individual nodes, which is essential for comparing replication status, checking node-specific metrics, or investigating divergence.</p><p>After saving, restart Claude Desktop. You should see a database connection for each configured node, available via the hammer icon in the chat interface. The pgEdge cluster nodes are also listed under Connectors.</p><img src="https://a.storyblok.com/f/187930/776x366/0afe0fec89/picture1.png"><h2>Practical Usage Examples</h2><p>Now let's look at some examples that are uniquely valuable in a distributed cluster context.</p><h3>Example 1: Checking Replication Health and Row Counts Across Nodes</h3><p>One of the most common tasks when managing a distributed cluster is verifying that replication is healthy across all nodes. The Spock extension exposes several system catalog views for this. Instead of remembering the exact query syntax, you can simply ask Claude.</p><p>The following screenshot shows a few natural language queries executed in Claude Desktop while connected to the pgEdge nodes through the MCP Server. The queries provide useful diagnostic information for checking the state of the cluster.</p><img src="https://a.storyblok.com/f/187930/841x626/69e2870453/picture2.png" ><p>Claude will check the relevant Spock catalog views and provide a clear summary for your review. Under the hood, it translates your question into queries against views like these:</p><h3>Example 2: Schema Introspection Across the Cluster</h3><p>In a distributed cluster, DDL changes are replicated via Spock. 
However, there are scenarios, such as a DDL migration that is only partially applied, where schemas can drift between nodes. You can use Claude to quickly validate schema consistency:</p><img src="https://a.storyblok.com/f/187930/912x659/85704625cd/picture3.png" ><p>Claude inspects both nodes and flags any differences in column types, constraints, or index definitions. This is especially useful after major version upgrades or complex DDL migrations.</p><h3>Example 3: Query Performance Analysis Per Node</h3><p>Each node in a distributed cluster may have different query patterns and performance characteristics, depending on the geographic region and read/write mix. The MCP Server exposes <a href="https://www.postgresql.org/docs/18/pgstatstatements.html"><u>pg_stat_statements</u></a>, which allows Claude to analyze per-node performance:</p><p>Claude uses pg_stat_statements to identify slow queries, then examines the actual table structures and existing indexes to suggest improvements. For a distributed cluster, you can run this analysis on each node independently to identify region-specific performance issues.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/using-the-pgedge-mcp-server-with-a-distributed-postgresql-cluster</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>pgEdge,PostgreSQL,PostgreSQL</category>
            <title><![CDATA[PostgreSQL 18 RETURNING Enhancements: A Game Changer for Modern Applications]]></title>
            <link>https://www.pgedge.com/blog/postgresql-18-returning-enhancements-a-game-changer-for-modern-applications</link>
            <pubDate>Tue, 06 Jan 2026 05:45:34 GMT</pubDate>
            <description><![CDATA[ <p>PostgreSQL 18 has arrived with some fantastic improvements, and among them the RETURNING clause enhancements stand out as a feature that every PostgreSQL developer and DBA should be excited about. In this blog, I'll explore these enhancements, with particular focus on MERGE with RETURNING, and demonstrate how they can simplify your application architecture and improve data tracking capabilities.</p><h2>Background: The RETURNING Clause Evolution</h2><p>The RETURNING clause has been a staple of Postgres for years, allowing INSERT, UPDATE, and DELETE operations to return data about the affected rows. This capability eliminates the need for follow-up SELECT queries, reducing round trips to the database and improving performance. However, before Postgres 18, the RETURNING clause had significant limitations that forced developers into workarounds and compromises.</p><p>In Postgres 17, the community introduced <a href="https://www.postgresql.org/docs/18/sql-merge.html"><u>RETURNING support for MERGE</u></a> statements (commit c649fa24a), which was already a major step forward. MERGE itself had been introduced back in Postgres 15, providing a powerful way to perform conditional INSERT, UPDATE, or DELETE operations in a single statement, but MERGE without RETURNING didn’t provide an easy way to see what you'd accomplished.</p><h2>What's New in PostgreSQL 18?</h2><p>Postgres 18 takes the RETURNING clause to the next level by introducing OLD and NEW aliases (commit 80feb727c8), authored by Dean Rasheed and reviewed by Jian He and Jeff Davis. 
This enhancement fundamentally changes how you can capture data during DML operations.</p><h3>The Problem Before PostgreSQL 18</h3><p>Previously, although the RETURNING syntax looked the same for every statement type, what it could return depended on the statement:</p><ul><li>INSERT and UPDATE could only return new/current values.</li><li>DELETE could only return old values.</li><li>MERGE would return values based on the internal action executed (INSERT, UPDATE, or DELETE).</li></ul><p>If you needed to compare before-and-after values or track what actually changed during an update, you had limited options; you could:</p><ul><li>run separate SELECT queries before the modification.</li><li>implement complex trigger functions.</li><li>use application-level logic to track changes.</li><li>resort to workarounds like checking system columns (e.g., xmax).</li></ul><p>These approaches added complexity, increased latency, and made your code harder to maintain.</p><h3>The Solution: OLD and NEW Aliases</h3><p>Postgres 18 introduces the special aliases OLD and NEW, which allow you to explicitly access both the previous state and the current state of data within a single statement. This works across all DML operations: INSERT, UPDATE, DELETE, and MERGE.</p><p>The syntax is straightforward:</p><p>You can also rename these aliases to avoid conflicts with existing column names or when working within trigger functions:</p><h2>MERGE RETURNING: The Complete Picture</h2><p>The combination of MERGE and RETURNING in Postgres 18 creates an incredibly powerful tool for upsert operations. Let me walk you through a practical example that demonstrates the full capabilities.</p><h3>Practical Example: Product Inventory System</h3><p>Consider a product inventory management system where you need to sync data from external sources. 
You want to insert new products, update existing products, and track exactly what happened to each row.</p><p>First, let's set up our tables:</p><p>Now, let's populate the tables with some initial data:</p><h3>Basic MERGE with RETURNING</h3><p>This is a MERGE operation that shows what action was performed:</p><p>This query returns:</p><h3>Advanced MERGE with OLD and NEW</h3><p>Now let's leverage the OLD and NEW aliases to track detailed changes:</p><p>This comprehensive query returns:</p><p>Notice how for the INSERT operation the old values are NULL, while for UPDATE operations you get complete visibility into what changed.</p><h3>Building an Audit Trail</h3><p>One of the most powerful use cases is building comprehensive audit trails without using triggers. The example has three steps: create the audit table, perform the MERGE with a detailed audit trail, and query the results from the audit trail.</p><p>This creates a complete audit trail showing exactly what changed, all in a single atomic operation.</p><h2>Looking Forward</h2><p>The RETURNING enhancements in Postgres 18 represent a significant step forward in making Postgres more developer-friendly and reducing the need for complex workarounds. 
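</p><p>As a minimal sketch of the syntax discussed in this post, the statements below use a hypothetical products_demo table (not the inventory tables from the example above) to show OLD and NEW in an UPDATE, the WITH form for renaming the aliases, and a MERGE that reports its action alongside both row versions. They assume a PostgreSQL 18 server.</p><pre><code class="language-sql">-- Hypothetical table, for illustration only.
CREATE TABLE products_demo (
    id    int PRIMARY KEY,
    name  text NOT NULL,
    price numeric(10,2) NOT NULL
);

INSERT INTO products_demo VALUES (1, 'Widget', 10.00);

-- Both row versions are visible in RETURNING.
UPDATE products_demo
SET price = price * 1.10
WHERE id = 1
RETURNING id, old.price AS old_price, new.price AS new_price;

-- Rename the aliases when old/new would clash with existing names.
UPDATE products_demo
SET price = price + 1
WHERE id = 1
RETURNING WITH (OLD AS o, NEW AS n)
          o.price AS price_before, n.price AS price_after;

-- MERGE: report the action taken plus before and after values.
MERGE INTO products_demo p
USING (VALUES (1, 'Widget', 12.50),
              (2, 'Gadget', 20.00)) AS s(id, name, price)
ON p.id = s.id
WHEN MATCHED THEN
    UPDATE SET name = s.name, price = s.price
WHEN NOT MATCHED THEN
    INSERT (id, name, price) VALUES (s.id, s.name, s.price)
RETURNING merge_action(), p.id,
          old.price AS old_price, new.price AS new_price;
</code></pre><p>For the row that MERGE inserts, the old_price column is NULL because there was no previous row version. 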
The ability to access both old and new values in a single atomic operation simplifies many common patterns in application development.</p><p>Some areas where this feature could evolve in future releases:</p><ul><li>Extended MERGE capabilities that provide more sophisticated MERGE operations with additional WHEN clauses.</li><li>Aggregate support that offers the ability to aggregate RETURNING results directly.</li><li>Cross-table returns that enable returning data from related tables in a single operation.</li></ul><h2>Technical Details and Commit References</h2><p>For those interested in the technical implementation details:</p><ul><li>MERGE RETURNING (PostgreSQL 17): Commit c649fa24a by Dean Rasheed</li><li>OLD/NEW Support (PostgreSQL 18): Commit 80feb727c8 by Dean Rasheed, reviewed by Jian He and Jeff Davis</li><li>Discussion Thread: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com</li></ul><p>The implementation involved changes across multiple components:<ul><li>Executor (execExpr.c, execExprInterp.c, nodeModifyTable.c)</li><li>Parser (parse_target.c)</li><li>Optimizer (createplan.c, setrefs.c, subselect.c)</li><li>Nodes (makefuncs.c, nodeFuncs.c)</li></ul></p> ]]></description>
            <guid>https://www.pgedge.com/blog/postgresql-18-returning-enhancements-a-game-changer-for-modern-applications</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>pgEdge,PostgreSQL,PostgreSQL</category>
            <title><![CDATA[pgEdge Support for Large Object Logical Replication]]></title>
            <link>https://www.pgedge.com/blog/pgedge-support-for-large-object-logical-replication</link>
            <pubDate>Fri, 19 Dec 2025 12:12:26 GMT</pubDate>
            <description><![CDATA[ <p>Replication of large objects isn't currently supported by the community version of PostgreSQL logical replication. If you try to replicate a large object with logical replication, PostgreSQL will return: .  It's a meaningful error (always nice), but not helpful if you have large objects that you need to replicate.</p><p>pgEdge has developed a 100% open-source extension named LargeObjectLOgicalReplication (lolor) that provides support for replicating large objects. The primary goal of lolor is to provide seamless replication of large objects with pgEdge Spock multi-master distributed replication. (<a href="https://github.com/pgEdge/lolor"><u>Check it out on GitHub!</u></a>)</p><p>You can access and manipulate large objects in a PostgreSQL database with the <a href="https://www.postgresql.org/docs/current/lo-interfaces.html"><u>client interface functions</u></a>.</p><p>The pgEdge lolor extension supports the same large object functions that PostgreSQL provides, so all of your existing applications that use those functions will continue to work seamlessly.</p><p>The easiest way to install the lolor extension is by downloading the self-hosted edition or using the fully managed service for <a href="/download"><u>pgEdge Distributed Postgres or pgEdge Enterprise Postgres</u></a>. 
With either method of deployment, it comes installed by default.</p><p>You can also use the <a href="https://github.com/pgEdge/control-plane"><u>pgEdge Control Plane</u></a> to get started with either edition of pgEdge Postgres quickly and easily. It is a 100% open source distributed application designed to simplify Postgres cluster management and orchestration through a declarative API.</p><p>However, if you'd prefer to install lolor from source code, you can do so by following the instructions in the <a href="https://github.com/pgEdge/lolor"><u>repository README</u></a>, which follow the same guidelines as other Postgres extensions built with PGXS.</p><p>In this blog, we are going to create a two-node pgEdge cluster on localhost to demonstrate how pgEdge replicates large objects. We'll also share a native PSQL example of using the extension to replicate large objects, and a JDBC example that shows how to use the extension from a Java program.</p><p>In any directory owned by your non-root user, use the following command to <a href="https://docs.pgedge.com/platform/installing_pgedge/manual"><u>install pgEdge</u></a>; you'll need to invoke this command on each node host in the cluster:</p><h2>Node 1 Setup</h2><p>Navigate into the  directory on node 1 and perform the following steps:</p><p>Run the following command to set up the pgEdge platform; this command installs PostgreSQL version 17 along with the pgEdge Spock and Snowflake extensions.</p><p>Then, run the following command to create a Spock node (we are creating a node named ). Note that the user named in the command below (in our command ) needs to be an OS user:</p><p>The next command creates the subscription between  and . 
You should run this command after completing the initial pgEdge setup on . Then, use the following command to install the lolor extension: Then, source your PostgreSQL installation, connect with PSQL, and run the  statement to create the lolor extension: You'll also need to set the  configuration parameter before using the extension. Set the value to the number that corresponds to the node on which you're setting the parameter; the value can be from 1 to 2^28. Please restart the server after adding the above configuration parameter to the  file. The postgresql.conf file is located in the data directory under your PostgreSQL installation. Before using lolor functionality, you also need to add the large object catalog tables to the  replication set. You can use the following commands: The following commands are executed to enable automatic DDL replication: Node 2 setup. Navigate into the  directory on node 2 and perform the following steps to configure the lolor extension: Run the following command to install pgEdge Platform; this will install PG-17 along with the pgEdge Spock and Snowflake extensions. Use the following command to create a Spock node. Please note that the user provided in the following command needs to be an OS user: Then, use the following command to create the subscription between  and : Now we are ready to install the lolor extension with the command: Then, log in with PSQL and invoke the  statement: You must set  to a number that represents the node in the replication cluster before using lolor. 
Acceptable values range from 1 to 2^28. Please restart the server after adding the above configuration parameter to the postgresql.conf file. After setting the  parameter, use the following commands to add the large object catalog tables to the  replication set: Then, execute the following commands to enable automatic DDL replication:<h2>Example: Using the PSQL Command Line to Exercise lolor</h2>In the sections that follow, we are going to do a short test that demonstrates large object replication using the PSQL client. PSQL is a secure, native PostgreSQL client that uses the libpq driver to negotiate connections. First, we are going to perform the following SQL commands on node 1: We have auto_ddl enabled, so the table is also replicated to the other nodes. We can query node 2 with the following  statement to confirm that the large object was replicated:<h2>Example: Using a JDBC Connection to Query a Large Object</h2>The following program code connects to a pgEdge node, loads  file into the database as a large object, and performs retrieval operations. To simplify connection management, you can specify connection information in the app.properties file, and then reference the file in your JDBC connection.<h3>Getting Started</h3>Large object logical replication is just one benefit you get out-of-the-box when using pgEdge Enterprise Postgres or pgEdge Distributed Postgres to construct, manage, and host your distributed PostgreSQL databases. Want to try out the complete experience for free?  
Download the self-hosted edition for bare metal, virtual machine or containers (or check out the fully-managed cloud edition) here: <a href="/download"><u>https://www.pgedge.com/download</u></a><h3>Additional Resources</h3>lolor documentation is available at: <a href="https://docs.pgedge.com/lolor/"><u>https://docs.pgedge.com/lolor/</u></a> You can also visit pgEdge on GitHub for more information at: <a href="https://github.com/pgEdge/pgedge"><u>https://github.com/pgEdge/</u></a> For information about the convenience provided by pgEdge Cloud, visit: <a href="https://www.pgedge.com/products/pgedge-cloud"><u>https://www.pgedge.com/products/pgedge-cloud</u></a></p> ]]></description>
            <guid>https://www.pgedge.com/blog/pgedge-support-for-large-object-logical-replication</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>Distributed Postgres,PostgreSQL,PostgreSQL,PostgreSQL High Availability</category>
            <title><![CDATA[Introducing Snowflake Sequences in a Postgres Extension-2]]></title>
            <link>https://www.pgedge.com/blog/introducing-snowflake-sequences-in-a-postgres-extension-2</link>
            <pubDate>Wed, 03 Dec 2025 06:36:54 GMT</pubDate>
            <description><![CDATA[ <p>In a PostgreSQL database, sequences provide a convenient way to generate a unique identifier, and are often used for key generation. From the community, PostgreSQL provides <a href="https://www.postgresql.org/docs/current/functions-sequence.html">functions and SQL language</a> to help manage sequence generation, but the sequences themselves are not without limitations in a multi-master environment. <a href="https://docs.pgedge.com/snowflake/">As a result, we introduced 100% open-source Snowflake sequences</a> that work seamlessly in a multi-master PostgreSQL cluster to remove those limitations and ensure your data can thrive at the network edge. (<a href="https://github.com/pgEdge/snowflake">Check out the extension on GitHub!</a>) The easiest way to get started using Snowflake sequences is to <a href="/download">try out pgEdge Postgres</a>, either the Enterprise or the Distributed edition, available for self-hosting or fully managed hosting on virtual machines, containers, or the cloud. The pgEdge Control Plane (also 100% open source and <a href="https://github.com/pgEdge/control-plane">available on GitHub</a>) makes it even easier to deploy and orchestrate Postgres clusters, and comes with the snowflake extension by default (amongst others). <h2>Why are Sequences an Issue? </h2>In a distributed <a href="/solutions/benefit/multi-master">multi-master database</a> system, sequences can get complicated. Ensuring consistency and uniqueness across the nodes in your cluster is a problem if you use PostgreSQL sequences. The Snowflake extension is designed to step up to automatically mitigate this issue. PostgreSQL sequence values are prepared for assignment in a table in your PostgreSQL database; as each sequence value is used, the next sequence value  is incremented. Changes to the next available sequence value are not replicated to the other nodes in your replication cluster.  
In a simple example, you might have a table on node , with 10 rows, each with a primary key that is assigned a sequence value from 1 to 10; the next prepared sequence value  on  will be 11. Rows are replicated from  to  without issue until you add a row on .  The PostgreSQL sequence value table on  has not been incrementing sequence values in step with the sequence value table on n1. When you add a row on , it will try to use the next available sequence value ( will be 1 if you haven't added a row on ), and the  will fail because a row with the primary key value of 1 already exists.  This disorder can be monitored and corrected by manually coordinating the PostgreSQL sequences between nodes in the cluster, but that quickly becomes complicated and potentially impacts the user experience as you add more nodes to the cluster.  <h2>Introducing Snowflake Sequences</h2>An alternative to using PostgreSQL sequences is to use a guaranteed unique Snowflake sequence. Snowflake sequences are represented externally as  values. A Snowflake sequence is made up of:  <ul><li>The timestamp is a 41-bit unsigned value with millisecond precision and an epoch of 2023-01-01.</li></ul><ul><li>A unique ID is allocated as a 12-bit unsigned value. This provides for 4096 unique IDs per millisecond, or roughly 4 million IDs per second. </li></ul><ul><li>The node number is a 10-bit unique identifier of the PostgreSQL instance in a global cluster. This value must be set with the GUC snowflake.node in the postgresql.conf file.</li></ul>This combination ensures that a unique identifier is always available; even the most aggressive allocation of sequences cannot exceed the current assignment capabilities. Should a workload ever require more than 4096 Snowflakes per millisecond, the algorithm will bump the timestamp one millisecond into the future to keep Snowflake IDs unique.  
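The three fields described above can be sketched in a few lines of Python. This is only an illustration, not the extension's implementation: it assumes the fields are packed from high bits to low bits in the order listed (timestamp, counter, node) and that the 2023-01-01 epoch is midnight UTC. The names make_snowflake, split_snowflake, and SnowflakeGenerator are hypothetical helpers invented for this example.

```python
from datetime import datetime, timezone

# Assumption: fields are packed high-to-low as timestamp | counter | node.
TIMESTAMP_BITS, COUNTER_BITS, NODE_BITS = 41, 12, 10
# Assumption: the 2023-01-01 epoch is midnight UTC, expressed in milliseconds.
EPOCH_MS = int(datetime(2023, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)


def make_snowflake(unix_ms, counter, node):
    """Pack a millisecond timestamp, a per-millisecond counter, and a node number."""
    ts = unix_ms - EPOCH_MS
    if not 0 <= ts < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter does not fit in 12 bits")
    if not 0 <= node < (1 << NODE_BITS):
        raise ValueError("node number does not fit in 10 bits")
    return (ts << (COUNTER_BITS + NODE_BITS)) | (counter << NODE_BITS) | node


def split_snowflake(value):
    """Unpack a value into (unix_ms, counter, node)."""
    node = value & ((1 << NODE_BITS) - 1)
    counter = (value >> NODE_BITS) & ((1 << COUNTER_BITS) - 1)
    ts = value >> (COUNTER_BITS + NODE_BITS)
    return ts + EPOCH_MS, counter, node


class SnowflakeGenerator:
    """Hypothetical per-node generator that mimics the overflow rule above."""

    def __init__(self, node):
        self.node = node
        self.last_ms = -1
        self.counter = 0

    def next_id(self, now_ms):
        if now_ms > self.last_ms:
            # A new millisecond: restart the counter.
            self.last_ms, self.counter = now_ms, 0
        else:
            self.counter += 1
            if self.counter >= (1 << COUNTER_BITS):
                # Counter exhausted: bump the timestamp one millisecond ahead.
                self.last_ms += 1
                self.counter = 0
        return make_snowflake(self.last_ms, self.counter, self.node)
```

Because the node number is part of every value, two nodes that generate an ID in the same millisecond with the same counter still produce different values, which is the property that plain PostgreSQL sequences lack in a multi-master cluster. 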
You can use <a href="https://docs.pgedge.com/snowflake/snowflake_functions/">Snowflake functions</a> from pgEdge to extract data from a Snowflake sequence for auditing or for use in other transaction processing needs.  Information about the node on which a transaction occurred, or the specific time a transaction occurred, is built into your data and easily accessed when you use a Snowflake sequence.  <h2>Using Snowflake Sequences with pgEdge </h2>pgEdge automatically installs and creates the snowflake extension when you install pgEdge Postgres (both the Distributed and the Enterprise packages). It is also automatically installed on all pgEdge Cloud Developer Edition databases.  You can also follow the instructions in the <a href="https://github.com/pgEdge/snowflake">README</a> to build Snowflake from source code. The <a href="https://docs.pgedge.com/platform/pgedge_commands/doc/spock-node-create/">spock node-create</a> command has been updated to set the Snowflake node id for you if you use the node naming convention , ,  (up to ). If you use another node naming convention, you will need to manually set snowflake.node to a unique value: After creating all of your database objects, you can use a pgEdge CLI command to convert your existing sequences into snowflake sequences. When calling the <a href="https://docs.pgedge.com/platform/pgedge_commands/doc/spock-sequence-convert/">spock sequence-convert</a> function, pass the name of the sequence to be converted and the database name. Like other pgEdge CLI functions, you can include quoted wildcards to include all sequences in that database (‘*’) or to match a specific pattern like all sequences in a schema (‘public.*’). During the conversion, the data type of your sequence fields is changed to bigint; any existing values in your database remain the same.  
When you add new rows to your database, the new sequences will use the format of a snowflake sequence:<h2>Getting Started</h2>Snowflake sequences are just one benefit you get out-of-the-box when using pgEdge Enterprise Postgres or pgEdge Distributed Postgres to construct, manage, and host your distributed PostgreSQL databases. Want to try out the complete experience for free?  Download the self-hosted edition for bare metal, virtual machine or containers (or check out the fully-managed cloud edition) here: <a href="/download">https://www.pgedge.com/download </a><h2>Additional Resources</h2>Snowflake documentation is available at: <a href="https://docs.pgedge.com/snowflake/ ">https://docs.pgedge.com/snowflake/ </a> You can also visit pgEdge on GitHub for more information at: <a href="https://github.com/pgEdge/">https://github.com/pgEdge/</a> For information about the convenience provided by pgEdge Cloud, visit: <a href="/products/pgedge-cloud">https://www.pgedge.com/products/pgedge-cloud</a></p> ]]></description>
            <guid>https://www.pgedge.com/blog/introducing-snowflake-sequences-in-a-postgres-extension-2</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL,Distributed Postgres</category>
            <title><![CDATA[Postgres 18: Skip Scan - Breaking Free from the Left-Most Index Limitation]]></title>
            <link>https://www.pgedge.com/blog/postgres-18-skip-scan-breaking-free-from-the-left-most-index-limitation</link>
            <pubDate>Tue, 18 Nov 2025 06:35:52 GMT</pubDate>
            <description><![CDATA[ <p>Postgres 18, released on September 25, 2025, introduces an exciting set of performance improvements and new capabilities. Postgres has grown remarkably over the years, and with each major release has become a more robust, reliable, and responsive database for both mission critical and non-mission critical enterprise applications. I’ve been writing about these enhancements since the release, and today I’m diving into two more features that are especially useful in real-world applications. I previously blogged about a major new performance feature, the Asynchronous I/O (AIO) subsystem. AIO boosts I/O throughput during sequential scans, bitmap heap scans, and VACUUM operations, providing a performance boost for essentially everyone who uses Postgres. On Linux (with io_uring), this can offer 2–3× performance improvements by overlapping disk access with processing. Please see my blog for more details: <a href="/blog/highlights-of-postgresql-18">https://www.pgedge.com/blog/highlights-of-postgresql-18</a> This week I'll add to my new feature series with a discussion of two more features from the latest release that focus on improving everyday performance and developer efficiency.  Both the enhanced RETURNING clause and the Skip Scan optimization represent the kind of improvements developers use every day; these features make queries faster, code cleaner, and applications simpler without any schema redesign or complex tuning. I picked these two from among the other exciting features because of their impact on the performance and optimization work required of application developers.<ul><li>RETURNING clause: Now you can access both OLD and NEW row values in INSERT, UPDATE, DELETE, and MERGE statements; this is perfect for auditing, API responses, and ETL workflows. 
This feature reduces round trips, ensures atomicity, and keeps your SQL self-contained and elegant.</li></ul><ul><li>Skip scan: This optimization allows Postgres to use multicolumn B-tree indexes even when leading columns aren’t filtered, unlocking major performance boosts for real-world analytical queries and reports, all without adding new indexes.</li></ul>Together, these enhancements reflect Postgres 18’s core philosophy, that features should provide smarter performance and simplified development. Postgres's success depends on diligently ensuring that all changes are carefully scrutinized and reviewed before they are added to the project source code. The skip scan feature, developed by Peter Geoghegan (a major Postgres contributor and committer), exemplifies this rigorous development process.<h2>Understanding the Left-Most Index Problem</h2>One of the most anticipated query optimization improvements in Postgres 18 is the B-tree skip scan capability. This feature addresses a long-standing limitation that has frustrated DBAs and developers for years, and demonstrates the Postgres community's continued commitment to making the database more performant and efficient. Before diving into the skip scan feature, let's first understand the limitation it addresses. In previous Postgres versions, multicolumn B-tree indexes were most effective when queries included conditions on the leading columns. The index structure organizes data first by the first column, then by the second column within each first column value, and so on. Consider a multicolumn B-tree index on (, , and ). The index entries at the leaves are stored in lexicographic order: A query with a predicate like AND  would perform a single contiguous range scan, which is highly efficient. However, a query that only filters on  (omitting the leading  column) would hit scattered entries across all status groups. 
In such cases, Postgres would typically resort to a sequential scan or use a different index if available, leaving your carefully designed multicolumn index completely unused. This limitation forced DBAs to create multiple indexes with different column orderings to cover various query patterns, leading to increased storage overhead, slower write performance, and more complex index management.<h2>Skip Scan to the Rescue</h2>Postgres 18 introduces skip scan functionality for B-tree indexes, allowing the query planner to use multicolumn indexes even when early columns lack equality restrictions. This eliminates the frustrating scenario where perfectly good indexes sat unused because queries didn't filter on the first indexed column. The skip scan optimization works by allowing Postgres to intelligently "skip" over portions of the index to find relevant data. When you query by later columns in the index without specifying the leading column, Postgres can now:<ul><li>Identify all the distinct values in the omitted leading column(s).</li></ul><ul><li>Effectively transform the query to add conditions that match the leading values.</li></ul><ul><li>Use existing infrastructure to optimize lookups across multiple leading columns, effectively skipping any pages in the index scan which do not match the query conditions.</li></ul>This is particularly valuable for analytics and reporting workloads where you often need to query different combinations of indexed columns without always specifying the leading ones.<h2>How Skip Scan Works - Under the Hood</h2>Let's look at a practical example. Suppose we have an orders table with an index: Before Postgres 18, if you ran a query like: The index would be largely ineffective because the query doesn't filter on the leading  column. At that point, Postgres would likely perform a sequential scan. With Postgres 18's skip scan capability, the planner can efficiently use this index by internally transforming the query logic. 
It identifies the distinct values for  (for example, , , and ), and then performs targeted index scans for each status value combined with the  and  conditions. It essentially rewrites the query as: The key insight is that if the omitted leading column has low cardinality (a small number of distinct values), the overhead of probing each distinct value is minimal compared to a sequential scan. The planner automatically decides when skip scans provide better performance than sequential scans or other alternatives.<h2>When Does Skip Scan Shine?</h2>Skip scan is most beneficial in the following scenarios:<ul><li>The optimization is most effective when the omitted leading columns have low cardinality. If status has only 3-5 distinct values, skip scan will perform excellently. However, if it has thousands of distinct values, the benefit diminishes significantly.</li></ul><ul><li>The skip scan implementation targets cases where later columns in the index are referenced with equality conditions. The current implementation is optimized to check for these specific patterns.</li></ul><ul><li>Skip scan is particularly valuable for analytics queries where you need flexibility to query different column combinations. This is common in business intelligence tools and ad-hoc reporting scenarios.</li></ul><ul><li>Rather than creating multiple indexes with different column orderings, you can now rely on a single well-designed multicolumn index that skip scan can use effectively.</li></ul><h2>Important Limitations and Considerations</h2>While skip scan is a powerful feature, it's important to understand its current limitations:<ul><li>Skip scan currently works only with B-tree indexes (the most common index type).</li></ul><ul><li>The performance benefit decreases significantly as the number of distinct values in omitted columns increases. 
With high cardinality leading columns, you may still need dedicated indexes.</li></ul><ul><li>The feature requires at least one equality condition on a later column in the index. Don't expect magic for arbitrary ranges or complex predicates on later columns.</li></ul><ul><li>For queries returning large result sets, traditional scan plans may still be the right answer. </li></ul><h2>Practical Example and Performance Analysis</h2>Let me demonstrate skip scan with a more detailed example. We'll create a table with realistic data distribution: Now, let's query by product_category without specifying the region using Postgres 17: As you can see, with Postgres 17, this query is using a sequential scan because the leading  column isn't specified. With Postgres 18, the planner can use skip scan to efficiently utilize the index by scanning through each of the four region values and performing targeted lookups. Now let's run the same query in Postgres 18: The execution plan in Postgres 18 shows the skip scan in action, with significantly reduced buffer reads and improved execution time compared to a sequential scan.<h2>Configuration and Tuning</h2>Postgres 18 introduces the skip scan capability as part of the query planner's arsenal. The planner automatically decides when to use skip scan based on cost estimation. As with other planner optimizations, Postgres provides the flexibility to enable or disable skip scan through configuration, though in normal operation you should let the planner make intelligent decisions based on statistics and cost estimates.<h2>Looking Ahead</h2>The skip scan feature represents an important step forward in query optimization and index utilization. It demonstrates the community's commitment to continuously improving performance while maintaining Postgres's reputation for reliability and robustness. This feature addresses a real pain point that developers and DBAs have worked around for years. 
By allowing more flexible use of multicolumn indexes, skip scan simplifies database design, reduces storage overhead, and improves query performance across a wide range of scenarios. As Postgres continues to evolve, we can expect further enhancements to skip scan and other query optimization capabilities. The foundation laid in Postgres 18 will likely be built upon in future releases, potentially extending skip scan support to more complex query patterns and other index types.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/postgres-18-skip-scan-breaking-free-from-the-left-most-index-limitation</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL,postgres</category>
            <title><![CDATA[Highlights of PostgreSQL 18]]></title>
            <link>https://www.pgedge.com/blog/highlights-of-postgresql-18</link>
            <pubDate>Thu, 28 Aug 2025 06:27:36 GMT</pubDate>
            <description><![CDATA[ <p>The PostgreSQL development group released the second Beta version of PostgreSQL 18 in July; the GA version is expected later in 2025 (around the September/October timeframe). The PostgreSQL development group and its community are very dedicated, delivering several minor releases during the year and a major release every year.<h1>Why You Should Perform a Major Version Upgrade</h1>Every major PostgreSQL release comes with significant new features that improve the stability, performance, and usability of PostgreSQL as well as the user experience. Each new release brings critical security patches, performance improvements, and new SQL features that can simplify development and reduce technical debt. Upgrading ensures continued community and vendor support, compatibility with evolving infrastructure and libraries, and access to enhancements in scalability, monitoring, and disaster recovery. Staying current also reduces the risk and cost of future migrations, as skipping multiple versions makes upgrades more complex and disruptive. In short, regular major upgrades keep PostgreSQL stable, fast, secure, and ready for future growth. The global and vibrant PostgreSQL community contributes to PostgreSQL's success, diligently ensuring that all changes are carefully scrutinized and reviewed before they are added to the project source code. It is also very encouraging to see big technology names like Microsoft, Google, Apple, and others investing in Postgres by developing in-house expertise and giving back to the open source community. Upgrading to PostgreSQL 18 delivers significant benefits in performance, security, and ease of management, making it a smart move for both technical and business reasons. The new asynchronous I/O engine and smarter indexing features speed up queries and maintenance tasks, while improved pg_upgrade with preserved planner statistics ensures faster, low‑risk version upgrades. 
Developers gain productivity with virtual generated columns, enhanced RETURNING support, and built‑in uuidv7() for better indexing, while enterprises benefit from OAuth 2.0 authentication, stronger encryption, and data checksums enabled by default for higher reliability. Combined with improved observability and flexible schema changes that minimize downtime, PostgreSQL 18 is a future‑ready release that enhances performance, security, and operational efficiency.<h2>Performance & Query Optimization</h2>Every major PostgreSQL release comes with features in different categories (e.g., performance, logical replication, monitoring, developer experience, and security). In this blog I will be going over the key performance features added to the PostgreSQL 18 release. As I have done previously, I will follow this blog with at least two more in which I delve into features in other categories, with details and practical usage examples.<h2>Adding an Asynchronous I/O Subsystem</h2>The new asynchronous I/O subsystem is a major performance feature of the PostgreSQL 18 release. The Asynchronous I/O (AIO) feature is introduced to boost I/O throughput, especially for sequential scans, bitmap heap scans, and VACUUM operations. On Linux with io_uring, this can offer 2–3× performance improvements by overlapping disk access with processing. The main motivations behind adding AIO (Asynchronous I/O) to PostgreSQL are:<ul><li>Reduce the time spent waiting for I/O by issuing I/O sufficiently early. Historically, PostgreSQL relied heavily on blocking I/O for reads and writes. That meant a backend process would issue one I/O call and then sit idle, waiting for the OS or disk to respond before doing anything else. The idea here is to avoid that idle time by starting I/O operations before the data is actually needed, allowing PostgreSQL to overlap I/O with useful work (e.g., computation or other I/O requests).</li></ul><ul><li>Allow the use of Direct I/O (DIO). 
Direct I/O refers to bypassing the OS kernel’s page cache and performing I/O operations straight between the application and storage device. DIO can offload most of the work for I/O to hardware and thus increase throughput, decrease CPU utilization, and reduce latency. The “io_method” GUC that selects the AIO method is explained later in the blog.</li></ul>The AIO infrastructure allows AIO to be implemented using different methods. The feature introduces a new GUC named “io_method” that controls which method is used; it is set in postgresql.conf and read at server start. The io_method setting cannot be changed without a server restart; the server returns the following error if you try to change it in a running session: parameter "io_method" cannot be changed without restarting the server. The parameter can be set to the following values: sync: This was the setting of the parameter at the initial commit when the AIO infrastructure was added. This is the traditional synchronous I/O with blocking I/O operations, giving essentially the same behavior as PG-17. Setting the value to sync doesn’t perform any AIO operations; it just ensures that the new code added for this feature is bypassed. worker: This is the default setting in PostgreSQL 18.  This setting uses background I/O worker processes; the backend processes queue I/O requests while the background worker processes handle read/write operations asynchronously. The number of background I/O worker processes is controlled by the io_workers GUC. The background worker processes appear as dedicated processes in the OS process list. 
io_uring is a Linux-specific, modern, high-performance interface for true asynchronous I/O. This method requires Postgres to be built with --with-liburing and runs on compatible kernels/filesystems. The io_uring method eliminates the worker processes and uses a shared ring buffer between Postgres and the kernel to enqueue/distribute I/O requests efficiently. This AIO method offers low syscall overhead and can significantly improve performance, especially on high latency storage systems. This method gives the fastest performance on Linux. The io_uring method requires the following kernel setting. I have carried out a test for AIO with all 3 supported io_methods (sync, worker, and io_uring), and shared the results of the test below. There is a clear performance improvement when you go from sync to worker to io_uring as your io_method. The example below shows the use of the AIO feature with the three supported io_methods. The "I/O Timings: shared" entry in the explain plan shows that AIO was triggered as part of this query plan.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/highlights-of-postgresql-18</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL</category>
            <title><![CDATA[Unleashing the Power of PostgreSQL with pgEdge Distributed Multi-Master Replication and Postgres Platform - Part 2]]></title>
            <link>https://www.pgedge.com/blog/unleashing-the-power-of-postgresql-with-pgedge-distributed-multi-master-replication-and-postgres-platform-part-2</link>
            <pubDate>Wed, 07 May 2025 05:03:20 GMT</pubDate>
            <description><![CDATA[ <p>In this blog, we're continuing to explore the power of <a href="/solutions/benefit/multi-master"><u>multi-master replication</u></a> (MMR) with the pgEdge distributed Postgres platform and its open source extension, Spock. In <a href="/blog/unleashing-the-power-of-postgresql-with-pgedge-distributed-multi-master-replication-and-postgres-platform-part-1"><u>Part 1 of this blog</u></a> series, we discussed different replication methods and deployment models for PostgreSQL replication. That blog also discussed the pros and cons of MMR, and how pgEdge Distributed Postgres uses the Spock extension to perform conflict management to ensure data integrity while implementing multi-master replication. In this blog we'll focus on conflict management. <a href="https://docs.pgedge.com/spock_ext#conflict-free-delta-apply-columns-conflict-avoidance"><u>Conflict-free delta-apply columns</u></a> are a distinguishing feature of the pgEdge Spock extension that provides a definitive way to apply data updates in the correct order, preventing data conflicts and facilitating efficient and accurate replication of incremental changes and aggregate values. Effective conflict avoidance tooling is essential to maintain the integrity of your data and ensure smooth operation in an MMR environment.<h2>Understanding Conflict-Free-Delta-Apply Columns</h2>To recap the issues we discussed in the first blog: a conflict arises in an MMR cluster when the same data is updated or inserted by concurrent connections on multiple distributed nodes. 
Unlike a single master replication (SMR) cluster, where the master node accepts write transactions and supporting nodes answer read requests, all nodes in a distributed MMR cluster are tasked with handling both read and write operations for improved performance and efficiency. The improved performance and efficiency of MMR comes with caveats; foresight and planning are your best defense against data integrity issues. For example, the following scenarios can cause a data conflict in a distributed MMR cluster:<ul></ul><ul></ul><ul></ul>The traditional (and most often used) method of managing conflict resolution is last update wins, where the most recent change overwrites an earlier one. By itself, this approach can lead to data inconsistencies, especially in scenarios involving cumulative operations (where you're maintaining the sum of a column). For example, if multiple transactions acting on the same account add or subtract values from the same numerical field in a different order on different nodes, the result produced by the last update wins approach may not always reflect the changes accurately. Spock's  mechanism resolves this issue by replicating the delta (change) rather than the final value or the value that results with the last update wins approach. Taking a  approach ensures that all concurrent/incremental updates are accurately applied across all nodes. The <a href="https://github.com/pgEdge/spock"><u>Spock extension's GitHub page</u></a> has the following example that explains the best usage of the  feature. To implement the  column, the Spock extension applies a small patch to community Postgres. The patch provides the ability to log old values of specified columns when an update is made to that column.<h2>Example</h2>The following example demonstrates testing conflict resolution on a three node pgEdge MMR PostgreSQL cluster. 
We'll use the <a href="https://www.postgresql.org/docs/17/pgbench.html"><u>pgbench</u></a> schema to show the effect of the delta-apply column on the balance column of the pgbench tables.</p><h2>Benefits of Conflict-Free-Delta-Apply Columns</h2><p>The delta-apply column is a distinguishing benefit of pgEdge's distributed PostgreSQL platform. It is an absolute must if your application uses distributed MMR clusters for use cases that involve maintaining aggregate values, running column sums, or performing incremental changes to numerical columns. Without conflict-free delta-apply, using the last update wins approach to resolve conflicts on a distributed cluster can lead to critical calculation errors.</p><p>Implementing delta-apply functionality with the Spock extension makes your distributed PostgreSQL database reliable and robust, and ensures data consistency. The Spock extension provides the following key benefits:<ul></ul><ul></ul><ul></ul><ul></ul></p> ]]></description>
            <guid>https://www.pgedge.com/blog/unleashing-the-power-of-postgresql-with-pgedge-distributed-multi-master-replication-and-postgres-platform-part-2</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL</category>
            <title><![CDATA[Unleashing the Power of PostgreSQL with pgEdge Distributed Multi-Master Replication and Postgres Platform - Part 1]]></title>
            <link>https://www.pgedge.com/blog/unleashing-the-power-of-postgresql-with-pgedge-distributed-multi-master-replication-and-postgres-platform-part-1</link>
            <pubDate>Wed, 07 May 2025 04:47:08 GMT</pubDate>
            <description><![CDATA[ <p>Before we delve into the main subject of this blog, it is essential to understand the benefits of PostgreSQL replication, and the difference between single-master replication (SMR) and multi-master replication (MMR). In every modern business application, the database is becoming a critical part of the architecture and the demand for making the database performant and highly available is growing tremendously.</p><h3>Planning Ahead for Better Performance</h3><p>Our goal when designing a system for high performance is to make the database more efficient when handling an application request - this ensures that the database does not become a business bottleneck. If your database resides on a single host, the resources of the system that is hosting the database can be easily exhausted; a system that supports scaling the database can respond more effectively to the application's heavy load.</p><p>With pgEdge <a href="https://www.pgedge.com/products/what-is-pgedge"><u>Distributed Postgres</u></a> and the power of PostgreSQL, you can perform both horizontal and vertical scaling:</p><ul></ul><ul></ul><p>The technique of replicating data across multiple PostgreSQL databases that are running on multiple servers can also be considered horizontal scaling.
The data is not distributed, but database changes are replicated to each cluster node so the application load can be divided across multiple machines to achieve better performance.</p><p>Reliability and high-availability are also crucial for a powerful and responsive system:</p><ul><li>Reliability means that the database is able to respond to user/application requests at all times with consistency and without any server interruption.</li><li>High-availability is also a critical consideration that ensures that database operations are not interrupted and the database downtime is minimized.</li></ul><p>Statistically, downtime per year reflects the ability of your database and application to handle failures and outages without user downtime. Often, downtime per year is negotiated into a service level agreement (SLA) for applications that require high-availability; this clause specifies the cumulative length of time within that year that the database can be down. To minimize downtime, pgEdge can actively replicate the same data to each node in the cluster. Components that handle failover and query routing are also used to ensure that the database remains highly available under stress.</p><p>PostgreSQL provides two methods of replication: asynchronous and synchronous.</p><ul><li>If you are using asynchronous replication, data is written to the primary server first and then it is replicated to other database nodes without waiting on confirmation from each replicated node that the data has been written.</li><li>If you are using synchronous replication, the primary waits for confirmation that the data has also been written on the replica nodes before it reports the transaction as committed.</li></ul><p>There are tradeoffs between asynchronous and synchronous replication. Synchronous replication is safer for critical data or high-end transactional workloads that require resiliency.
Asynchronous replication is suitable for most workloads, but failover might take longer when compared to a synchronous replication configuration, and there might be some risk of data loss if all the changes are not replicated to all nodes.</p><p>In this summary, we've defined the terms used to describe replication in a PostgreSQL database. Let's now delve into the two deployment models for PostgreSQL replication.</p><h2>Single-Master Replication</h2><p><img src="https://a.storyblok.com/f/187930/960x720/e45e88f7f2/single-master-replication.png" >A single-master replication model consists of one primary node and one or more secondary nodes. In this model, write transactions are only sent to the primary node while read transactions are sent to both primary and secondary nodes. The secondary nodes (read-only replicas) are used to handle query requests that don't modify data. This scenario employs middleware products (like <a href="https://www.haproxy.org/"><u>HAProxy</u></a>) that sort the write and read requests between the primary and secondary nodes. In the event of a failure, a secondary node is promoted to become the primary node, with automated failovers handled by products like <a href="https://github.com/patroni/patroni"><u>Patroni</u></a> and <a href="https://www.pgpool.net/docs/46/en/html/index.html"><u>Pgpool</u></a>. When a failover completes, the middleware (HAProxy) is updated to ensure that writes are sent to the new primary node.</p><h2>Multi-Master Replication and Conflicts</h2><p>The multi-master replication deployment model consists of multiple nodes that each act as a primary (or master) node. Each node performs active-active replication with the other nodes; in an MMR cluster, client applications can perform both write and read operations against any node in the cluster.
This configuration employs shared-nothing architecture without a coordinator node.</p><p>You can configure single-master replication using only native PostgreSQL tooling, but multi-master replication capabilities must be provided by companies like <a href="/"><u>pgEdge</u></a>. pgEdge provides a fully distributed and 100% PostgreSQL based cluster with benefits like low latency for high performance, selective filtering for data residency, and conflict resolution. Once configured, a pgEdge MMR cluster enables a client application to send write commands to any of the nodes in the cluster. It's worth noting that multiple clients updating the same record concurrently can lead to conflicts that are handled by the conflict-resolution solution provided by pgEdge.<img src="https://a.storyblok.com/f/187930/960x720/a79636ab19/multi-master-replication.png" ></p><p>During active-active replication, synchronization of data between nodes can cause a conflict if changes are applied to the same row on multiple nodes concurrently by more than one client session. A conflict can occur even if the transactions causing the problem take place at different times; the conflict arises when the changes are replicated to synchronize the nodes.</p><p>Different types of transactions cause different types of conflicts in an MMR scenario; the following list will help you get a better understanding of MMR conflicts:</p><ul></ul><ul></ul><ul></ul><h2>Conflict Detection and Resolution</h2><p>From the PostgreSQL documentation:</p><p>“Logical replication behaves similarly to normal DML operations in that the data will be updated even if it was changed locally on the subscriber node. If incoming data violates any constraints the replication will stop. This is referred to as a conflict. A conflict will produce an error and will stop the replication; it must be resolved manually by the user.
Details about the conflict can be found in the subscriber's server log.”</p><p>The MMR implementation from pgEdge provides a <a href="https://docs.pgedge.com/spock_ext#conflict-free-delta-apply-columns-conflict-avoidance"><u>solution for detecting and resolving conflicts</u></a> without breaking replication between nodes. In the examples that follow, conflicts are used to demonstrate how the pgEdge platform detects and resolves issues automatically without impacting replication. The pgEdge platform utilizes an open source extension named <a href="https://github.com/pgEdge/spock"><u>Spock</u></a> that provides MMR capabilities with automatic DDL updates, conflict detection/resolution, and more.</p><p>In our example, we are going to use a 3-node pgEdge cluster with its nodes running on different ports. The <a href="https://docs.pgedge.com/platform/troubleshooting/find_info"><u>spock.node</u></a> table below displays the nodes in the cluster.</p><p>We have created the table shown below, and used automatic DDL replication functionality from the Spock extension to replicate it across our cluster.</p><p>The above command spawns three psql sessions in the background and tries to update the employee name for the same row on all three nodes; this could potentially result in an UPDATE-UPDATE conflict. The conflict is resolved automatically by the pgEdge Spock extension, which applies the commit with the latest timestamp (a last update wins strategy).</p><h2>Exception Logging</h2><p>The pgEdge distributed Postgres Spock extension provides exception logging that logs the errors that are encountered while trying to apply changes at the replication subscriber. Exception logging ensures that replication between nodes isn't broken due to the errors caused by applying the replication changes.</p><p>The examples below cause an INSERT-INSERT conflict by inserting the same value in the primary key column from multiple psql clients.
The duplicate key violation error is captured in the exception log, and replication continues to function without any interruptions.</p><p>The example that follows causes a DELETE-DELETE conflict while deleting the same record from multiple psql clients. The error occurs because during synchronization of nodes, the row to be deleted is missing on some nodes; this error is captured in the Spock exception log table without causing any interruption to the replication between nodes.</p><p>Spock's exception logging ensures that replication between nodes doesn’t fail when a discrepancy is encountered while trying to replicate changes to a node. The above examples demonstrate how conflicts are captured in the exception log table without causing any interruption to the replication. This allows you to review issues at a time that is convenient for you.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/unleashing-the-power-of-postgresql-with-pgedge-distributed-multi-master-replication-and-postgres-platform-part-1</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL</category>
            <title><![CDATA[Preserving replication slots across major Postgres versions - PostgreSQL high availability for major upgrades ]]></title>
            <link>https://www.pgedge.com/blog/preserving-replication-slots-across-major-postgres-versions-postgresql-high-availability-for-major-upgrades</link>
            <pubDate>Mon, 27 Jan 2025 13:45:05 GMT</pubDate>
            <description><![CDATA[ <p>In this blog (the third in my series), I'd like to present yet another new feature in the PostgreSQL 17 release: an enhancement to logical replication functionality. The blog will also provide a small script that demonstrates how to use this feature when upgrading from Postgres 17 to a future version. My prior blogs (also published on Planet PostgreSQL and DZone) cover other PG-17 features:</p><ul><li><a href="https://www.pgedge.com/blog/postgresql-17-a-major-step-forward-in-performance-logical-replication-and-more">PostgreSQL 17 - A Major Step Forward in Performance, Logical Replication and More</a></li><li><a href="https://www.pgedge.com/blog/postgresql-17-and-its-key-improvements-now-available-for-pgedge-distributed-postgresql">PostgreSQL 17 and its key improvements</a></li></ul><p>PostgreSQL 17 is a really powerful major release from the PG community - with this new release, community focus continues to be on making PostgreSQL even more performant, scalable, secure, and enterprise ready. Postgres 17 also improves the developer experience by adding new features for compatibility, and making existing features more powerful and robust.</p><p>These features also help products that provide distributed Postgres improve their PostgreSQL high availability (HA) experience, especially related to system upgrades across major versions. <a href="http://www.pgedge.com"><u>pgEdge</u></a> also provides a PostgreSQL-based distributed database platform for low latency, high availability, and data residency. The HA capabilities of pgEdge ensure that major PostgreSQL upgrades can be done with nearly zero downtime so your applications can continue to work without user interruption. There is work in progress that will provide a path to a zero downtime upgrade by adding and removing nodes from the cluster.
This is not the main topic of this blog but stay tuned for more about this functionality.</p><p>The diagram below (from one of my older blogs, updated for PG-17) describes the evolution of the logical replication feature in PostgreSQL. The building blocks for logical replication were added in PostgreSQL 9.4, but the logical replication feature wasn't added until PostgreSQL 10. Since then, there have been a number of important improvements to logical replication.<img src="https://a.storyblok.com/f/187930/960x720/cecfc5bdc1/picture1blog.png" ></p><h2>Preserving Replication Slots</h2><p>Now coming back to our topic, this new feature makes it possible to preserve replication slots while performing upgrades between major versions of Postgres, eliminating the requirement to resync the data between two nodes that were replicating the data using logical replication. Please note this feature is only available for use when performing upgrades from Postgres 17 to future major versions. Upgrades from versions prior to Postgres 17 still need to follow the process of recreating the replication slots and creating subscribers that resync the data between the replicating nodes.</p><p>This patch was authored by Hayato Kuroda and Hou Zhijie and committed by Amit Kapila. Here is the Postgres commit log entry for this feature:</p><pre>commit 29d0a77fa6606f9c01ba17311fc452dabd3f793d
Author: Amit Kapila &lt;akapila@postgresql.org&gt;
Date:   Thu Oct 26 06:54:16 2023 +0530

    Migrate logical slots to the new node during an upgrade.

    While reading information from the old cluster, a list of logical
    slots is fetched. At the later part of upgrading, pg_upgrade revisits the
    list and restores slots by executing pg_create_logical_replication_slot()
    on the new cluster. Migration of logical replication slots is only
    supported when the old cluster is version 17.0 or later.

    If the old node has invalid slots or slots with unconsumed WAL records,
    the pg_upgrade fails. These checks are needed to prevent data loss.

    The significant advantage of this commit is that it makes it easy to
    continue logical replication even after upgrading the publisher node.
    Previously, pg_upgrade allowed copying publications to a new node. With
    this patch, adjusting the connection string to the new publisher will
    cause the apply worker on the subscriber to connect to the new publisher
    automatically. This enables seamless continuation of logical replication,
    even after an upgrade.
</pre><h2>Sample script</h2><p>Now let's write a little script to test this feature; as I've mentioned before, this only works when you are upgrading from Postgres 17 to a future major release. Any replication slots on the old cluster that are invalid or have unconsumed WAL will need to be repaired prior to the upgrade or the upgrade will fail.</p><p>Please note that the script below uses Postgres 17.2 for both the old and new clusters to demonstrate the functionality; we'll have to wait for another major version to become available before we can actually show the functionality at its best. The results from the script are also listed below, showing that the replication slot created in the old cluster has been copied over to the new cluster. Results from running the script:</p> ]]></description>
            <guid>https://www.pgedge.com/blog/preserving-replication-slots-across-major-postgres-versions-postgresql-high-availability-for-major-upgrades</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL</category>
            <title><![CDATA[PostgreSQL 17 - A Major Step Forward in Performance, Logical Replication and More]]></title>
            <link>https://www.pgedge.com/blog/postgresql-17-a-major-step-forward-in-performance-logical-replication-and-more</link>
            <pubDate>Fri, 11 Oct 2024 07:01:00 GMT</pubDate>
            <description><![CDATA[ <p>After a successful 3rd beta in August 2024, the PostgreSQL development group released the GA version of Postgres 17 on September 26th. Recently, I blogged about some of the key logical replication features that you'll see in PostgreSQL 17: <a href="https://www.pgedge.com/blog/logical-replication-features-in-pg-17"><u>https://www.pgedge.com/blog/logical-replication-features-in-pg-17</u></a>. In this blog I'll describe a couple of new performance features that you'll find in Postgres 17 as well as another important logical replication feature that I didn't cover in my earlier blog of this series.</p><p>PostgreSQL has grown remarkably over the years, and with each major release has become a more robust, reliable, and responsive database for both mission critical and non-mission critical enterprise applications. The global and vibrant PostgreSQL community is contributing to PostgreSQL's success, diligently ensuring that all changes are carefully scrutinized and reviewed before they are added to the project source code. It is also very encouraging to see big technology names like Microsoft, Google, and others investing in Postgres by developing in-house expertise and giving back to the open source community.</p><p>Improvements to logical replication are making it even more robust and reliable for enterprise use, while providing core capabilities that vendors like <a href="https://www.pgedge.com/">pgEdge</a> can build on to deliver fully distributed PostgreSQL. Distributed PostgreSQL refers to the implementation of PostgreSQL in a distributed architecture, allowing for enhanced scalability, fault tolerance, and improved performance across multiple nodes.
A pgEdge <a href="https://www.pgedge.com/products/what-is-pgedge">fully distributed PostgreSQL</a> cluster already provides essential enterprise features like improved performance with low latency, high availability, data residency, and fault tolerance.</p><p>Now without further ado let's discuss some PostgreSQL 17 performance features:</p><h2>Improved Query Performance with Materialized CTEs</h2><p><a href="https://www.postgresql.org/docs/17/queries-with.html"><u>Common Table Expressions (CTEs)</u></a> in PostgreSQL are temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. They enhance the readability and organization of complex queries and can be recursive, making them particularly useful for hierarchical data. The basic syntax of a CTE query is as follows:</p><p>Include the WITH keyword in a query to create the CTE; the parent query (that defines the result set) follows the AS clause after the CTE name. After defining the CTE, you can refer to the CTE by name to reference the result set of the CTE and carry out further operations on the result set within the same query.</p><p>PostgreSQL 17 continues to enhance performance and capabilities around CTEs, including improvements in query planning and execution. Versions of Postgres before 12 treated CTEs as optimization fences, meaning the planner could not push down predicates into them. From PostgreSQL 12 onward, the planner can fold a non-recursive, side-effect-free CTE that is referenced only once into the outer query, which allows more efficient execution plans. You should always analyze your queries and consider the execution plans when performance is critical.</p><p>Performance tip: If you will be referring to the same result set multiple times, create the CTE with the MATERIALIZED keyword. When you <a href="https://www.postgresql.org/docs/17/queries-with.html#QUERIES-WITH-CTE-MATERIALIZATION"><u>create a materialized CTE</u></a>, Postgres computes and stores the result of the parent query.
Then, subsequent queries aren't required to perform complex computations multiple times if you reference the CTE multiple times.</p><h2>Extracting column statistics from CTE references; Postgres 17 improves materialized CTEs</h2><p>A materialized CTE basically acts as an optimization fence, which means that the outer query won’t influence the plan of the sub-query once that plan is chosen. The outer query has visibility into the estimated width and row counts of the CTE result set, so it makes sense to propagate the column statistics from the sub-query to the planner for the outer query. The outer query can make use of whatever information is available, allowing the column statistical information to propagate up to the outer query plan but not down to the CTE plan.</p><p>The following bug report to the community contains a simple test case that demonstrates the effect of this improvement on the query planner: <a href="https://www.postgresql.org/message-id/flat/18466-1d296028273322e2%40postgresql.org"><u>https://www.postgresql.org/message-id/flat/18466-1d296028273322e2%40postgresql.org</u></a></p><p>Example - Comparing Postgres 16 behavior to Postgres 17</p><p>First, we create our work space in Postgres 16 (two tables and indexes) and run ANALYZE against it:</p><p>Then, we create our materialized CTE:</p><p>The query plan from our Postgres 16 code sample contains:</p><p>As you can see in the query plan, the estimate of 200 rows from the sub-query is wrong, which is impacting the overall plan.</p><p>Then, we test the same setup and query against PostgreSQL 17:</p><p>As you can see in the query plan for Postgres 17, the column statistics from the subquery are correctly propagating to the upper planner of the outer query.
This helps PostgreSQL choose a better plan that improves the execution time of the query.</p><p>This is a simple query, but with larger and more complex queries this change can result in a major performance difference.</p><h2>Propagating pathkeys from a CTE to an Outer Query</h2><p>Another interesting improvement to CTE functionality in Postgres 17 is the propagation of pathkeys from the sub-query to the outer query. In PostgreSQL, pathkeys are a part of the query execution planning process used primarily for sorting and ordering rows in queries that require ordered results, such as queries with an ORDER BY clause, or when sorting is needed for other operations like merge joins.</p><p>Prior to Postgres 17, the sort order of the materialized CTE sub-query was not shared with the outer query, even if sort order was guaranteed by either an index scan node or sort node. Without a known sort order, the PostgreSQL planner may choose a less optimized plan, whereas a known sort order makes it more likely to choose an optimized plan.</p><p>With PostgreSQL 17, if a CTE is materialized and has a specific sort order, the planner can reuse that information in the outer query, improving performance by avoiding redundant sorting or enabling more efficient join methods.
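</p><p>As a minimal sketch of the kind of query this affects (an illustration only, not necessarily the example from the original post), the statement below assumes the tenk1 table from the PostgreSQL regression suite, with a B-tree index on unique1. The CTE is materialized and has an ORDER BY, so on Postgres 17 the planner knows the CTE output is sorted by unique1 and may choose a plan that exploits that order. The exact plan you see depends on your data and settings.</p><pre><code>-- Assumption: tenk1 is the regression-suite table with a B-tree index on unique1.
EXPLAIN (COSTS OFF)
WITH x AS MATERIALIZED (
    -- The ORDER BY gives the CTE result a known sort order.
    SELECT unique1 FROM tenk1 b ORDER BY unique1
)
SELECT count(*)
FROM tenk1 a
WHERE unique1 IN (SELECT * FROM x);
</code></pre><p>Running the same statement on Postgres 16 and Postgres 17 and comparing the two plans is one way to observe the difference on your own instance.</p><p>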
As noted in the commit comments by <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a65724dfa73db8b451d0c874a9161935a34a914e"><u>Tom Lane</u></a>, "The code for hoisting pathkeys into the outer query already exists for regular RTE_SUBQUERY subqueries, but it wasn't getting used for CTEs, possibly out of concern for maintaining an optimization fence between the CTE and the outer query."</p><p>This simple modification to the Postgres source code should result in performance improvements for queries involving complex CTEs, especially those where sorting or merge joins can be optimized based on the inherent order of CTE results.</p><p>Here is an example using the data in the PostgreSQL regression suite:</p><p>The query plan from our Postgres 16 code sample contains:</p><p>The query plan from our Postgres 17 code sample contains:</p><p>The query plans in Postgres 16 and Postgres 17 are significantly different due to this version 17 enhancement. This is a small example; the performance gain will be more significant in larger queries. Please note that this improvement is only effective if the CTE subquery has an ORDER BY clause.</p><h2>Fast B-Tree index scans for Scalar Array</h2><p>In PostgreSQL, ScalarArrayOpExpr is the expression node type in a plan that represents operations like IN or ANY with arrays or lists of values. It's particularly useful for queries where you compare a column against a set of values, such as:</p><p>ScalarArrayOpExpr allows PostgreSQL to optimize queries that involve multiple comparisons that use IN or ANY. PostgreSQL 17 has introduced new performance enhancements to make these operations even faster.</p><p>In PostgreSQL 17, significant improvements have been made to B-tree index scans, which optimize performance, particularly for queries with large IN lists or ANY conditions.
These enhancements reduce the number of index scans performed by the system, thereby decreasing CPU and buffer page contention, resulting in faster query execution.</p><p>One of the key improvements is in handling Scalar Array Operation Expressions (ScalarArrayOpExpr), which allows more efficient traversal of B-tree indexes, particularly for multidimensional queries. For example, when you have multiple index columns (each with its own IN list), PostgreSQL 17 can now process these operations more efficiently in a single index scan, rather than multiple scans as in earlier versions. This can lead to performance gains of 20-30% in CPU-bound workloads where page accesses were previously a bottleneck.</p><p>Additionally, PostgreSQL 17 introduces better management of internal locks, further enhancing performance for high-concurrency workloads, especially when scanning multiple dimensions within a B-tree index.</p><p>We can demonstrate this with a simple example. We'll use the same table and data that we used in the previous example from the Postgres regression suite.</p><p>Our example, first run on Postgres 16:</p><p>In the previous query you can see that the shared buffer hit for the query was 9 and that it took 3 index scans to get the results. In PostgreSQL, the term shared hit refers to a specific type of cache hit related to buffer management. A shared hit occurs when PostgreSQL accesses a data block or page from the shared buffer pool rather than from disk, improving query performance.</p><p>The same example, this time run on Postgres 17:</p><p>As you can see, with Postgres 17 the shared buffer hit is reduced to 5, and most importantly it is only doing one index scan (as opposed to 3 scans in the case of Postgres 16).
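</p><p>To try a comparison like this on your own instance, a query of the following shape can be used. This is a sketch rather than the query from the original post: it assumes the tenk1 table from the regression suite with a B-tree index on unique1, and the literal values in the IN list are arbitrary. Compare the shared hit figure in the Buffers line of the EXPLAIN output, and the change in idx_scan before and after the query, between Postgres 16 and Postgres 17.</p><pre><code>-- Assumption: tenk1 has a B-tree index on unique1 (as in the regression suite).
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1 WHERE unique1 IN (1, 2, 3);

-- The number of index scans started on each index is tracked in the
-- statistics views; the counters may take a moment to be updated.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'tenk1';
</code></pre><p>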
With this improvement in Postgres 17, the performance of scalar array operations is greatly improved, and Postgres can choose from better optimized query plans.</p><h2>Retention of logical replication slots and subscriptions during upgrade</h2><p>The retention of logical replication slots and migration of subscription dependencies during the major upgrade process is another logical replication feature added to PostgreSQL 17. Please note that this feature is only useful when upgrading from PostgreSQL 17 to later versions; it is not supported for upgrades from versions prior to Postgres 17. Replication slots and replication origins are generated when building a logical replication environment. However, this information is specific to the node (it records replication status, application status, and WAL transmission status), so before Postgres 17 it wasn’t carried over as part of the upgrade process. Once the publisher node was upgraded, the user needed to manually reconstruct these objects.</p><p>The <a href="https://www.postgresql.org/docs/17/pgupgrade.html"><u>pg_upgrade</u></a> process is improved in PostgreSQL 17 to reference and rebuild these internal objects; this functionality enables replication to automatically resume when upgrading a node that has logical replication. Previously, when performing a major version upgrade, users had to drop logical replication slots, requiring them to re-synchronize data with the subscribers after the upgrade. This added complexity and increased downtime during upgrades.</p><p>You need to follow these steps when upgrading the publisher cluster:</p><ul><li>Ensure any subscriptions to the publisher are temporarily disabled by performing an ALTER SUBSCRIPTION ... DISABLE. These are enabled after the upgrade process has completed.</li><li>Set the new cluster's wal_level to logical.</li><li>The max_replication_slots on the new cluster must be set to a value greater than or equal to the number of replication slots on the old cluster.</li><li>Output plugins used by the slots must be installed in the new cluster.</li><li>All the changes from the old cluster must already be replicated to the target cluster prior to the upgrade.</li><li>All slots on the old cluster must be usable; you can ensure this by checking the conflicting column in the pg_replication_slots view. The conflicting value should be false for all the slots on the old cluster.</li><li>No slots in the new cluster should have a value of false in the temporary column of the pg_replication_slots view. There should be no permanent logical replication slots in the new cluster.</li></ul><p>The pg_upgrade process of upgrading replication slots will result in an error if any of the above prerequisites aren’t met.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/postgresql-17-a-major-step-forward-in-performance-logical-replication-and-more</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>pgEdge,Distributed Postgres,PostgreSQL</category>
            <title><![CDATA[PostgreSQL 17 and its key improvements now available for pgEdge Distributed PostgreSQL]]></title>
            <link>https://www.pgedge.com/blog/postgresql-17-and-its-key-improvements-now-available-for-pgedge-distributed-postgresql</link>
            <pubDate>Wed, 02 Oct 2024 11:00:00 GMT</pubDate>
            <description><![CDATA[ <p>The PostgreSQL community released PostgreSQL 17 to GA on September 26, 2024. With PostgreSQL 17, community focus continues to be on making PostgreSQL more performant, scalable, secure, and enterprise ready. Postgres 17 also improves the developer experience by adding new features for compatibility, and making existing features more powerful and robust. pgEdge, which provides a PostgreSQL based distributed database platform for low latency, high availability, and data residency, this week made PostgreSQL 17 available as a supported Postgres version in pgEdge Platform, alongside PostgreSQL versions 15 and 16. Support for PostgreSQL 17 in pgEdge Cloud will come later in Q4.</p><p>pgEdge support for PostgreSQL 17 makes it available as part of a responsive multi-master cluster that offers enhanced replication capabilities like DDL replication, conflict management, conflict avoidance, and more. pgEdge supports clusters running on a mix of different PostgreSQL versions, permitting zero downtime major version upgrades.</p><p>Recently, I blogged about some of the key logical replication features that you'll see in PostgreSQL 17: <a href="https://www.pgedge.com/blog/logical-replication-features-in-pg-17"><u>https://www.pgedge.com/blog/logical-replication-features-in-pg-17</u></a>.</p><p>In this blog, we'll pick up where I left off. The following sections detail the major improvements in Postgres 17 that enhance database behavior in a multi-master distributed cluster.</p><h2>Logical Replication</h2><p>The most notable improvements in PostgreSQL 17 are improvements to logical replication features:</p><ul></ul><ul></ul><ul></ul><h2>Storage with Incremental Backup</h2><p>Block level incremental backup is a major feature added to <a href="https://www.postgresql.org/docs/17/app-pgbasebackup.html"><u>pg_basebackup</u></a> in PostgreSQL 17. The incremental backup feature allows you to back up only the changes made since an earlier backup.
This feature will greatly improve the efficiency of backups and reduce the storage you need for storing them. Instead of performing a full backup every time, you can instruct the server to back up only the changes since the last full backup, significantly reducing the size of the backup and the time it takes to perform.<h2>Performance </h2>Several enhancements have been made to Postgres 17 to improve performance:<ul><li>Major improvements to common table expression (CTE) queries: By propagating information like pathkeys and column statistics to the upper level plan, PostgreSQL significantly improves query planning and executes CTE queries faster.</li></ul><ul><li>Better memory management of VACUUM: The vacuum process is optimized to reduce memory usage by up to 20 times by introducing a more efficient internal memory structure for use during vacuum operations. This leads to faster execution, especially on large tables, and frees up more shared memory resources for other operations.</li></ul><ul><li>Improved WAL throughput: Write ahead log handling is significantly improved in Postgres 17, allowing twice the WAL throughput in certain high concurrency workloads.</li></ul><ul></ul><h2>Compatibility </h2>Key compatibility improvements were introduced, including <u>MERGE</u> command updates and better JSON support:<ul></ul>The <u>MERGE</u> command benefits from the following improvements in Postgres 17:<ul><li>Allow the MERGE command to modify updatable views.</li></ul><ul></ul><ul><li>The use of the RETURNING clause is now supported in the MERGE command; the new function merge_action() reports on the DML that generated the row. </li></ul><ul></ul></p> ]]></description>
            <guid>https://www.pgedge.com/blog/postgresql-17-and-its-key-improvements-now-available-for-pgedge-distributed-postgresql</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>pgEdge,PostgreSQL</category>
            <title><![CDATA[pgEdge Platform Support for Large Object Logical Replication]]></title>
            <link>https://www.pgedge.com/blog/pgedge-platform-support-for-large-object-logical-replication</link>
            <pubDate>Wed, 07 Aug 2024 07:08:00 GMT</pubDate>
            <description><![CDATA[ <p>Replication of large objects isn't currently supported by the community version of PostgreSQL logical replication. If you try to replicate a large object with logical replication, PostgreSQL will return: .  It's a meaningful error (always nice), but not helpful if you have large objects that you need to replicate. pgEdge has developed an extension named LargeObjectLOgicalReplication (LOLOR) that provides support for replicating large objects. The primary goal of LOLOR is to provide seamless replication of large objects with pgEdge Spock multi-master distributed replication. You can access and manipulate large objects in a PostgreSQL database with the following <a href="https://www.postgresql.org/docs/16/lo-interfaces.html"><u>client interface functions</u></a>: <ul></ul><ul></ul><ul></ul><ul></ul><ul></ul><ul></ul><ul></ul><ul></ul><ul></ul>The pgEdge LOLOR extension supports the same large object functions put in place by PostgreSQL, so all of your existing applications that use the previously mentioned functions will continue to work seamlessly. The easiest way to install the LOLOR extension is with <a href="https://docs.pgedge.com/platform"><u>pgEdge Platform</u></a>. After installing pgEdge Platform, you can use it to install LOLOR, create the extension, and add it to the  parameter by navigating into the  installation directory and running the command: In this blog, we are going to create a two-node pgEdge cluster on the localhost to demonstrate how pgEdge Platform replicates large objects. 
We'll also share a native PSQL example of using the extension for replicating large objects, and a JDBC example that shows how we can use the extension from a Java program using a JDBC driver. In any directory owned by your non-root user, use the following command to <a href="https://docs.pgedge.com/platform/installing_pgedge/manual"><u>install pgEdge</u></a> on all nodes of the cluster; you'll need to invoke this command on each replication node host: Node 1 setup: Navigate into the  directory on node 1 and perform the following steps: Run the following command to set up pgEdge Platform; this command installs PostgreSQL version 16 and the pgEdge Spock and Snowflake extensions. Then, run the following command to create a Spock node (we are creating a node named ). Note that the user named in the command below (in our command ) needs to be an OS user: The next command creates the subscription between  and . You should run this command after completing the initial pgEdge setup on . Then, use the following command to install the LOLOR extension: Then, source your PostgreSQL installation, connect with PSQL, and run the  statement to create the LOLOR extension: You'll also need to set the  configuration parameter before using the extension. Set the value to the number that corresponds to the node on which you're setting the parameter; the value can be from 1 to 2^28. Please restart the server after adding the above configuration parameter to the  file. The postgresql.conf file is located in the data directory under your PostgreSQL installation. Before using LOLOR functionality, you also need to add the large object catalog tables to the  replication set. 
You can use the following commands: The following commands are executed to enable automatic DDL replication: Node 2 setup: Navigate into the  directory on node 2 and perform the following steps to configure the LOLOR extension: Run the following command to install pgEdge Platform; this will install PostgreSQL 16 and the pgEdge Spock and Snowflake extensions. Use the following command to create a Spock node. Please note that the user provided in the following command needs to be an OS user: Then, use the following command to create the subscription between  and : Now we are ready to install the LOLOR extension with the command: Then, log in with PSQL and invoke the  statement: You must set  to a number that represents the node in the replication cluster before using LOLOR. Acceptable values range from 1 to 2^28. Please restart the server after adding the above configuration parameter to . After setting the  parameter, use the following commands to add the large object catalog tables to the  replication set: Then, execute the following commands to enable automatic DDL replication: <h2>Example: Using the PSQL Command Line to Exercise LOLOR</h2>In the sections that follow, we are going to do a short test that demonstrates large object replication using the PSQL client. PSQL is a secure, native PostgreSQL client that uses the libpq driver to negotiate connections. First, we are going to perform the following SQL commands on node 1: Because auto_ddl is enabled, the table is also replicated to the other nodes. We can query node 2 with the following  statement to confirm that the large object was replicated:<h2>Example: Using a JDBC Connection to Query a Large Object </h2>The following program code connects to a pgEdge node, loads  file into the database as a large object, and performs retrieval operations. To simplify connection management, you can specify connection information in the app.properties file, and then reference the file in your JDBC connection. example.java</p> ]]></description>
            <guid>https://www.pgedge.com/blog/pgedge-platform-support-for-large-object-logical-replication</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL,PostgreSQL</category>
            <title><![CDATA[Logical Replication Features in PG-17]]></title>
            <link>https://www.pgedge.com/blog/logical-replication-features-in-pg-17</link>
            <pubDate>Thu, 23 May 2024 06:49:54 GMT</pubDate>
            <description><![CDATA[ <p><h2>Introduction</h2>About a year ago, I blogged about logical replication improvements in PostgreSQL version 16. PostgreSQL 16 was a really good release for logical replication improvements, with performance critical features like parallel apply, providing replication origin for supporting bi-directional replication, and allowing a standby server to be a publisher. Please refer to the old blog post for more details on version 16 replication-related features - you'll find that post at: <a href="/blog/postgresql-16-logical-replication-improvements-in-action">https://www.pgedge.com/blog/postgresql-16-logical-replication-improvements-in-action</a>. PostgreSQL 17 also includes a number of significant improvements for logical replication. The enhancements are geared towards improving the usability of logical replication, and meeting high-availability (HA) requirements. In this blog we are going to discuss some of the key logical replication features added to PostgreSQL 17; we won’t be covering all the new features in this blog so there will likely be more than one blog in this series. I want to thank my PostgreSQL community friends Amit Kapila for introducing me to logical replication features in PostgreSQL 17, and Hayato Kuroda for helping me to understand and test these features.<h2>Synchronizing Slots from Primary to Standby (Failover Slot)</h2>My top pick among the logical replication improvements in version 17 is failover slot synchronization; this is essentially a high availability feature that allows logical replication to continue working in the event of a primary failover. The feature keeps the designated logical replication slots on the standby server synchronized with the corresponding slots on the primary node. 
To meet this goal, the server starts <a href="https://www.postgresql.org/docs/devel/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS-SYNCHRONIZATION">slotsync worker(s)</a> on the standby server that query the primary server at regular intervals for logical slot information, and update the local slots if there are changes. There are two ways to use this feature:<ul><li>The first approach is to enable the sync_replication_slots GUC on the standby node. In this approach, the slotsync worker periodically fetches slot information from the primary and updates the local slots. Note that if you take this approach, you should not call the pg_sync_replication_slots() function.</li></ul><ul><li>The other way to use this functionality is to call the pg_sync_replication_slots() function. If you use the function to update your slots, the backend process connects to the primary and performs the update operation once. Note that you cannot call the function if sync_replication_slots is turned on and the slotsync worker is already periodically refreshing the slots between the standby and primary.</li></ul>To enable this feature, you need to call the <a href="https://www.postgresql.org/docs/16/functions-admin.html#FUNCTIONS-REPLICATION">pg_create_logical_replication_slot()</a> function or use the <a href="https://www.postgresql.org/docs/16/protocol-replication.html#PROTOCOL-REPLICATION-CREATE-REPLICATION-SLOT">CREATE_REPLICATION_SLOT ... LOGICAL</a> command on the primary node to configure a replication slot. 
When configuring the slot, set the  property for the slot to . You also need to set the following parameters to keep the physical standby synchronized with the primary server:<ul><li><a href="https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-STANDBY-SLOT-NAMES">standby_slot_names</a>: This parameter (renamed synchronized_standby_slots in the final PostgreSQL 17 release) holds a list of physical replication slots that logical replication processes will wait for. If a logical replication connection is meant to switch to a physical standby after the standby is promoted, the physical replication slot for the standby should be included in the slots listed in this parameter. This ensures that logical replication does not get ahead of the physical standby, which prevents the subscriber from being ahead of the hot standby when consuming changes from the primary. Some latency can be expected when sending changes to subscribers, because the primary waits for the listed standby slots to confirm receipt.</li></ul><ul><li><a href="https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-STANDBY-SLOT-NAMES">sync_replication_slots</a>: This parameter needs to be enabled on the standby server in order to periodically sync the slots between the standby and the primary. The slotsync worker periodically fetches slot information from the primary and updates the local slots.</li></ul>
<ul><li><a href="https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-PRIMARY-CONNINFO">primary_conninfo</a>: You can either set this parameter in the postgresql.conf file or specify it on the command line. Set this parameter on the standby server to specify the connection string of the primary server. For replication slot synchronization, you'll also need to specify a valid database name in the primary_conninfo string. The database name is only used for slot synchronization; it is ignored for streaming.</li></ul><ul><li><a href="https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-PRIMARY-SLOT-NAME">primary_slot_name</a>: Specify the name of an existing replication slot to be used when connecting to the sending server via streaming replication. The slotsync worker doesn’t work if this parameter is not set.</li></ul>
<ul><li><a href="https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK">hot_standby_feedback</a>: This parameter must also be set to on. The parameter specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby.</li></ul>You can use the <a href="https://www.postgresql.org/docs/devel/view-pg-replication-slots.html">pg_replication_slots</a> view to review the properties of a replication slot. Those slots with a synced value of  in the pg_replication_slots view can resume logical replication after failover; these slots have been synchronized. Another important step after failover to a synced slot is to update the connection information to the primary node for each subscriber. Connect to each subscriber, and use the <a href="https://www.postgresql.org/docs/devel/sql-altersubscription.html">ALTER SUBSCRIPTION</a> command to update the connection information of the new primary.<h2>Failover Slots in Action</h2>In our example, we are going to spin up two instances of PostgreSQL; one instance will be our primary server, and the other will be our standby server. We will call the publisher instance node 1, and the standby server node 2 for the purposes of this example. 
We'll keep the replication slot on the standby server synchronized with the replication slot of the primary so that, in the event of a failover, the standby can be promoted to primary and logical replication can continue. After promoting the standby server to primary, any other standby server will need to be updated to connect to the new primary server.<h2>pg_createsubscriber</h2><a href="https://www.postgresql.org/docs/devel/app-pgcreatesubscriber.html">pg_createsubscriber</a> is an executable included in PostgreSQL 17 that converts a physical standby server into a logical replica. This utility creates a replication setup for each of the databases that are specified in the pg_createsubscriber command. If you specify multiple databases, the utility will create a publication and a subscription for each database, covering all the tables within the specified database(s). When setting up replication, the initial data copy can be a slow process. When you use the pg_createsubscriber utility you can avoid the initial data synchronization, making this ideal for large database systems. The source server <a href="https://www.postgresql.org/docs/devel/runtime-config-wal.html#GUC-WAL-LEVEL">wal_level</a> needs to be set to , and <a href="https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS">max_replication_slots</a> needs to be greater than or equal to the number of databases specified in the <a href="https://www.postgresql.org/docs/devel/app-pgcreatesubscriber.html">pg_createsubscriber</a> command. You should review the complete list of <a href="https://www.postgresql.org/docs/devel/app-pgcreatesubscriber.html">Prerequisites and Warnings</a> at the project page before using pg_createsubscriber. The automated script that follows shows how to use the pg_createsubscriber utility to convert a physical standby server into a logical replica. 
The script will convert a primary and standby server into a logical replication setup with a publisher and subscriber for each database specified in the command. All the user tables that are part of the primary database will be added to the publication. In the example below, the pgbench tables are included in the publication. Result of running the above scripts:<h2>Conclusion</h2>Enterprise demand for distributed PostgreSQL databases is growing rapidly, and replication is a core part of any distributed system. Starting with PostgreSQL 10, the logical replication features in PostgreSQL have become more mature and feature rich with every major release. pgEdge builds on this strong foundation to provide <a href="/">fully distributed Postgres</a> that delivers <a href="/solutions/benefit/multi-master">multi master</a> capability and the ability to go multi-region and multi-cloud. pgEdge adds essential features such as conflict management, conflict avoidance, automatic DDL replication and more to cater to the demands of always on, always available and always responsive global applications.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/logical-replication-features-in-pg-17</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL</category>
            <title><![CDATA[PostgreSQL clustering solutions]]></title>
            <link>https://www.pgedge.com/blog/postgresql-clustering-solutions</link>
            <pubDate>Mon, 01 Apr 2024 12:12:00 GMT</pubDate>
            <description><![CDATA[ <p><h2>Introduction</h2>In my previous post <a href="https://www.pgedge.com/blog/logical-replication-evolution-in-chronological-order-clustering-solution-built-around-logical-replication">A Brief History of Logical Replication in Postgres — and Looking Ahead at its Likely Future Evolution</a><a href="https://www.pgedge.com/blog/logical-replication-evolution-in-chronological-order-clustering-solution-built-around-logical-replication">,</a> I provided a retrospective journey of the logical replication feature in PostgreSQL, starting from Postgres 9.6, where some of the building blocks were laid down. The blog also provides an insight into how a big feature like logical replication evolves and matures in the PostgreSQL community.This is the second blog of a two blog series. In this post,  I will be talking about PostgreSQL cluster solutions that are based on logical replication and the pgEdge approach to creating a high availability cluster.We have recently seen unprecedented growth in the user base for most enterprises; this in turn has led to exponential data growth. Scalability in a distributed PostgreSQL environment has become the most pressing need of a replication solution. In addition to scalability for better performance and low latency, enterprises need high availability. High availability means that there is near zero percent downtime for users in the event of hardware/software/network issues or maintenance windows.This is where distributed PostgreSQL comes into play. Before we go into specific Postgres cluster solutions, it is important to understand the concept of database clustering and its benefits. A Postgres cluster involves setting up a group of servers (nodes) to work together to provide a higher level of availability, reliability, and scalability than can be achieved with a single database server. 
In simpler terms, database clustering refers to the practice of linking several servers or instances together to work as a single system. This configuration enhances the performance, availability, and scalability of database systems. This is crucial for applications requiring high availability and performance, as it allows for data to be replicated across multiple nodes and for queries to be distributed among them, enhancing both fault tolerance and load distribution. Now let's switch our attention to the main topic of this post: PostgreSQL clustering solutions that are based on logical replication. Our solution provides active-active multi-master capabilities - this means that all nodes in the cluster will have the same copy of the data, providing data redundancy. The nodes are configured with asynchronous multi-master replication, and application user traffic is distributed across the nodes to provide better performance and high availability.<h2>pgEdge - Fully Distributed PostgreSQL</h2>Applications these days have to be highly responsive and always available - even during maintenance windows. The user base for an application may be spread across a country or around the globe. Your application needs to be able to respond in real time, even during peak hours. The exponential growth in data seen in most businesses makes serving this data to users with a short turnaround time a challenging task. To achieve low latency and high availability, you need to deploy instances in data centers that are both close to your users and close to your business. pgEdge has combined cutting edge technology, unique solutions, and deep PostgreSQL expertise to provide a solution. pgEdge is a fully distributed PostgreSQL database, optimized for the network edge, and deployable across multiple cloud regions or data centers. 
The solution is a true multi-master (active-active) distributed database system that allows read and write operations at any node on the network. It seems almost magical, providing:<ul><li>reduced data latency</li></ul><ul><li>high availability</li></ul><ul><li>targeted data residency</li></ul>and most importantly, an improved customer experience. The best part is you can get all of this, typically without any code changes. pgEdge allows both read and write operations to take place on any database node in a geographically distributed cluster. Each node runs standard PostgreSQL (version 14, 15 or 16), and a cluster can span multiple cloud regions or data centers. pgEdge nodes are loosely coupled, and are kept updated via asynchronous logical replication with conflict resolution.<h2>pgEdge Solutions</h2>Keeping the industry demand at the forefront, pgEdge offers fully-distributed multi-master PostgreSQL clustering solutions for both cloud (with pgEdge Cloud) and on-prem deployments (with pgEdge Platform).<h2>pgEdge Cloud high availability clusters</h2><a href="https://www.pgedge.com/products/pgedge-cloud"><u>pgEdge Cloud</u></a> is fully-distributed PostgreSQL, deployable across multiple cloud regions or data centers. The pgEdge Cloud console harnesses the low latency, high availability, and data residency benefits of pgEdge distributed PostgreSQL in a fully managed cloud service running in multiple regions across AWS, Azure, or Google Cloud. pgEdge Cloud offers a free trial version that lets you experience a global, serverless PostgreSQL database in less than 90 seconds with powerful benefits and capabilities.  
You can deploy a highly-available three-node active-active multi-master cluster that handles read/write operations with built in conflict resolution and:<ul><li>Low latency - Achieve high performance with low latency by deploying read/write nodes in regions close to the user.</li></ul><ul><li>Edge integration - Integrate with Cloudflare Workers and other edge platforms.</li></ul><ul><li>Rapid deployment - One click provisioning for global clusters on a secure private network.</li></ul><h2>pgEdge Platform high availability clusters</h2><a href="https://www.pgedge.com/products/pgedge-platform"><u>pgEdge Platform</u></a> is self-managed distributed PostgreSQL for developer evaluations or production use; use pgEdge Platform to self-host and self-manage pgEdge distributed PostgreSQL in your own data center or cloud account. Database nodes running pgEdge Platform can participate in clusters that span data centers and any of the major cloud providers (AWS, Azure and Google Cloud). pgEdge Platform runs on a variety of common hardware and OS combinations, and enterprise class support plans are available.<h2>Installing pgEdge Platform</h2>In any directory owned by your non-root user, install pgEdge on all nodes of the cluster: On each node of the cluster, move into the  directory and install pgedge, specifying a name for the database superuser, a password, and a database name. Note that the name cannot be the name of an OS user, pgedge, or any of the <a href="https://www.postgresql.org/docs/16/sql-keywords-appendix.html"><u>PostgreSQL reserved words</u></a>. You can also use the --port option to install PostgreSQL on a port other than the default port (5432). The command will download the required pgEdge components and verify the system prerequisites before installing the latest version of PostgreSQL 16 supported by pgEdge and configuring the server to support the pgEdge replication requirements. 
The server hosts a database (named ) with a database superuser (`admin`) that can log in to the database with the credentials specified (`mypassword1`). The command will also install the spock and snowflake extensions. The spock extension provides multi-master replication with conflict resolution. The snowflake extension provides support for sequences for multi-node multi-master clusters; regular PostgreSQL sequences are single host only. When executed, the command also creates a replication user with the same name as the OS user that invokes the command. This is the user that you will use in connection strings when you create nodes and subscriptions. If you encounter a permissions error on EL9 running this command, you may need to update your SELINUX mode to  or , reboot, and retry the operation.<h2>Create Nodes</h2>Next you will register each of the databases as a spock node. Using node names with a naming sequence like n1, n2, n3 (etc.) will automatically set the correct value for snowflake.node, enabling the use of snowflake sequences. The user named in the connection string is a replication user, and has to match the OS user that invoked the setup command; in this example that user is named rocky. Node  (IP address 10.1.2.5): Node  (IP address 10.2.2.5):<h2>Create Subscriptions</h2>Next we need to create the subscriptions between the nodes in your cluster to support bi-directional replication. The connection string for sub_n1n2 should specify the connection details for n2 in the create node command; the string specified for sub_n2n1 should specify the connection details for n1 in the create node command. Again, you'll include the identity of the replication user (rocky) in the connection string. Node  (IP address 10.1.2.5): Node  (IP address 10.2.2.5): Our example is a simple two-node cluster; if you have a three-node cluster, the subscriptions should allow traffic between any two nodes in each direction.  
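As a rough illustration of that rule, a full mesh needs one subscription per ordered pair of nodes, so n nodes need n × (n - 1) subscriptions. The sketch below only generates names that follow the sub_n1n2 convention used in this post; the helper name mesh_subscriptions is hypothetical and is not part of pgEdge or spock, and the snippet does not create anything in a database.

```python
from itertools import permutations


def mesh_subscriptions(node_names):
    """Return one subscription name per ordered pair of distinct nodes.

    Hypothetical helper for illustration only: it follows the
    sub_(from)(to) naming convention used in this post.
    """
    return [f"sub_{src}{dst}" for src, dst in permutations(node_names, 2)]


if __name__ == "__main__":
    subs = mesh_subscriptions(["n1", "n2", "n3"])
    # Each node subscribes to every other node, in both directions.
    print(len(subs))
    for name in subs:
        print(name)
```

For three nodes this yields six names, and for four nodes it yields twelve, which is why the number of subscriptions grows quickly with the size of the cluster.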
This means that for a three-node cluster you would create:<ul><li>sub_n1n2 between node 1 and node 2</li></ul><ul><li>sub_n1n3 between node 1 and node 3</li></ul><ul><li>sub_n2n1 between node 2 and node 1</li></ul><ul><li>sub_n2n3 between node 2 and node 3</li></ul><ul><li>sub_n3n1 between node 3 and node 1</li></ul><ul><li>sub_n3n2 between node 3 and node 2</li></ul>As your cluster grows, the subscriptions required also grow.<h2>Adding tables to the default Replication Set</h2>The next step is to use spock commands to add tables to the default replication set and start replication. The default replication set is created when you install pgEdge; you have the option to create a custom replication set and add it to the subscription, but using the default replication set provided simplifies configuration for our example. You also have the option of using spock to add all the tables in a schema to the replication set. The power of logical replication that underpins the pgEdge multi-master platform allows you to configure extremely granular replication. For this example, we'll use pgbench to add some tables. When you open pgbench or psql, specify your database name after the utility name. On each node, source the PostgreSQL environment variables to add pgbench and psql to your OS PATH; this will make it easier to move between the nodes: Then, use pgbench to set up a very simple four-table database. At the OS command line (on each node of your replication set), create the pgbench tables in your database (demo) with the pgbench command. You must create the tables on each node in your replication cluster: Then, connect to each node with the psql client: Once connected, alter the numeric columns, setting  equal to . 
This will make these numeric fields conflict-free delta-apply columns, ensuring that the value replicated is the delta of the committed changes (the old value plus or minus any new value) to a given record: Then, exit psql: On the OS command line for each node, use the  command to add the tables to the system-created replication set (named ); the command is followed by your database name: The fourth table, , is excluded from the replication set because it does not have a primary key. The primary key is needed because the replication set is configured to replicate UPDATEs and/or DELETEs.<h2>Adding a Custom Replication Set to a Subscription</h2>Since we're using the default replication set (created by the pgEdge installer) we don't need to add the replication set to the subscription. If you are using a custom replication set, it needs to be added to the subscription. The following spock command adds a replication set to the subscription. Please see the pgEdge documentation <a href="https://docs.pgedge.com/platform/installing_pgedge">https://docs.pgedge.com/platform/installing_pgedge</a> for detailed information on creating custom replication sets and adding or removing replication sets from a subscription.<h2>Useful Replication Status Views</h2>You can use spock functions and tables to check the replication status of your tables. The <a href="https://docs.pgedge.com/spock_ext/spock_info"><u>pgEdge documentation also provides a list of functions and tables available</u></a> for checking replication status and debugging issues.<h3>To check available subscriptions:</h3><h3>To check tables and their assigned replication set:</h3><h3>To check subscription status:</h3><h2>Conclusion: Postgres High Availability Clusters</h2>It is pretty clear that nearly every enterprise needs scalability to support its business needs and growing data requirements. 
PostgreSQL scales up well, but in most cases a single machine is not enough to meet application performance and high availability needs. PostgreSQL has several clustering offerings, both open source and proprietary, based on physical streaming replication and on logical replication. pgEdge has a unique and robust product, and has proved itself as a leader in PostgreSQL distributed multi-master replication. pgEdge Cloud offers a state-of-the-art and user-friendly cloud console that simplifies cluster management. The pgEdge Platform provides a true and robust multi-master distributed PostgreSQL solution. Conflict management and conflict avoidance capabilities are truly unique to pgEdge, and are instrumental in a multi-master logical replication environment. The product plans for pgEdge Platform for 2024 are even more exciting. We are working on game-changing logical replication capabilities that are increasingly in demand by enterprise applications. The upcoming features in pgEdge Platform will continue to improve ease of use and minimize the adjustments needed to adopt multi-master replication for real-world database applications. These features will include support for replication of DDL commands as well as working with large objects. Above all, the pgEdge team is working on increasing replication throughput across nodes. I will keep everyone posted on the above developments and will share information about our new features as they become available. Stay tuned… </p> ]]></description>
            <guid>https://www.pgedge.com/blog/postgresql-clustering-solutions</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>pglogical,PostgreSQL,Distributed Postgres,PostgreSQL,pgEdge,Distributed Postgres</category>
            <title><![CDATA[Logical replication evolution in chronological order & clustering solution built around logical replication]]></title>
            <link>https://www.pgedge.com/blog/logical-replication-evolution-in-chronological-order-clustering-solution-built-around-logical-replication</link>
            <pubDate>Wed, 17 Jan 2024 05:42:54 GMT</pubDate>
            <description><![CDATA[ <p><h2>A brief history of PostgreSQL logical replication, and a look ahead at its likely future evolution</h2>This blog is divided into two parts. In this first part, we walk through how the logical replication feature has evolved over the years, what the recent improvements for Postgres logical replication are, and how the feature will likely change in the future. The second blog of the series will discuss the multi-master (active-active), multi-region, and highly available PostgreSQL cluster created by pgEdge that is built on top of logical replication and pglogical. Postgres replication is the process of copying data between systems. PostgreSQL supports two main methods of replication: logical replication and physical replication. Physical replication copies the data exactly as it appears on disk to each node in the cluster. Because the on-disk format can change between major versions of PostgreSQL, physical replication requires all nodes to use the same major version. Logical replication, on the other hand, replicates data based on data changes. The building blocks of the logical replication feature were introduced in PostgreSQL 9.4; however, the feature was completed in PostgreSQL 10. Logical replication uses a publisher/subscriber model in which multiple subscribers can subscribe to one or more publishers. Logical replication uses logical decoding plugins that format the data so it can be interpreted by other systems. This makes replication possible among heterogeneous systems and across major PostgreSQL releases, which in turn means major version upgrades can be performed with little or no downtime. Logical replication also provides fine-grained control over the replication set, so you can decide whether to replicate an entire table, only certain columns from a table, or all of the tables within a schema. 
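To make the three levels of granularity mentioned above a little more concrete, here is a minimal Python sketch that only renders the corresponding CREATE PUBLICATION statements as text. The helper name publication_sql and the publication, table, column, and schema names are hypothetical, chosen for illustration; the column-list and TABLES IN SCHEMA forms assume PostgreSQL 15 or later.

```python
def publication_sql(name, table=None, columns=None, schema=None):
    """Render a CREATE PUBLICATION statement as a string.

    Illustrative helper only: identifiers are inserted verbatim and are
    not quoted or validated, so do not pass untrusted input.
    """
    if schema is not None:
        # Publish every table in a schema (PostgreSQL 15+).
        return f"CREATE PUBLICATION {name} FOR TABLES IN SCHEMA {schema};"
    if table is None:
        raise ValueError("either table or schema is required")
    # An optional column list restricts which columns are replicated (PostgreSQL 15+).
    column_list = f" ({', '.join(columns)})" if columns else ""
    return f"CREATE PUBLICATION {name} FOR TABLE {table}{column_list};"


if __name__ == "__main__":
    # Hypothetical object names, used only to show the three forms.
    print(publication_sql("pub_orders", table="orders"))
    print(publication_sql("pub_order_status", table="orders", columns=["id", "status"]))
    print(publication_sql("pub_sales", schema="sales"))
```

The generated statements would be run on the publisher with psql. If a publication replicates UPDATEs or DELETEs, a column list has to include the table's replica identity columns.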
<img src="https://a.storyblok.com/f/187930/960x720/7290c9cf29/picture2.webp"><h2>Postgres logical replication evolution in Chronological order </h2>As mentioned above, the community began developing the underlying technology that made logical replication possible in PostgreSQL 9.4. These features are the core building blocks for the logical replication feature.This section describes the main features for logical replication that were added in each release. To review a complete list of logical replication features for each release, please refer to the  section of each version of the release notes.This blog provides some context to the life cycle involved when building a major feature for PostgreSQL, and allows you to see how a feature matures over time. The basic logical replication feature was committed to PostgreSQL 10 however it required important patches in subsequent releases to make the feature performance feasible and usable. Logical replication is not finished yet; please read my thoughts in the final section on what might be on the roadmap for replication in the next set of releases.<h2>PostgreSQL 9.4 - 2014</h2><ul></ul><ul></ul><ul></ul><h2>PostgreSQL 9.5 - 2016 Jan</h2><ul></ul><ul></ul><h2>PostgreSQL 9.6 - 2016 Sep</h2><ul></ul><h2>PostgreSQL 10 - 2017</h2><ul></ul><h2>PostgreSQL 11 - 2018</h2><ul></ul><ul></ul><ul></ul><ul></ul><h2>PostgreSQL 12 - 2019</h2><ul></ul><h2>PostgreSQL 13 - 2020</h2><ul></ul><ul></ul><ul></ul><h2>PostgreSQL 14 - 2021</h2><ul></ul><ul></ul><h2> PostgreSQL 15 - 2022</h2><ul></ul><ul></ul><ul></ul><ul></ul><ul></ul><ul></ul><h2>PostgreSQL 16 - 2023</h2><ul></ul><ul></ul><ul></ul><ul></ul><h2>PostgreSQL Logical Replication  - Looking ahead</h2>The building blocks for logical replication were added in PostgreSQL 9.4, but the logical replication feature was added in PostgreSQL 10. Since that release, there have been a number of important improvements to logical replication. 
The last two major releases of PostgreSQL have improved the performance and usability of logical replication with parallel apply on the subscriber, binary-mode initial copy, row and column filtering, and more. Looking ahead at PostgreSQL 17 (and beyond) for logical replication, there is definitely a need for further performance improvement by increasing the replication rate and reducing replication lag. I believe this can be achieved with parallelism support and worker optimization. There is also a need for better integration of logical replication with external tools for high availability and upgrades. Active-active (<a href="https://www.pgedge.com/solutions/benefit/multi-master">multi-master</a>) replication as part of the PostgreSQL core is also within reach, but the core is still missing major features like conflict detection and resolution. Some of the missing but important features are provided by pgEdge's Spock extension. pgEdge provides a fully <a href="https://www.pgedge.com/products/what-is-pgedge">distributed PostgreSQL</a> cluster that supports active-active replication with low latency, high availability, and data residency. Multi-master replication and the pgEdge clustering solution will be discussed in the next post of this series. </p> ]]></description>
            <guid>https://www.pgedge.com/blog/logical-replication-evolution-in-chronological-order-clustering-solution-built-around-logical-replication</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>pgEdge,PostgreSQL</category>
            <title><![CDATA[Embedding near the edge: pgEdge Distributed PostgreSQL with pgVector]]></title>
            <link>https://www.pgedge.com/blog/embedding-near-the-edge-pgedge-distributed-postgresql-with-pgvector</link>
            <pubDate>Wed, 20 Sep 2023 06:14:00 GMT</pubDate>
            <description><![CDATA[ <p><h3>Introduction</h3>We are excited to announce that we now support the increasingly popular pgvector Postgres extension for storing and searching vector embeddings in AI-powered applications. Bringing pgvector and pgEdge’s distributed capabilities together makes for a powerful combination that greatly improves performance for users regardless of their geographic location. In this blog we'll demonstrate how to configure pgvector with pgEdge to provide similarity search functionality across a pgEdge Distributed PostgreSQL cluster. <img src="https://a.storyblok.com/f/187930/1000x335/25f8a887d8/pgvector_diagram-1.webp" >I will start with a brief summary of the products mentioned in the title of this blog: pgEdge is fully-distributed PostgreSQL, optimized for the network edge and deployable across multiple cloud regions or data centers. pgEdge is available as pgEdge Platform, self-hosted software available for download from [download link]; or as pgEdge Cloud, a fully managed service. This blog is applicable to both pgEdge Cloud and pgEdge Platform. pgvector is an open source extension for PostgreSQL that enables efficient similarity search and other vector-based operations. It's often used for applications like recommendation systems and image search. The pgvector extension provides an indexable vector data type that stores vectors in a PostgreSQL database. pgvector supports the  index, which implements the  method of indexing.<h3>Vector Database</h3>A vector database stores data as high-dimensional vectors, which are mathematical representations of features or attributes. The number of dimensions in a vector ranges from tens to thousands, depending on the complexity and granularity of the data. The main advantage of a vector database is that it allows for fast and accurate similarity search and retrieval of data based on vector distance or similarity. 
So instead of searching with predefined criteria, exact matches, or wildcards, you can use a vector database to find similar or relevant data based on semantic or contextual meaning. Vector databases enable accurate and efficient search and analysis of large datasets by utilizing the characteristics of vectors. A vector database's key benefit is its capacity to locate comparable items. For example, two statements with comparable meanings will produce vectors that are close to one another, so you can use the vector database to locate all the vectors that are near a given vector. For example, a vector database can be used to find:<ul><li>images that are similar to a given image based on visual content and style.</li><li>documents that are similar to a given document based on topic and content.</li><li>products that are similar to a given product based on features and ratings.</li></ul>Vector databases are currently a popular choice. With the rise of large language models (LLMs), efficiently managing and searching large-scale, high-dimensional data has become a tremendously important use case. The solution to this challenge lies in vector databases, a powerful and increasingly popular data storage technology that enables faster and more accurate searches. With the addition of the open-source pgvector extension, PostgreSQL is being used as a vector database. There is a lot of excitement about using PostgreSQL as a vector database, but there is more innovation to come, and work to be done to make the vector workload more secure, performant, and scalable.<h3>Vector Data</h3>Before showing an example of how pgEdge works with the pgvector extension, it is important to understand the dynamics of vector data, and how it is stored in the database. Vector data refers to a type of data representation where each data point is described by a set of numerical values arranged in a specific order. 
These values are usually referred to as components or features, and they capture different aspects or attributes of a data point. Vectors are commonly used to represent a wide range of information in many fields: mathematics, computer science, data science, and machine learning. Real-world applications use far more than two dimensions; OpenAI embeddings may use more than a thousand dimensions to vectorize data. Embedding is one method for converting high-dimensional data into a lower-dimensional space. Embedding allows us to extract data from multiple dimensions and sources, including text, photos, audio, and video, and convert it into vectors. It is a widely used technique in machine learning and natural language processing (NLP) for representing sparse symbols or objects as continuous vectors. For example, objects such as a car, truck, motorcycle, bicycle, helicopter, or hoverboard can all be converted into vectors using embeddings. A two-dimensional embedding is shown after each object in the following list:<ul><li>car: embedding [2.0, 2.3]</li><li>truck: embedding [3.4, 5.9]</li><li>motorcycle: embedding [0.5, 1.2]</li><li>bicycle: embedding [0.2, 0.8]</li><li>helicopter: embedding [13.2, 19.8]</li><li>hoverboard: embedding [0.1, 0.2]</li></ul>A review of the list shows us that a bicycle and a motorcycle are similar and that their vectors (if charted) would be fairly close in distance. Vehicle characteristics can also be categorized along dimensions that include color, model, year, and manufacturer. The finer-grained your data is when describing an object, the more precise your results will be in the resulting vehicle grouping. Vector databases can efficiently find items that satisfy a query using vector representations. 
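As a rough illustration of the vehicle list above, the following Python sketch ranks the sample embeddings by distance using only the standard library. The function names (euclidean, manhattan, cosine_similarity, nearest) are my own for this sketch and are not part of pgvector or any other library; the vectors are the six two-dimensional embeddings listed above.

```python
import math

# Two-dimensional sample embeddings copied from the list above.
EMBEDDINGS = {
    "car": (2.0, 2.3),
    "truck": (3.4, 5.9),
    "motorcycle": (0.5, 1.2),
    "bicycle": (0.2, 0.8),
    "helicopter": (13.2, 19.8),
    "hoverboard": (0.1, 0.2),
}


def euclidean(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.dist(a, b)


def manhattan(a, b):
    """Sum of absolute per-dimension differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(name, k=2, metric=euclidean):
    """Return the k other objects closest to `name` under a distance metric."""
    target = EMBEDDINGS[name]
    others = [n for n in EMBEDDINGS if n != name]
    return sorted(others, key=lambda n: metric(target, EMBEDDINGS[n]))[:k]


if __name__ == "__main__":
    print(nearest("motorcycle"))  # ['bicycle', 'hoverboard']
    print(round(euclidean(EMBEDDINGS["motorcycle"], EMBEDDINGS["bicycle"]), 3))  # 0.5
```

With these sample values the bicycle is the closest neighbour of the motorcycle, which matches the observation above. Vector databases run the same kind of comparison across far more dimensions and rows.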
They use similarity metrics like <a href="https://en.wikipedia.org/wiki/Euclidean_distance"><u>Euclidean</u></a> distance, <a href="https://en.wikipedia.org/wiki/Cosine_similarity"><u>Cosine</u></a> similarity, or <a href="https://xlinux.nist.gov/dads/HTML/manhattanDistance.html"><u>Manhattan</u></a> distance to determine how close data points are, and return the closest matches as the most relevant results.<h3>pgvector syntax</h3>The pgvector extension introduces a vector data type that can be used as the column type in a PostgreSQL database. The simple examples that follow show how to use the vector data type in  statements, and search the vector data. Invoke the following commands with the psql client:<ul><li>Creating a Sample Table</li><li>Retrieving Data</li><li>Managing Data Operations</li><li>Querying Aggregates</li><li>Creating Indexes</li></ul>PostgreSQL can create indexes for vectors that hold up to 2000 dimensions. You can create embeddings using tools like the OpenAI API client. Similarity searches of vector embeddings have a variety of commercial uses, such as fraud detection, food industry applications, and security systems.<h3>pgvector real world example</h3>The following example is real-world sample code for an AI-based enquiry system that tries to automatically answer client queries. It has a limited knowledge base; if it doesn't know the answer, it replies appropriately. This generates the following log: The above sample code demonstrates the use of the pgvector extension in a real-world AI-based enquiry system that tries to automatically answer client queries. We can divide the application into four sections:<ul><li>Questions mimic client queries to drive learning. Since it is an intelligent automatic reply enquiry system, we have fed all the client queries into the QUERIES array.</li><li>The system has a knowledge base that contains all the information that we want the system to learn. The knowledge base grows.</li></ul><ul><li>We perform a similarity search that exercises pgvector/PostgreSQL capabilities. 
We iterate and get responses from the system for each query.</li></ul><ul><li>We generate a response from an AI model. Since we have a limited knowledge base, if our AI model doesn't know the answer, it replies accordingly. We expect that it will reply to all the related queries.</li></ul><h3>Exercising the example</h3>Query:The knowledge base contains the following entry to educate the automatic system to answer correctly:Enquiry System Response:The system has capability to do similarity search to correctly answer the posted query by the client. This is possible with the help of the PostgreSQL pgvector extension and the OpenAI embedding generation feature. When we use PostgreSQL with pgvector, not only does it provide vector search, but it helps with storage and other RDBMS features that help us develop a professional and industrial quality application.To generate a good reasonable response to the client, we used the OpenAI model () to generate an answer to the query. If the knowledge base provides no related knowledge, it will reply with This application is written in basic python code to demonstrate the real world use of the pgvector extension. It was tested with PostgreSQL 15 (with pgvector extension installed), OpenAI (via online internet access), and Python 3.9.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/embedding-near-the-edge-pgedge-distributed-postgresql-with-pgvector</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>PostgreSQL</category>
            <title><![CDATA[PostgreSQL 16 Logical Replication Improvements in Action]]></title>
            <link>https://www.pgedge.com/blog/postgresql-16-logical-replication-improvements-in-action</link>
            <pubDate>Wed, 02 Aug 2023 13:02:55 GMT</pubDate>
            <description><![CDATA[ <p>In my previous blog, we started discussing this topic: <a href="https://www.pgedge.com/blog/postgresql-replication-and-upcoming-logical-replication-improvements-in-postgresql-16"><u>https://www.pgedge.com/blog/postgresql-replication-and-upcoming-logical-replication-improvements-in-postgresql-16</u></a>I briefly discussed replication methods in PostgreSQL, and provided a summary of some of the key features of logical replication that made it in PostgreSQL 16. In this blog, I will dive deep into a couple of performance features for logical replication, demonstrate the steps for seeing the features in action, and share the results of performance benchmarking.The blog will focus on the parallel apply and binary copy features in PostgreSQL 16. The parallel apply feature enables the functionality of using parallel background workers at the subscriber node for apply change for large in-progress transactions. The number of parallel workers to use for applying changes from the publisher is . The second performance feature is binary copy. This feature allows logical replication to do the initial data copy in binary format. This provides a good performance boost when copying tables with binary columns.     <h2>Parallel Apply</h2>Parallel apply is a performance feature that provides performance benefits for replicating large in-progress transactions. To achieve this, we start the changes streaming to the subscriber node, and then use parallel background workers at the subscriber node to apply the changes while they are being streamed from the publisher. You can configure the number of parallel workers to use at the subscriber node for applying the changes with the configuration parameter.The example below demonstrates how to use this exciting logical replication feature.  
We've also provided sample performance numbers taken while running a test with a couple of AWS instances in different regions.For this example, I have the publisher running on AWS us-east-1 and subscriber node running on AWS us-west-2. <h3>Publisher</h3>To configure the publisher node, connect to the node and:1. Create a fresh PostgreSQL cluster with  and set the following configuration parameters. Specify values that work well with your server specification:  2. Create a table for publication; we've used the following command:3. Create a publication ; you can optionally create a publication for just the large_test table created in the previous step:<h3>Subscriber</h3>To configure the subscriber node, connect to the node and:1. Create a fresh cluster with  and set the following configuration parameters. The parameters need to be set according to your server specification:For our test server, I set  to  to spawn four parallel workers for applying changes to the subscriber node.2. Create a table for publication to receive the replication stream from the publisher:3. Create a subscription with connection properties to the publisher:Please note that we are setting the  parameter to  for the purposes of this test so we can stream the table changes instead of doing the initial data copy. We are also setting the streaming type to ; this will enable the parallel apply feature and apply the changes to the subscriber node with the specified number of workers. <h3>Publisher</h3>To set up our test scenario, we connect to the publisher node and:1. Set  to the name of the subscriber; you don't need to do this to make use of the parallel apply feature; this was only done for the purpose of this test. Setting the parameter ensures that the backend waits for the application on the subscriber node, so we can measure the timing:2. Restart the PostgreSQL server.3. Use psql to run the following command. 
The command starts and times a large transaction on the publisher node:<h3>Results</h3><ul><li>With streaming set to parallel, it took 58887.540 ms (00:58.888) to complete the transaction and apply the changes at the subscriber node.</li><li>With streaming set to off, it took 106909.268 ms (01:46.909) to complete the transaction and apply the changes at the subscriber node.</li></ul>This gives us up to a 50-60% performance gain for large in-progress transactions using parallel apply; in the run shown above, elapsed time dropped by about 45% (roughly 1.8 times faster). <h2>Binary Copy</h2>Binary copy is another performance feature of logical replication added in PostgreSQL 16. The binary copy feature makes it possible to do the initial copy of table data in binary format. Streaming data in binary format was added in previous releases, but doing the initial table copy in binary mode wasn’t supported prior to PostgreSQL 16. I've conducted a test using two AWS instances to demonstrate the performance benefit gained with this feature. The following example shows how to enable this feature and provides the performance numbers from testing the initial data load in binary vs non-binary format. <h3>Publisher</h3>To set up our binary copy test scenario, connect to the publisher node and: 1. Set the following configuration parameters to maximize your system performance: 2. Create a table that includes bytea columns: 3. Create a publication, specifying the FOR ALL TABLES clause: 4. Add records to the table: 5. Check the table size after the initial data load:<h3>Subscriber</h3>Connect to the subscriber node and: 1. Set the following configuration parameters appropriately for your system: 2. Create a table with the same bytea columns: 3. Create the subscription; set the  parameter to  and the  parameter to  for the initial data transfer. 4. Create the following function to time the initial data copy from publisher to subscriber: 5. 
Call the function to time the transfer:<h3>Results</h3><ul><li>Without binary load (binary set to false), it took 383884.913 ms (06:23.885) to complete the transaction and apply the changes at the subscriber node.</li><li>With binary load (binary set to true), it took 267149.655 ms (04:27.150) to complete the transaction and apply the changes at the subscriber node.</li></ul>This is roughly a 30% reduction in elapsed time when performing the initial table copy in binary format. <h2>Conclusion</h2>The use of distributed PostgreSQL databases is growing rapidly, and replication is a vital and core part of any distributed system. Replication features in PostgreSQL are evolving to become more mature and feature rich with every major release. The groundwork for logical replication was laid prior to PostgreSQL 10, but the logical replication feature itself developed into a usable form in PostgreSQL 10. Since then, replication support has grown tremendously, and the major features added in each release warrant a separate blog post that I will cover in due course. This blog covers new logical replication performance features added in PostgreSQL 16; stay tuned for more blogs discussing the remaining PostgreSQL 16 logical replication features.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/postgresql-16-logical-replication-improvements-in-action</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>
            <item>
            <category>Distributed Postgres,Multi-Master (Multi-Active),PostgreSQL</category>
            <title><![CDATA[PostgreSQL Replication and upcoming Logical Replication Improvements in PostgreSQL 16]]></title>
            <link>https://www.pgedge.com/blog/postgresql-replication-and-upcoming-logical-replication-improvements-in-postgresql-16</link>
            <pubDate>Tue, 02 May 2023 17:40:22 GMT</pubDate>
            <description><![CDATA[ <p>Replication is a process that reliably copies data from one database server to another database server in an automated fashion. Replication is a core part of an enterprise database solution that:<ul><li>offers fault tolerance in case of data mishaps</li><li>enables high availability in the event of a node failure</li><li>allows incoming traffic to be distributed across replicas to provide better performance</li><li>… and more.</li></ul>This blog is the first of a series discussing the future of logical replication. In this post, I’ll focus on the improvements the community has added to logical replication for PostgreSQL 16. The next post will describe the in-flight PostgreSQL 16 logical replication improvements (those changes that are in progress, but not yet committed). The last post in the series will delve into a new PostgreSQL extension for logical replication called <a href="https://github.com/pgEdge/spock"><u>Spock</u></a>. Spock is a replication solution recently released by <a href="https://www.pgedge.com/"><u>pgEdge</u></a> that leverages both the <a href="https://github.com/2ndQuadrant/pglogical"><u>pgLogical</u></a> and <a href="https://github.com/2ndQuadrant/bdr/tree/REL0_9_94b2"><u>BDR2</u></a> open-source projects as a solid foundation for this enterprise-class extension. Please visit our official <a href="https://www.pgedge.com/company"><u>site</u></a> to learn more about pgEdge and Spock. Spock provides <a href="https://www.pgedge.com/solutions/benefit/multi-master">multi-master</a> (multi-active) PostgreSQL replication optimized for the network edge of cloud-based systems (with the cloud provider of your choice) or for databases hosted on-prem. 
With its logical replication foundation, Spock offers fine-grained control for your data replication and security needs.<h2>PostgreSQL Replication Methods</h2>PostgreSQL supports two native methods of replication: logical replication and physical replication (also called streaming replication).<a href="https://www.postgresql.org/docs/15/protocol-logical-replication.html"><u>Logical replication</u></a> uses a publisher/subscriber model to replicate changes between PostgreSQL servers. The primary node (where the database lives) is called the publisher, and the stand-by node (which receives copies of database transactions) is called the subscriber. Database changes are copied from the publisher node to one or more subscriber node(s) identified by the subscription.When you set up logical replication, you take a snapshot of the data on the published database, and copy it to the subscriber. When you start the subscription, changes on the publisher are sent to the subscriber as they occur. Logical replication uses a transactional model to apply changes to the subscriber in the same order that they are applied to the publisher. This guarantees transactional consistency.The other native method of PostgreSQL replication is <a href="https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION"><u>physical (or streaming) replication</u></a>. Streaming replication passes the data from the primary node to the stand-by node in WAL (write-ahead log) files. You can configure streaming replication to be either synchronous or asynchronous; by default, streaming replication is asynchronous.<ul><li>Asynchronous replication ships each log file to the stand-by node after the transaction is committed on the primary server. 
If something happens to the primary server before the transaction is written to the stand-by, you can potentially lose data.</li></ul><ul><li><a href="https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION">Synchronous</a> replication does not confirm a commit until its WAL records have been written on both the primary and the stand-by node. It is generally safer, but adds commit latency and requires a more robust network connection with better bandwidth.</li></ul>Both synchronous and asynchronous modes of streaming have their own pros and cons. As a rule, synchronous replication offers better data protection in the event of a server problem, while asynchronous replication is more cost effective in terms of required resources. Review the <a href="https://www.postgresql.org/docs/15/high-availability.html"><u>PostgreSQL documentation</u></a> for more information about native replication methods.<h2>Logical Replication Improvements in PostgreSQL 16</h2>Let’s turn our attention to the main topic of this blog, and summarize the key logical replication improvements that have been added to PostgreSQL 16 so far.<h3>Applying changes to the subscriber with background workers</h3>Currently, the changes for large, in-progress transactions are sent from the publisher to subscriber in multiple streams, with the changes divided into chunks based on the value of the <a href="https://www.postgresql.org/docs/15/logicaldecoding-streaming.html"><u>logical_decoding_work_mem</u></a> parameter. PostgreSQL version 16 adds a feature that improves performance by parallelizing the process of applying changes to the subscriber node by using multiple background workers. The parallel application to the stand-by node begins while the transaction is still in progress on the primary node. When the application starts, a single worker applies the top-level transaction, while parallel workers begin to apply the sub-transactions. 
If any of the parallel workers error out, the entire transaction is aborted. This functionality provides transactional consistency to ensure that a partially completed bulk insert does not remain in your database. Performance benchmarking shows that the patch offers a 30 to 40% performance improvement for bulk inserts. You can review the benchmarking as part of the patch history at <a href="https://commitfest.postgresql.org/42/3621/">https://commitfest.postgresql.org/42/3621</a>.<h3>Creating a subscription in binary format</h3>In PostgreSQL version 16, when you <a href="https://www.postgresql.org/docs/15/sql-createsubscription.html"><u>create a subscription</u></a>, you have the option to use binary format for the initial data transfer. Prior to version 16, the initial sync was always performed in text format; binary format was used only for changes replicated after the initial sync. This new functionality allows you to perform the initial sync in the same format that you plan to use for replication. The <a href="https://www.postgresql.org/docs/15/sql-copy.html"><u>COPY</u></a> command is used behind the scenes of the <a href="https://www.postgresql.org/docs/15/sql-createsubscription.html"><u>CREATE SUBSCRIPTION</u></a> command to copy the data for the initial sync. Since the COPY command supports both binary and text formats, it makes perfect sense to support both. You can use the following clauses to specify the data transfer mode:<ul><li>When you set binary=false (the default), data is sent in text format.</li><li>When you set binary=true, data is sent in binary format.</li></ul>If your column type supports binary, copying tables in binary format may reduce your initial sync time. Note that this feature is supported only when both the publisher and subscriber are version 16 or later. 
Please review the commit fest entry for more details: <a href="https://commitfest.postgresql.org/42/3840/"><u>https://commitfest.postgresql.org/42/3840/</u></a>.<h3>Improving performance by using indexes on the subscription node</h3>The <a href="https://www.postgresql.org/docs/15/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY"><u>REPLICA IDENTITY</u></a> attribute helps the server identify the correct row on the subscriber node to UPDATE or DELETE when a change occurs on the publisher node. If your table does not have a key, specifying REPLICA IDENTITY FULL tells the server to use a combination of all of the columns in a row to identify the correct row on the subscriber to modify.<br><br>Specifying REPLICA IDENTITY FULL on the publisher node can trigger a full table scan on the subscriber node for each UPDATE or DELETE to ensure that the correct row is updated. A full table scan can be time-consuming, and uses more resources than an index lookup.<br><br>This commit improves performance by allowing the subscriber to use a suitable index when applying UPDATEs and DELETEs. The index must:<ul><li>be a btree index</li></ul><ul><li>be a non-partial index</li></ul><ul><li>include at least one column reference (it cannot consist solely of expressions)</li></ul>If multiple indexes meet these requirements, the server will select the first valid index, instead of using a smart approach to select the best index. If you specify a REPLICA IDENTITY other than FULL, the subscriber must have a similar replica identity.<br><br>The functionality provided by this feature is only enabled when REPLICA IDENTITY FULL is specified. The functionality is skipped when the remote relation doesn’t contain the leftmost column of the index, primarily because a sequential scan provides better performance in such cases. 
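<br><br>The setup described above can be sketched as follows. The table, column, and index names are hypothetical, and the index is simply one that would meet the listed requirements:<pre><code>-- On the publisher: the table has no key, so identify rows by all columns.
ALTER TABLE audit_log REPLICA IDENTITY FULL;

-- On the subscriber: a plain, non-partial btree index on a regular column.
-- The apply worker may use it to locate rows for UPDATE and DELETE
-- instead of scanning the whole table.
CREATE INDEX audit_log_event_id_idx ON audit_log USING btree (event_id);
</code></pre>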
Please see the commit fest entry for more details: <a href="https://commitfest.postgresql.org/42/3765/"><u>https://commitfest.postgresql.org/42/3765/</u></a>.<h3>Allow logical decoding on stand-by</h3>Prior to PostgreSQL 16, logical decoding was supported only on the primary node; this commit allows minimal logical decoding on the stand-by node as well. To make use of this functionality, you need to set <a href="https://www.postgresql.org/docs/15/runtime-config-wal.html"><u>wal_level</u></a> to logical on the primary node (the default is replica).<br><br>This feature allows you to:<ul><li>create a logical replication slot on a stand-by node</li></ul><ul><li>create a subscription to a stand-by node</li></ul><ul><li>perform logical decoding on the stand-by node</li></ul>Prior to this commit, those actions would result in the following error:<br><br>logical decoding cannot be used while in recovery<br><br>This commit also introduces the pg_log_standby_snapshot() function. When called on the primary, the function takes a snapshot of running transactions and writes it to the WAL without requiring a checkpoint. This makes creating logical replication slots on a stand-by much faster, particularly when the primary node is idle and the stand-by would otherwise have to wait for the primary to log such a snapshot.<br><br>For more information, please see the commit fest entry at: <a href="https://commitfest.postgresql.org/42/3740/"><u>https://commitfest.postgresql.org/42/3740/</u></a>.<h2>Conclusion</h2>PostgreSQL logical replication continues to improve and become more robust. Some of the features added in this release also lay the groundwork for more great features in future releases. This post summarizes some of the key logical replication features added to PostgreSQL 16. My next post will go over the improvements that are in progress and discuss the likelihood of those making it into the release.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/postgresql-replication-and-upcoming-logical-replication-improvements-in-postgresql-16</guid>
            <author><name>Ahsan Hadi</name></author>
            </item>    
    
        </channel>
    </rss>