<?xml version="1.0" encoding="UTF-8" ?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
        <channel>
            <title>pgEdge Posts from Muhammad Aqeel</title>
            <link>https://www.pgedge.com/blog</link>
            <description>The latest pgEdge Posts from Muhammad Aqeel</description>
            <atom:link href="https://www.pgedge.com/feeds/rss/user/muhammad-aqeel/postgresql.xml" rel="self" type="application/rss+xml" />
            <language>en-us</language>
            <item>
            <category>PostgreSQL</category>
            <title><![CDATA[Volatile Queries and Semantic Caching: How to Make Sure It Always Returns the Right Answer]]></title>
            <link>https://www.pgedge.com/blog/volatile-queries-and-semantic-caching-how-to-make-sure-it-always-returns-the-right-answer</link>
            <pubDate>Thu, 30 Apr 2026 05:47:19 GMT</pubDate>
            <description><![CDATA[ <p>Part 3 of the Semantic Caching in PostgreSQL series. <a href="https://www.pgedge.com/blog/semantic-caching-in-postgresql-a-hands-on-guide-to-pg_semantic_cache"><u>Part 1</u></a> covers the fundamentals of pg_semantic_cache: how it stores query embeddings, runs cosine similarity searches via pgvector, and returns cached LLM results without a round-trip to your model provider. <a href="https://www.pgedge.com/blog/pg_semantic_cache-in-production-tags-eviction-monitoring-and-python-integration"><u>Part 2</u></a> goes deeper into production operations: cache tags, eviction policies, monitoring, and Python integration patterns. This post focuses on a specific class of queries that need to be handled differently, and where that handling belongs. A well-tuned semantic cache can deliver 60–80% fewer LLM API calls and matching cost savings. But those numbers depend on caching the right queries. Cache everything and you risk returning answers that were accurate once but are no longer true, and returning them confidently, with no indication that anything is wrong. Understanding the line between cacheable and non-cacheable queries, and owning that line in the right layer of your stack, is what separates a semantic cache that saves money from one that quietly misleads users.<h2>The Two Kinds of Queries Your Cache Will See</h2>Every query that arrives at your application falls into one of two buckets. Time-invariant queries have answers that do not depend on when they are asked. "What is the boiling point of water?" has the same answer today as it did last year and will have next year. The same goes for "Explain how TCP/IP works." and "What does idempotent mean?" Semantic caching is a natural fit for these: one LLM call populates the entry, and every paraphrase that follows is a free hit. Volatile queries have answers that are bound to the moment they are asked. 
Their correct response changes with time, live state, or the specific user asking. The defining property of volatile queries is that they produce stable embeddings but changing answers. Ask "What is the current time?" at 14:00 and again at 14:05 and the two vectors have a cosine similarity of 1.0: the same sentence, the same semantics, an identical embedding. But one correct answer is 14:00 and the other is 14:05. A cache that stores the first call and serves it for subsequent matches will confidently return the wrong answer until the entry expires or is evicted, and the similarity score will be a perfect 1.0 each time, giving no indication that anything is stale. Now add paraphrases into the mix. "What time is it right now?", "Can you tell me the current time?", and "What's the time?" all map to nearby positions in vector space; similarity scores above 0.90 are typical. Every one of them should reach the LLM directly, never the cache.<h2>The Cache Is Doing Its Job Correctly</h2>Here is the important framing: when pg_semantic_cache returns a stored result for a volatile query, it is not making a mistake. The vector geometry is accurate. The similarity score is legitimate. The cache found the closest matching entry above the configured threshold and returned it. That is precisely what a semantic cache is designed to do. The issue is not the cache's behavior; it is the assumption that every query should go through the cache in the first place. That assumption is the application's to make, not the cache's. Asking pg_semantic_cache to distinguish volatile from non-volatile queries would require it to understand the relationship between a query's meaning and time. It would need to know that "current time" implies a result that expires in seconds, while "speed of light" is valid indefinitely. That is semantic reasoning about a query's temporal context, something the application layer possesses and the cache does not. pg_semantic_cache stores vectors, finds similar vectors, and returns results. 
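To make the stable-embedding, changing-answer problem concrete, here is a minimal Python sketch. It does not use pg_semantic_cache or a real embedding model: toy_embed, cosine, and NaiveCache are hypothetical stand-ins (a bag-of-words vector and an in-memory list) that only illustrate the behavior described above.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class NaiveCache:
    """Cache-everything behavior: no notion of volatility, only similarity."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def store(self, text, answer):
        self.entries.append((toy_embed(text), answer))

    def lookup(self, text):
        query = toy_embed(text)
        scored = [(cosine(query, emb), ans) for emb, ans in self.entries]
        if not scored:
            return None, 0.0
        score, answer = max(scored, key=lambda pair: pair[0])
        if score >= self.threshold:
            return answer, score
        return None, score

cache = NaiveCache()
cache.store("What is the current time?", "14:00")           # asked at 14:00
answer, score = cache.lookup("What is the current time?")  # asked again at 14:05
print(answer, round(score, 3))
```

The second lookup matches with a similarity of essentially 1.0 and still returns the 14:00 answer; nothing in the score signals that the entry is stale.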
Deciding which queries are worth caching is the application's job, and it turns out the application is perfectly equipped for it.<h2>Why Query Classification Belongs in the Application</h2>Well-architected systems assign each responsibility to the layer best equipped to fulfill it. Let's map that to a semantic caching stack. The application layer has context that neither the embedding model nor the cache possesses. It knows the domain. A financial trading application knows that stock prices are real-time data. A customer support bot knows that "what is my order status?" is user-specific and non-cacheable. A weather service knows that any query mentioning "right now" or "currently" needs a live answer. The cache knows none of this, and it shouldn't need to. It knows the user's intent. The application can inspect query metadata, session context, feature flags, or user-provided parameters to make routing decisions that go well beyond what any pattern in the query text can express. It controls the flow. The application already decides when to call the embedding model, when to query the cache, and when to invoke the LLM. Volatile detection sits naturally in this orchestration logic. Adding one classification step before the cache lookup costs microseconds and eliminates an entire category of incorrect behavior. Embedding this classification logic into the cache itself would couple a general-purpose infrastructure component to domain-specific knowledge it has no business knowing. Your cache should be reusable across applications. Your routing logic is specific to yours.<h2>The Decision Flow</h2>Once the application takes ownership of query classification, the pipeline becomes clean and explicit. Volatile queries are intercepted before any embedding is computed or any database round-trip is made: <img src="https://a.storyblok.com/f/187930/725x971/e5c5c88657/picture1.png" >Each layer does what it does best. The application classifies. The cache stores and retrieves. 
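The flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the extension's API: the VOLATILE_PATTERNS list is hypothetical, and call_llm, embed, cache_get, and cache_put are placeholder callables that your application would back with its LLM client, embedding model, and cache queries.

```python
import re

# Hypothetical patterns for illustration; derive the real list from your domain.
VOLATILE_PATTERNS = re.compile(
    r"\b(right now|currently|current|today|latest|what time|my order)\b",
    re.IGNORECASE,
)

def is_volatile(query):
    """Return True when the query text matches a known volatile pattern."""
    return VOLATILE_PATTERNS.search(query) is not None

def answer(query, call_llm, embed, cache_get, cache_put):
    """Route one query: volatile queries bypass the cache entirely."""
    if is_volatile(query):
        # No embedding is computed and the cache is never consulted.
        return call_llm(query), "volatile_bypass"
    vector = embed(query)
    cached = cache_get(vector)
    if cached is not None:
        return cached, "cache_hit"
    result = call_llm(query)
    cache_put(vector, result)
    return result, "cache_miss"

# In-memory stand-ins so the sketch runs without a database or a model.
store = {}
llm_calls = []

def fake_llm(query):
    llm_calls.append(query)
    return "answer to: " + query

def fake_embed(query):
    return query.lower()  # exact-text key standing in for a real vector

outcomes = [
    answer(q, fake_llm, fake_embed, store.get, store.__setitem__)[1]
    for q in (
        "What time is it right now?",
        "Explain how TCP/IP works.",
        "Explain how TCP/IP works.",
    )
]
print(outcomes)
```

With these stand-ins, the three queries produce a volatile bypass, a cache miss, and a cache hit in that order, and the stubbed LLM is called twice. Passing the collaborators in as callables keeps the routing logic testable without a database.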
Neither is asked to do the other's job.<h2>Building the Classifier</h2><h3>Regex: the fast path for known domains</h3>For most applications, the set of volatile query patterns is knowable in advance. A regex classifier handles them with negligible overhead (typically microseconds or less per query) and zero external calls. Start with this. It is deterministic, auditable, and fast.<h3>LLM classifier: coverage for open-ended inputs</h3>If your application accepts free-form user input and you expect volatile patterns you haven't anticipated, a lightweight LLM pre-classifier catches what the regex misses. In practice: use the regex classifier in production, use the LLM classifier during development to discover new volatile patterns, and fold those discoveries into the regex list over time.<h2>The Complete Python Integration</h2>Now let's put it together. Here is the full pipeline using pg_semantic_cache's SQL functions. The application owns steps 1 and 2. The extension owns steps 3 and 4. The volatile check in step 1 is the only difference from a naive cache-everything approach. One function call, zero database round-trips, and no stale answers for any query the classifier flags as volatile.<h2>Seeing It in Action</h2>The pg_semantic_cache repository includes a complete runnable example at examples/volatile_query_detection/ that demonstrates all three outcomes (volatile bypass, cache miss, and cache hit) using real LLM calls. The stack:<ul><li>PostgreSQL 18 via pgEdge packages, with pg_semantic_cache built from source</li><li>Ollama (llama3.2:1b) for LLM generation; no API key required</li></ul><h3>Volatile queries: LLM called directly, cache never touched</h3>The cache lookup and store functions were never called. 
The cache was not involved — which is exactly correct.<h3>Non-volatile queries: first ask hits the LLM, result stored</h3><h3>Semantically similar rephrases: cache returns the stored answer</h3><h3>End-to-end statistics</h3>The 55% hit rate is a cold-start result — each stable question was asked once before its rephrased equivalent arrived. In a production system with a warm cache and a large question population, the hit rate climbs substantially. The important number is the volatile count: 6 queries that would have returned wrong cached answers instead reached the LLM and returned correct live responses.<h2>How the Rest of the Ecosystem Handles This</h2>This is not a design choice unique to . Surveying the major semantic caching products across the LLM ecosystem reveals a consistent pattern: not one of them performs automatic volatile detection internally. Every product delegates this decision to the application.<h3>GPTCache</h3><a href="https://github.com/zilliztech/GPTCache"><u>GPTCache</u></a> (Zilliz) is the most feature-rich open-source semantic cache for LLMs, and it goes furthest in providing integration points for volatile handling. It offers three mechanisms.Per-request skip flag — pass  to bypass both lookup and storage for a single call:Global pre-processing hook —  in  accepts a callable that inspects every request and returns  to skip all cache logic. The function receives the raw LLM call arguments, so you extract the query text from wherever it appears in those args:Temperature-based bypass — GPTCache intercepts the  parameter and uses it probabilistically to control cache behavior. At  the cache is always skipped; at  it is always consulted. This lets callers signal volatility through an existing LLM parameter with no extra API surface.Notice what GPTCache is doing: it is building clean hooks into which the application plugs its own logic. The classification itself — which queries are volatile — is still entirely the developer's responsibility. 
The hooks just make it easier to act on that classification at the right point in the lifecycle.<h3>LangChain Semantic Cache</h3>LangChain's semantic cache backends (, , ) expose no skip mechanism at all. The  interface — which defines , , , and their async counterparts — contains no hook, per-request flag, or bypass parameter for volatile queries.The community workaround is the same explicit application-layer conditional:A feature request for built-in per-query cache bypass has been open in the LangChain repository for some time — confirming this is a recognized gap that the library itself has not addressed. The application-layer conditional is considered the correct pattern in the meantime.<h3>Redis (RedisVL / LangCache)</h3><a href="https://github.com/RedisVentures/redisvl"><u>RedisVL</u></a>'s  and Redis's managed <a href="https://redis.io/docs/latest/develop/ai/langcache/"><u>LangCache</u></a> service both support per-entry TTL overrides as the closest they come to volatile-aware handling:Redis's own engineering blog recommends adaptive TTL values: 15–30 minutes for stock prices and weather, longer for stable facts. This is good advice — but TTL-based expiry reduces the window of staleness rather than eliminating it. A cached stock price from 14 minutes ago is still wrong. For queries where any stale answer is unacceptable, Redis's documentation also recommends an application-layer check before calling .<h3>Microsoft Semantic Kernel</h3>Semantic Kernel implements semantic caching through its filter middleware system. The  interface fires before the LLM is called and is the designed extension point for routing decisions. In Semantic Kernel's caching model, setting  to a non-null value tells the framework to skip the LLM call and return that value directly — so a caching filter stores and retrieves results by managing . 
To bypass the cache for volatile queries, the filter simply calls  without touching , leaving the LLM call to proceed normally:The official Semantic Kernel caching sample () caches all prompts uniformly — no volatile detection included. The filter interface exists precisely so the application team can add that logic. The framework provides the hook; the application provides the classifier.<h3>LlamaIndex</h3>LlamaIndex's  is hash-based (keyed on node content), not time-aware. Its LLM response caching historically delegated to LangChain's global cache. There is no native skip-cache API in core LlamaIndex. A GitHub issue requesting per-query cache bypass was closed without resolution, with the application-layer conditional documented as the intended approach.<h3>Walmart's waLLMartCache: A Production-Scale Reference</h3>The most concrete published architecture for this problem at scale comes from Walmart's internal waLLMartCache system, described in a 2024 ICPR paper. For a high-throughput e-commerce search and pricing platform, Walmart built an explicit Decision Engine with a dedicated Temporal Context Detection module. Its job is to intercept every query before any cache operation and classify it as static (cacheable) or dynamic (volatile). Dynamic queries — live inventory lookups, real-time pricing, time-sensitive promotions — are routed directly to the LLM/RAG pipeline, bypassing the cache entirely.This is not an open-source library. It is a custom production system. 
But it is the most detailed published example of this architectural pattern operating at e-commerce scale, and it makes the same design choice: volatile detection is a pre-cache routing layer built by the application team, not a feature embedded in the cache infrastructure.<h3>What the Ecosystem Agrees On</h3>Across open-source libraries, managed services, and internal production systems, the pattern is identical. The reason is the same everywhere: the cache operates on vectors, and volatility is a property of the query's relationship to time. No vector encodes the fact that its source question requires a live answer. The application, which knows the domain, knows the user, and controls the query routing logic, is the right layer to make that call. The entire ecosystem is built around this expectation.<h2>Putting It All Together</h2>Volatile query handling is not a caching problem. It is a query routing problem, and query routing is the application's domain. Here is the complete picture: pg_semantic_cache handles the third row with precision and efficiency; that is its job, and it does it well. The application handles the first two by classifying queries before they reach the cache. Each layer does what it is designed to do. Adding a volatile classifier to your application is a small change. One regex match per query. Zero database round-trips on the volatile path. The result is a semantic cache that is always correct for the queries it handles, because the application has already ensured that only the right queries reach it. Try it yourself. A complete Docker example is available at <a href="https://github.com/pgedge/pg_semantic_cache/tree/main/examples/volatile_query_detection"><u>examples/volatile_query_detection/</u></a> in the pg_semantic_cache repository. It runs PostgreSQL 18, pg_semantic_cache built from source, and Ollama locally; no API keys required.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/volatile-queries-and-semantic-caching-how-to-make-sure-it-always-returns-the-right-answer</guid>
            <author><name>Muhammad Aqeel</name></author>
            </item>
            <item>
            <category>PostgreSQL,pgEdge,PostgreSQL</category>
            <title><![CDATA[pg_semantic_cache in Production: Tags, Eviction, Monitoring, and Python Integration]]></title>
            <link>https://www.pgedge.com/blog/pg_semantic_cache-in-production-tags-eviction-monitoring-and-python-integration</link>
            <pubDate>Tue, 03 Mar 2026 04:20:12 GMT</pubDate>
            <description><![CDATA[ <p>Part 2 of the Semantic Caching in PostgreSQL series takes you from a working demo to a production-ready system.<h2>From Demo to Production</h2>In <a href="/blog/semantic-caching-in-postgresql-a-hands-on-guide-to-pg_semantic_cache"><u>Part 1</u></a>, we set up pg_semantic_cache in a Docker container and demonstrated how semantic similarity matching works. In summary, semantic caching associates a vector embedding with each query, which allows us to search the cache by meaning instead of by the exact query text. We demonstrated a cache hit at 99.9% similarity and a cache miss at 68% (measured against the configurable default threshold of 95%), and discussed why this can make a difference for LLM-powered applications. Now let's make it production-ready. A cache that can store and retrieve is useful, but a cache you can organize, monitor, evict, and integrate into your application is what you actually need for a production deployment. In this post, we'll cover all of that. We'll continue using the same Docker environment from <a href="/blog/semantic-caching-in-postgresql-a-hands-on-guide-to-pg_semantic_cache"><u>Part 1</u></a>. If you need to set it up again, refer to the Dockerfile and setup instructions in that post.<h2>Organizing with Tags</h2>Tags let you group and manage cache entries by category; the  at the end of the cache entry contains the tags associated with a specific query. For example, the tags in our first query identify the query as being associated with  and . In our second query, the tags identify the query as being associated with  and . We can view the tags with the following  statement: When the underlying data changes in our backing database, we can then use the tags to invalidate old content in our data set:<h2>Eviction Strategies</h2>Caches need boundaries. 
You can use those boundaries to keep data sets fresh. pg_semantic_cache provides eviction strategies you can use to set those boundaries for your cache. For easy maintenance in a production environment, you can schedule automatic cache cleanup with pg_cron:<h2>Monitoring</h2>The extension provides built-in views for observability. The semantic_cache.cache_health view provides an overview of the number of entries and use of a given cache: The semantic_cache.recent_cache_activity view provides insight into the queries that are coming in for your cache to resolve: The semantic_cache.cache_stats view provides insight into the statistical hit rates for your cache: To help manage cost tracking, you should log each access with its associated cost: The semantic_cache.get_cost_savings view can help provide estimates of cost savings:<h2>Configuring Vector Dimensions</h2>The extension works with any embedding model. Each dimension in a vector (for example, [0.45, 0.67, 0.23, 0.89, 0.12, 0.56, 0.78, 0.34]) contributes a small part of the semantic meaning, and together they position text or objects according to similarity rather than keywords. You configure the vector dimension to match the number of numeric components your embedding model produces, and the similarity threshold to set the minimum cosine similarity at which a cached entry counts as a match: Note: pgvector limits IVFFlat and HNSW indexes to a maximum of 2,000 dimensions. Models that produce higher-dimensional embeddings (e.g., OpenAI text-embedding-3-large at 3,072 dimensions) will work for storage and retrieval via sequential scan, but cannot use a vector index. 
For large caches with high-dimensional embeddings, consider using OpenAI's built-in dimension reduction (e.g., dimensions=1536) to stay within the indexable range. If your cache has more than 100,000 entries, switch to the HNSW index for better performance:<h2>Integration Pattern: Python Example</h2>The following example demonstrates how pg_semantic_cache fits into a typical Python application: The first call to our application takes 1-2 seconds (embedding + LLM). The second and third return in under 5ms. Same answer, a fraction of the time, zero additional LLM API cost.<h2>Why a PostgreSQL Extension?</h2>You could build semantic caching as a standalone service, as a Redis layer, or as application-level code. We built it as a PostgreSQL extension for practical reasons:<ul><li>Zero new infrastructure. If you already run PostgreSQL (and you probably do), pg_semantic_cache is a CREATE EXTENSION away. No new services to deploy, monitor, or page on at 3 AM.</li><li>ACID compliance. Cache operations participate in PostgreSQL transactions. No split-brain scenarios between your cache and your database.</li><li>Inherited operations. Backup, replication, authentication, and monitoring: your cache gets this functionality for free from your existing PostgreSQL setup.</li><li>Language-agnostic. Anything that speaks SQL can use the cache, including Python, Node.js, Go, Java, and Ruby. No client SDK required.</li></ul><h2>Tuning for Your Workload</h2><h3>Similarity Threshold</h3>The threshold balances the tradeoff between hit rate and accuracy:<ul><li>0.98+: Very conservative. Only near-identical rephrasings match. Low hit rate but very low risk of serving wrong answers.</li><li>0.95 (default): Good balance for most applications. Catches obvious rephrasings while maintaining accuracy.</li><li>0.90-0.93: Aggressive. 
Higher hit rate but increased risk of matching queries with subtly different intent.</li></ul><ul><li>As a rule, you can start with a threshold setting of 0.95 and adjust the value based on your data hits and misses. The extension reports the closest similarity score even on misses, so you can see where to adjust your strategy.</li></ul><h3>TTL Strategy</h3>Time-to-live (TTL) controls how long an entry stays cached. Set it based on how frequently the underlying data changes:The last value specified in the call to  is the number of seconds that an entry stays in the cache.<h3>PostgreSQL Settings</h3>For production workloads with large caches, tune these  settings. The values below are sized for a 16GB server. Scale the values proportionally for your environment (shared_buffers ~25% of RAM, effective_cache_size ~75% of RAM):<h2>Cleaning Up</h2>When you're done experimenting with your container environment, you can use the following commands to clean up:</p> ]]></description>
            <guid>https://www.pgedge.com/blog/pg_semantic_cache-in-production-tags-eviction-monitoring-and-python-integration</guid>
            <author><name>Muhammad Aqeel</name></author>
            </item>
            <item>
            <category>PostgreSQL,PostgreSQL</category>
            <title><![CDATA[Semantic Caching in PostgreSQL: A Hands-On Guide to pg_semantic_cache]]></title>
            <link>https://www.pgedge.com/blog/semantic-caching-in-postgresql-a-hands-on-guide-to-pg_semantic_cache</link>
            <pubDate>Wed, 25 Feb 2026 06:03:29 GMT</pubDate>
            <description><![CDATA[ <p>Your LLM application is probably answering the same question dozens of times a day. It just doesn't realize it because the words are different each time.<h2>The Problem with Exact-Match Caching</h2>If you're running an AI-powered application such as a chatbot, a RAG pipeline, or an analytics assistant, you've likely added a cache to cut down on expensive LLM calls. Most caches work by matching the exact query string. Same string, cache hit. Different string, cache miss. The trouble is that humans don't repeat themselves verbatim. These three queries all want the same answer: A traditional cache sees three unique strings and triggers three separate LLM calls. In production AI applications, research shows that 40-70% of all queries are semantic duplicates: different words, same intent. That translates directly into wasted API calls, added latency, and a bloated cloud bill. Semantic caching fixes this by matching on meaning instead of text. It uses vector embeddings to recognize that "Q4 revenue" and "last quarter's sales" are asking for the same thing, and serves the cached result in milliseconds instead of making another round trip to the LLM. pg_semantic_cache is a PostgreSQL extension that brings this capability directly into your database. 
In this post, we'll set it up from scratch in a Docker container using pgEdge Enterprise Postgres 17 and walk through working examples you can run yourself.<h2>What You'll Build</h2>By the end of this post, you'll have:<ul><li>A Docker container running pgEdge Enterprise Postgres 17 with pgvector and pg_semantic_cache.</li></ul><ul><li>A working semantic cache that matches queries by meaning.</li></ul><ul><li>Hands-on experience with caching and retrieval.</li></ul><ul><li>A clear understanding of how semantic similarity matching works in practice.</li></ul><h2>Setting Up the Environment</h2>For our example, we'll use a Rocky Linux 9 container with pgEdge Enterprise Postgres 17, which bundles pgvector out of the box.<h3>Dockerfile</h3>First, we create a file called Dockerfile that defines the content of our container: <h3>Build and Run</h3>Use the following commands to build and run the container:After waiting a few seconds for pgEdge Enterprise Postgres 17 to start, use the following command to connect to the Postgres server: <h3>Enable the Extensions</h3>Next, we'll use the psql command line to create a database and the extensions that we'll be using: After creating objects, query the  function to verify that the cache is empty:You should see: The cache is ready - now, let's put something in it.<h2>How It Works: A 60-Second Overview</h2>pg_semantic_cache stores query results alongside their vector embeddings in a Postgres database. When a new query comes in, the extension uses pgvector's cosine distance operator (<=>) to find the closest match in the cache. If the similarity exceeds your threshold (the default threshold is 0.95, meaning 95% similar), it returns the cached result. If not, it's a miss, and your application computes the result the normal way and stores it for next time.The key insight: vector embeddings capture semantic meaning. 
Two sentences that mean the same thing produce vectors that are geometrically close together, regardless of the exact words used. This is a property learned by modern embedding models from billions of text examples. No hand-crafted rules, no synonym dictionaries needed.<h2>Working Example: Caching AI Responses</h2>Let's simulate a real scenario. In production, your application generates embeddings using a model like OpenAI's text-embedding-3-small or a local model via Ollama. For this tutorial, we'll use small 8-dimensional vectors to keep things readable. The same principles apply whether your vectors have 8 dimensions or 3,072.<h3>Step 1: Configure for Small Vectors</h3><h3>Step 2: Cache Some Query Results</h3>Imagine three users asked questions about PostgreSQL, and your LLM generated answers. We'll store those results along with the vector associated with each one: When we check what's in the cache, it lists the three entries: Three entries, no lookups yet.<h3>Step 3: Query with a Semantically Similar Embedding</h3>A fourth user comes along and asks: "Explain ACID and transactions in Postgres." This is a different question from the first one, but semantically very close; your embedding model should produce a vector close to the one we stored when our earlier user asked about Postgres transactions. This is a cache hit. The similarity score is 0.999, meaning these two embeddings are nearly identical. The user gets the cached answer in a couple of milliseconds instead of waiting for an LLM round trip.<h3>Step 4: Query for Something Different</h3>What happens when we look up an embedding that doesn't match anything in the cache? This is a cache miss. The closest match is only 68% similar, well below our 95% threshold. The extension still tells you the closest similarity score, which is useful for tuning your threshold. 
On a miss, your application computes the result and stores it in the cache along with its embedding, so it's ready for the next user with a similar query.<h3>Step 5: Check the Stats</h3>When we check our caching statistics, we have one hit, one miss, and a 50% hit rate. In production with real user traffic, semantic caches typically achieve 60-80% hit rates compared to the 15-25% typical of exact-match caching. That's a roughly 3-4x improvement in hit rate, which translates directly into fewer LLM API calls and lower costs. As the cache fills with use, hit rates can improve further.<h2>The Real-World Impact of Semantic Caching</h2>Rather than projecting specific dollar amounts (which vary wildly based on your model, volume, and pricing tier), here's how to think about the impact: Without semantic caching, your traditional cache catches only queries that are character-for-character identical. That's typically 15-25% of traffic. The remaining 75-85% of requests hit your expensive backend every time. With semantic caching, you catch the semantic duplicates too. Hit rates jump to 60-80%, meaning only 20-40% of traffic reaches your backend. The backend call reduction is the key number. If 65-75% fewer requests reach your LLM provider, your API spend drops by roughly the same percentage. And every cache hit returns in 2-3ms instead of the 500ms-2s typical of an LLM call, so users get dramatically faster responses. For RAG pipelines, the impact compounds further. Each query served from the cache skips the most expensive parts of the pipeline: vector retrieval from the knowledge base and LLM completion. Embedding generation still occurs (you need the embedding to query the cache), but at a fraction of the cost: typically ~$0.0001 per embedding vs $0.01–0.03 for an LLM call. The LLM call dominates the cost, so avoiding it on cache hits still delivers significant savings.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/semantic-caching-in-postgresql-a-hands-on-guide-to-pg_semantic_cache</guid>
            <author><name>Muhammad Aqeel</name></author>
            </item>
            <item>
            <category>postgres,pgEdge,PostgreSQL</category>
            <title><![CDATA[Multi-Master Replication: Using pgEdge Enterprise Postgres with Spock and CloudNativePG]]></title>
            <link>https://www.pgedge.com/blog/multi-master-replication-using-pgedge-enterprise-postgres-with-spock-and-cloudnativepg</link>
            <pubDate>Tue, 11 Nov 2025 04:52:59 GMT</pubDate>
            <description><![CDATA[ <p>When I started exploring CloudNativePG (CNPG) with pgEdge Enterprise Postgres and the Spock extension, I realized there were a few gotchas that weren’t obvious at first. In this blog post, I’ll share my experiences running a pgEdge Docker image with Spock inside CloudNativePG, along with the lessons I learned about using superuser access, configuration management, and initialization scripts. In our setup, each CNPG cluster contains three Postgres pods. By default, CNPG enables physical replication inside each cluster, with a read/write primary and two read-only replicas. Our two CNPG clusters use Spock for bi-directional logical replication between the clusters, while each cluster also replicates internally; this lets us enable connection pooling so that resources are available for queries when needed. The steps that follow demonstrate running a pgEdge Enterprise Postgres Docker image with Spock inside CNPG, setting up nodes, and configuring bi-directional replication.<h2>1. Installing the CloudNativePG Operator</h2>The CNPG operator manages PostgreSQL clusters in a Kubernetes environment. Install it with the following commands: The helm install command deploys the operator in cnpg-system; kubectl get pods confirms that the pods are running.<h2>2. Deploying PostgreSQL Clusters</h2>In this step, we deploy two clusters (A and B) that we'll configure for bi-directional replication with the <a href="https://github.com/pgEdge/spock">Spock extension</a>. Cluster A (cluster-a.yaml) Cluster B (cluster-b.yaml) Next, create the clusters: Wait until both clusters are healthy before moving on to step 3.<h2>3. Establishing Superuser Access</h2>Spock requires superuser privileges to create nodes and replication sets. 
In our .yaml file, we created a user named postgres with superuser privileges (enableSuperuserAccess: true).</p><p>Security Note: Be sure to disable superuser access after setup by re-applying the .yaml file with enableSuperuserAccess: false.<h2>4. Creating Spock Nodes</h2>Nodes must exist before you configure replication; Spock verifies that a node exists before connecting.</p><p>Cluster A Node (spock-node-a-job.yaml)</p><p>Cluster B Node (spock-node-b-job.yaml)</p><p>After creating the .yaml files, use the following commands to create the nodes:<h2>5. Defining Bi-Directional Replication</h2>Spock's bi-directional replication allows each node to act as both a publisher and a subscriber; the following .yaml files create the bi-directional subscriptions between the clusters.</p><p>Cluster A → B (spock-repl-a-job.yaml)</p><p>Cluster B → A (spock-repl-b-job.yaml)</p><p>Use the following commands to apply the .yaml files and establish bi-directional replication between the two nodes:<h2>6. Testing Bi-Directional Replication</h2>The following commands connect to each node with psql and exercise replication to demonstrate that rows added on node 1 are replicated to node 2 and rows added on node 2 are replicated to node 1:</p><p>In both cases, rows inserted on one node appear on the other, confirming active-active replication.<h2>Key Takeaways</h2><ul><li>Superuser Access: Required for node creation and replication setup; remove after setup for security.</li><li>Declarative Configuration: CNPG ensures that declared settings persist; manual postgresql.conf changes are overwritten. 
If you try to change postgresql.conf in the entrypoint script of the Docker image, CNPG will override that configuration.</li><li>Separate shared_preload_libraries parameter: CNPG has a separate parameter for postgresql.shared_preload_libraries, so don’t modify it in the postgresql.conf file.</li><li>Initialization Scripts: Use postInitApplicationSQL for database-specific extensions.</li><li>Node Creation: Nodes must exist before replication; Spock helps by validating node availability.</li></ul></p> ]]></description>
            <guid>https://www.pgedge.com/blog/multi-master-replication-using-pgedge-enterprise-postgres-with-spock-and-cloudnativepg</guid>
            <author><name>Muhammad Aqeel</name></author>
            </item>
            <item>
            <category>postgres,pgEdge,PostgreSQL</category>
            <title><![CDATA[Seamless PostgreSQL Major Version Upgrades with CloudNativePG and Spock Logical Replication]]></title>
            <link>https://www.pgedge.com/blog/seamless-postgresql-major-version-upgrades-with-cloudnativepg-and-spock-logical-replication</link>
            <pubDate>Thu, 06 Nov 2025 05:20:21 GMT</pubDate>
            <description><![CDATA[ <p>One of the persistent challenges with PostgreSQL major version upgrades is maintaining logical replication during the process. The standard pg_upgrade utility doesn't preserve logical replication slots, which typically means tearing down and rebuilding replication configurations. For production environments running multi-cluster topologies, this has always been a significant operational hurdle.</p><p>I recently conducted an experiment to test whether Spock logical replication could survive a CloudNativePG (CNPG) major version upgrade without manual intervention. While the conventional wisdom holds that pg_upgrade doesn't preserve logical replication slots, two components work in tandem to solve this. First, Spock stores all replication metadata (nodes, subscriptions, and replication sets) in dedicated tables within the spock schema, which survive the upgrade intact as user data. Second, the pgEdge Helm chart's init-spock-job.yaml reads this preserved metadata and automatically recreates the necessary logical replication slots after the upgrade completes. This combination of persistent metadata and automation is what makes the process seamless.<h2>The Experiment</h2>The test environment consisted of three PostgreSQL clusters running version 16, configured with Spock logical replication between them. 
The goal was straightforward: upgrade all three clusters to PostgreSQL 17 and verify that logical replication continued functioning without rebuilding subscriptions or replication slots.</p><p>Test Parameters:<ul><li>Three CNPG clusters (pgedge-n1, pgedge-n2, pgedge-n3)</li><li>Initial version: PostgreSQL 16.10</li><li>Target version: PostgreSQL 17.6</li><li>Spock logical replication configured between all clusters</li><li>One cluster (pgedge-n1) configured with three instances for high availability by default</li><li>No manual scaling operations during upgrade</li></ul>The Helm chart used for this demonstration is available at <a href="https://github.com/pgEdge/pgedge-helm.git"><u>https://github.com/pgEdge/pgedge-helm.git</u></a>.<h2>Initial State: Three Clusters with Active Replication</h2>The starting point was three single-node PostgreSQL 16 clusters with Spock replication already established. Each cluster could both publish changes and subscribe to changes from the others.</p><p>After initialization completed, the pgedge-n1 cluster was already running with three instances (the chart's default configuration for high availability):</p><p>Creating a test table on pgedge-n1 and verifying replication:</p><p>Within moments, the change appeared on pgedge-n2:</p><p>Verification back on pgedge-n2 confirmed multi-directional replication was working:</p><p>At this stage, all three clusters were synchronizing changes bidirectionally. The real test would come next.<h2>Must Check Before Upgrade: WAL Lag Verification</h2>Before starting any major version upgrade with CNPG and Spock, especially when upgrading to PostgreSQL 18, always ensure all Spock replication slots are fully caught up. PostgreSQL 18 introduces stricter pg_upgrade verification: any negative or high WAL lag in logical replication slots can cause the upgrade to fail.</p><p>Running the verification check on each cluster:</p><p>The wal_lag column should show 0 bytes for all slots before proceeding with the upgrade. 
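</p><p>The lag check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the query from the original post: it pairs a query against the standard pg_replication_slots view (using pg_wal_lsn_diff and pg_current_wal_lsn) with a small Python helper that computes the same byte difference from two LSN strings. The names WAL_LAG_QUERY, lsn_to_int, wal_lag_bytes, and slots_not_caught_up are hypothetical.

```python
# Assumed query (not taken from the original post): one row per logical
# replication slot, with the WAL it has not yet confirmed, in bytes.
WAL_LAG_QUERY = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS wal_lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/16B3748' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the slot has not confirmed (negative if it is ahead)."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)


def slots_not_caught_up(slots: dict[str, str], current_lsn: str) -> list[str]:
    """Return the names of slots whose lag is anything other than zero."""
    return sorted(
        name
        for name, flushed in slots.items()
        if wal_lag_bytes(current_lsn, flushed) != 0
    )
```

With hypothetical slot names, slots_not_caught_up({"spk_a": "0/16B3748", "spk_b": "0/16B3700"}, "0/16B3748") flags only spk_b, which is 72 bytes behind. Treating any non-zero value as a blocker mirrors the zero-tolerance rule this post recommends.</p><p>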
If you observe negative or high WAL lag values, these must be addressed (either by resyncing or by repairing the affected slots) before attempting the upgrade.</p><p>Important considerations:</p><p>PostgreSQL 18 Requirement: This verification step is particularly critical for PostgreSQL 18 upgrades, as pg_upgrade now performs stricter checks on replication slot synchronization. Slots that aren't fully caught up will block the upgrade process.</p><p>Backup First: Always take a physical backup before initiating the upgrade. This provides a rollback path if issues are discovered during or after the upgrade process.</p><p>Zero Tolerance: Don't proceed with any non-zero WAL lag. Even small amounts of lag can indicate synchronization issues that should be resolved in a controlled manner before the upgrade.<h2>The Upgrade: PostgreSQL 16 to 17</h2>The upgrade was triggered by updating the container image in the CNPG cluster specification. CloudNativePG handles the rest: orchestrating the upgrade process, managing temporary upgrade pods, and ensuring minimal downtime.</p><p>Notice the pgedge-init-spock job running alongside the upgrade pods. This initialization job is crucial: it recreates Spock replication slots after the upgrade completes, ensuring logical replication can resume immediately.</p><p>As the upgrade progressed:</p><p>And finally:<h2>Post-Upgrade Verification</h2>After the upgrade completed, all three clusters were running PostgreSQL 17.6. The critical question: did logical replication survive?</p><p>Testing on pgedge-n1:</p><p>On pgedge-n2:</p><p>And on pgedge-n3:</p><p>Confirming replication on pgedge-n2:</p><p>And pgedge-n1:</p><p>Logical replication was fully operational. No manual intervention was required.<h2>How Spock Survives the Upgrade</h2>The key to understanding why this works lies in how Spock manages replication metadata. 
Unlike native PostgreSQL logical replication, which relies entirely on system catalogs and replication slots, Spock stores its configuration in dedicated tables within the spock schema:<ul><li>spock.node: cluster definitions</li><li>spock.subscription: replication subscriptions</li><li>spock.replication_set: publication configurations</li><li>Additional metadata tables for conflict resolution, progress tracking, and state management</li></ul>During pg_upgrade, PostgreSQL preserves user schemas and their data while replacing system binaries and catalogs. Since Spock's metadata lives in user tables, it survives the upgrade intact. The init-spock job that runs after the upgrade reads this metadata and recreates the necessary logical replication slots, allowing replication to resume immediately.</p><p>This is fundamentally different from trying to preserve native PostgreSQL logical replication through an upgrade, where the replication slot configuration itself is lost.<h2>Practical Implications</h2>This capability has significant implications for production PostgreSQL deployments:</p><p>Simplified Upgrade Workflows: The init-spock-job.yaml eliminates the need to manually disable replication, perform upgrades in isolation, and rebuild replication configurations afterward. The automation handles slot recreation transparently.</p><p>Zero-Configuration Slot Recreation: The automated initialization job handles slot recreation based on Spock's stored metadata. Operators don't need to track subscription configurations separately or rebuild them manually post-upgrade.</p><p>Operational Confidence: Knowing that replication configuration survives version upgrades reduces the risk profile of major version upgrades in complex multi-cluster environments.<h2>Considerations</h2>While this approach works reliably, there are important operational factors to consider:</p><p>Upgrade Downtime: CNPG's in-place upgrade using pg_upgrade requires downtime. 
Plan your maintenance windows accordingly and ensure your application can tolerate the interruption.</p><p>Backup Strategy: It's strongly recommended to take physical backups both before the upgrade and after it completes. This provides a rollback path if issues are discovered post-upgrade.</p><p>Docker Image Compatibility: Make sure the base Docker images of the current version and the target version are compatible. Per the CNPG documentation, you can't upgrade a bullseye-based image to a bookworm-based image.</p><p>Spock Extension Compatibility: The Spock extension itself must be compatible with both the source and target PostgreSQL versions. For 16 to 17 upgrades, this is well-supported.</p><p>Initialization Job Dependency: The init-spock job must run successfully after the upgrade. Monitor this job to ensure slot recreation completes as expected.</p><p>Physical Replication Independence: CNPG's physical replication (for replicas within a cluster) operates independently of Spock's logical replication. Both can coexist without interference.</p><p>Testing Recommended: As with any upgrade strategy, thorough testing in a non-production environment remains essential.</p> ]]></description>
            <guid>https://www.pgedge.com/blog/seamless-postgresql-major-version-upgrades-with-cloudnativepg-and-spock-logical-replication</guid>
            <author><name>Muhammad Aqeel</name></author>
            </item>    
    
        </channel>
    </rss>