Your AI App Works On Postgres. Now Make It Production-Ready Without Starting Over

Every AI application built on PostgreSQL hits the same inflection point. You've got pgvector installed, embeddings in a table, a similarity search query that returns surprisingly good results. The prototype works and your team is excited. Someone asks "when can we ship this?" and you suddenly realize that the distance between "it works on my laptop" and "it works in production across three regions" is a lot larger than you thought.

The usual answer is to re-platform. Swap Postgres for a purpose-built vector database for the AI parts. Add a separate search service. Move to a managed offering that handles the scaling but forces you to redesign your data model. By the time you're done, the prototype you built is gone, replaced by an architecture that looks nothing like what you started with.

There's a different path. One where Postgres remains the foundation at every stage, your schema doesn't change, and the tools you need show up when you need them, not before. That's what we built at pgEdge, and this is a walkthrough of what that path looks like in practice.

The Beginning: Setting Up pgvector in PostgreSQL for AI Search

The prototype is deceptively simple. You have a table with your data, pgvector is installed as an extension, and an embedding column stores vectors alongside your regular business data. Something like this:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    embedding vector(1536)
);

CREATE INDEX ON products
    USING hnsw (embedding vector_cosine_ops);

HNSW (Hierarchical Navigable Small World) is the index type that makes approximate nearest-neighbor search fast enough for real workloads. You don't need to understand the algorithm. You need to know it works, and that it lives inside Postgres as a regular index on a regular table.

Querying it is just SQL:

SELECT name, description,
       1 - (embedding <=> $1) AS similarity
FROM products
ORDER BY embedding <=> $1
LIMIT 10;

That's the entire search layer. Your product catalog, your embeddings, your similarity search, all in one database, all queryable with standard SQL. This is the part that makes people fall in love with the Postgres approach to AI: you don't need a second database, a separate vector store, or a different query language. It's just Postgres.

The problem is that "just Postgres" gets you through the prototype. Production needs more.

Connect AI Coding Agents to Postgres with MCP

If you're building with an AI coding assistant (Claude Code, Cursor, Windsurf, VS Code Copilot), the first thing that changes your workflow is connecting the agent directly to your database. Instead of copy-pasting schema definitions into prompts and hoping the AI remembers your table structure, you give it structured access to the actual schema.

The pgEdge Postgres MCP Server does this through the Model Context Protocol (MCP), the emerging standard for connecting AI agents to data sources. Point it at your database, and any MCP-compatible client gets full schema awareness: tables, columns, types, constraints, indexes, vector columns. The agent can generate accurate SQL grounded in your actual data structures instead of guessing from context.

# mcp-config.yaml
databases:
  - name: "products-db"
    connection_string: "postgresql://user:pass@localhost:5432/myapp"
    
read_only: true

The MCP Server runs in read-only mode by default. Your AI agent can explore, query, and do similarity search, but it can't modify data unless you explicitly allow it. That matters a lot more once you're past the prototype stage and there's real data at stake.

What shifts here is the development velocity. Your agent understands the schema, can suggest indexes based on query patterns, can run EXPLAIN ANALYZE to diagnose slow queries, and can do hybrid search combining vector similarity with keyword matching. The database becomes a first-class participant in the AI development workflow instead of something you interact with through a separate terminal window.

Automating Embeddings & RAG with the pgEdge AI Toolkit

The prototype had hand-crafted embeddings. You probably generated them with a script that called OpenAI's API, stored the vectors in a batch, and moved on. That works for a demo with a few hundred rows. It falls apart when you need to process documents continuously, keep embeddings up to date as content changes, and retrieve context for an LLM to reason over.

This is where the AI Toolkit pipeline fills in the gaps, and all of it stays inside PostgreSQL.

Docloader handles ingestion. Point it at a directory of documents (Markdown, HTML, reStructuredText), a Git repository, or a set of files, and it loads them into a PostgreSQL table with extracted metadata. It runs transactionally, so a batch either fully commits or rolls back, and UPSERT mode lets you re-run the same load to pick up changes without duplicating rows.

pgEdge_Vectorizer watches the document tables via triggers. When rows land or change, background workers automatically chunk the text and generate embeddings through your choice of provider (OpenAI, Voyage AI, or Ollama for air-gapped environments). The chunking is configurable: fixed token windows, Markdown-aware splitting that respects document structure, or a hybrid approach. The results go into chunk tables with pgVector columns, indexed for similarity search.

RAG Server ties it together for retrieval-augmented generation. When a question comes in, it generates an embedding, runs a hybrid search combining pgVector cosine similarity with BM25 keyword ranking, fuses results using Reciprocal Rank Fusion, and sends the assembled context to an LLM for a response.

The important thing about this pipeline is what didn't change. Your original products table with its pgvector column is still there. The similarity search query from the prototype still works. You added capabilities around the core without replacing it. The pipeline components are doing the operational work (ingestion, chunking, embedding, retrieval) that you were doing by hand during the prototype, but the data model underneath is the same PostgreSQL you started with.

Scaling PostgreSQL AI Apps: The Multi-Region Problem

Your AI application is running, the pipeline is processing documents, users are querying it, and the single PostgreSQL instance is handling the load. Until it isn't.

The conversation usually starts with availability. A single node means a single point of failure, which means downtime takes the whole application down. Then latency comes up: users in Frankfurt are hitting a database in Virginia, and the round-trip time is noticeable. Then write throughput: the vectorizer is processing new documents while the RAG server is running searches and the application is handling your regular transactional workload, all against the same node.

This is the moment where most AI-on-Postgres architectures break. The typical advice is to move your vectors to a dedicated vector database for the search workload, set up read replicas for the transactional queries, and add a caching layer in front of everything. Three months later you're operating three different data stores, maintaining consistency between them, and debugging issues that only surface when one of the replication channels falls behind.

pgEdge Postgres for distributed setups takes a different approach. Spock is our open-source PostgreSQL logical replication extension that enables multi-master (active-active) replication across distributed nodes - including cross regions. The use of this extension lets you replicate your PostgreSQL database, pgvector columns and all, across multiple nodes in multiple regions. Every node can accept writes. Conflict resolution happens automatically. And the critical part: your schema stays exactly the same. The products table, the embedding column, the HNSW index, the similarity search query from page one of this blog post, all identical. You've added multi-master replication underneath the database, not a new database on top of your application.

For teams that want the managed version, pgEdge Cloud handles the deployment and operations. Spin up a distributed cluster across AWS, Azure, or GCP regions, connect your application with the same connection string it was already using (pointed at the nearest node), and your AI application is running multi-region without a single line of code changed.

Why Postgres Should Remain Your AI App Foundation

What happened over the course of this walkthrough is what we call the adoption ladder, and it's the reason we built pgEdge the way we did.

You started with single-node Postgres and pgvector. That was enough for the prototype. You added the MCP Server to accelerate development. You added the pipeline components to automate what you were doing by hand. You added multi-master replication to make it production-grade across regions. At no point did you re-platform. At no point did you throw away your schema, rewrite your queries, or swap out your database for something else. Postgres was the foundation at every step, and each pgEdge tool entered the picture when you needed it, not before.

That's what "prototype to production" should actually look like: the same database growing with you, not a migration to something else.

You don't have to start over and re-platform. You never did.

If you want to try any of this, the pgEdge Postgres MCP Server and the AI Toolkit components are open source. pgEdge Cloud handles the distributed deployment if you'd rather not manage it yourself. And our Codespaces walkthroughs will get you from zero to a running environment in a couple of minutes, no installation required. We’re here to help, and bring decades of dedicated PostgreSQL experience to the table to make sure you’re good-to-go. If you need expert support services to back up your enterprise applications, get in touch with us - anytime.