What's Our Vector, Victor? Building AI Apps with Postgres
Something I’ve presented about recently (a couple times, now!) is how we can make AI actually useful with Postgres. Not the mystical robot overlord kind of AI that people worry about, but the practical, math-heavy kind that can actually help solve real problems with the data you already have.
Let me be clear up front: AI isn't magic. It's math. Lots and lots of math. And if you can write SQL queries (which, let's be honest, is why you're here), you can build AI applications with Postgres. Let me show you how.
Breaking Down the AI Buzzwords
Before we get too deep into the technical stuff, let's clarify some terms that get thrown around way too much these days:
LLM (Large Language Model): This is what people are actually talking to when they chat with AI systems like ChatGPT or Claude. It's the thing that generates responses that sound surprisingly human.
RAG (Retrieval Augmented Generation): We're all tired of AIs making stuff up, right? RAG forces them to use reference material - your actual data - instead of just hallucinating answers. It's basically saying "here's some context, now answer based on this instead of whatever random training data you remember."
Tokens: Text gets broken into chunks, and those chunks get meaning associated with them. Think of it as building up from word parts to bigger concepts, piece by piece.
Embeddings: Here's where it gets interesting, and where Postgres really starts to shine. An embedding is a vector, essentially a set of coordinates that places those tokens in a high-dimensional space. You know how a Cartesian plot has X and Y coordinates? Vectors are like that, except they're hundreds or even thousands of coordinates long. That's where AI concepts live and get their granularity.
Understanding Vectors: The School of Fish
I love using this analogy because it actually captures what's happening under the hood. Picture a school of fish in a coral reef. A vector isn't just representing the species of fish, it's capturing everything about that fish in context. The position of its fins, what other fish it's swimming near, its place in the family tree, the rocks around it, what it eats, how it behaves.
That's exactly what a vector does for text or concepts. It's not just the word itself, it's all the surrounding context and meaning that gives it substance. It's like having a 3D hologram where you capture not just the position of the fish, but everything that makes that fish what it is at that moment.
This is why when you talk to an LLM, you get all that surrounding context. All the related concepts, the metadata, the contextual relationships - they're all encoded in those coordinate arrays.
The Postgres AI Stack That Actually Works
Here's what makes this all work with Postgres, and why I think this approach is superior to the usual Python-heavy AI stacks:
pgvector: This is the foundation. It adds vector similarity searches to Postgres, along with different vector types (binary, sparse, various array types), operators for comparing vectors (cosine, euclidean, dot product), and crucially, new index types like HNSW that make searching through thousands of coordinates actually feasible instead of impossibly slow.
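To make that concrete, here's a minimal sketch of the raw pgvector building blocks on their own. The doc_embeddings table, its columns, and the query parameter are hypothetical, and 384 dimensions simply matches the all-MiniLM-L12-v2 model used later in this post:
-- Enable the extension and store embeddings in a vector column
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE doc_embeddings (
    doc_id    BIGINT PRIMARY KEY,
    embedding vector(384)
);
-- HNSW index using the cosine-distance operator class
CREATE INDEX ON doc_embeddings USING hnsw (embedding vector_cosine_ops);
-- Five nearest neighbors by cosine distance; $1 is a 384-dimension
-- query vector supplied by your application
SELECT doc_id
FROM doc_embeddings
ORDER BY embedding <=> $1::vector
LIMIT 5;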
pg_vectorize: This is where things get really interesting, and it's the extension I personally prefer for real-world work. Written in Rust by Adam Hendel (@chuckhend on GitHub), it builds on pgvector but adds the functionality that makes everything actually usable in production. It uses pgmq for background queuing and can integrate with pg_cron for scheduled operations.
The beauty of pg_vectorize is that it handles all the tedious orchestration work that usually requires a bunch of Python code and multiple API calls to different services.
What pg_vectorize Actually Does
Here's where the magic happens. pg_vectorize gives you several key capabilities:
1. Text Encoding
Using vectorize.encode, you can transform an individual prompt or search term into an embedding using whatever compatible transformer model you prefer. The output works directly with pgvector's vector search.
SELECT vectorize.encode(
'Is Postgres the best database engine?',
'sentence-transformers/all-MiniLM-L12-v2'
);
Obviously, the answer is yes, but let's keep going.
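Because that output is just a vector, you can drop it straight into a pgvector distance search. A minimal sketch, reusing the hypothetical doc_embeddings table from earlier and assuming the encoded result casts cleanly to the vector type:
-- Encode the question, then rank stored embeddings by cosine distance
SELECT doc_id
FROM doc_embeddings
ORDER BY embedding <=> vectorize.encode(
    'Is Postgres the best database engine?',
    'sentence-transformers/all-MiniLM-L12-v2'
)::vector
LIMIT 5;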
2. Create and Maintain Embeddings
This is where pg_vectorize really shines:
SELECT vectorize.table(
job_name => 'rt_article_embed',
"relation" => 'blog_article',
primary_key => 'article_id',
update_col => 'last_updated',
columns => ARRAY['author', 'title', 'content'],
transformer => 'sentence-transformers/all-MiniLM-L12-v2',
schedule => 'realtime'
);
When you specify realtime, something clever happens: any time you insert or update a row, it generates an embedding in the background using pgmq. Your insert completes immediately; the embedding generation happens asynchronously. This is absolutely crucial because generating embeddings can be slow (it's either a local CPU operation or a remote GPU call), and you definitely don't want that latency directly impacting your write performance.
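In practice, that means a plain insert is all your application ever issues; the embedding work happens behind the scenes. A quick sketch against the blog_article table the job above points at (the values are obviously made up):
-- The insert returns immediately; the background worker picks up the new
-- row from the pgmq queue and generates its embedding afterwards
INSERT INTO blog_article (author, title, content, last_updated)
VALUES ('A. Author', 'Why Postgres?', 'Because it does everything.', now());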
I saw a Microsoft demo at a conference where they were generating embeddings synchronously on every insert. That's not a great approach if you value database performance.
Embeddings are then kept up to date either by a pg_cron job or by live pgmq updates.
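If you'd rather batch that work instead of queuing it on every write, the same call can be put on a schedule. A sketch, assuming the schedule parameter also accepts a standard cron expression (here, hourly):
SELECT vectorize.table(
    job_name => 'hourly_article_embed',
    "relation" => 'blog_article',
    primary_key => 'article_id',
    update_col => 'last_updated',
    columns => ARRAY['author', 'title', 'content'],
    transformer => 'sentence-transformers/all-MiniLM-L12-v2',
    schedule => '0 * * * *'
);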
3. Vector Search
SELECT * FROM vectorize.search(
job_name => 'blog_chat',
query => 'Is Postgres the best database engine?',
return_columns => ARRAY['author', 'title', 'content'],
num_results => 5
);
This searches through all your vectorized content and returns the most semantically similar matches. It's not just keyword matching, it's understanding the intent and context behind your query.
4. Complete RAG in One Query
Here's the real magic trick:
SELECT vectorize.rag(
job_name => 'blog_chat',
query => 'Is Postgres the best database engine?',
chat_model => 'ollama/llama3.1'
) -> 'chat_response';
This single function takes your question, compares it against your vectorized content, pulls back the best matches, combines that with a prompt, sends it to the LLM, and returns a complete response as a JSON object that includes context if we need it. All the orchestration that used to require dozens of lines of Python code - gone.
The Architecture Simplification
The traditional RAG architecture is a nightmare of moving parts: User → App Layer → Transformer API → Database lookup → Combine context → LLM API → Parse response → App Layer → User. That's a lot of potential failure points and a lot of code to maintain.
With pg_vectorize, your stack becomes: App Layer makes SQL query → Postgres handles everything → Response back.
I rebuilt a RAG application I had originally written in Python using this approach, and I'm not exaggerating when I say it cut my codebase by at least half. Much easier to maintain, much easier to reason about, and far fewer things that can break.
Practical Implementation: The Chunking Strategy
Here's something important that doesn't get talked about enough: embeddings are less granular than LLMs themselves. Embeddings are comparatively “fuzzy” (the model used above produces only 384 coordinates), while LLMs have billions of parameters. This means you need to think strategically about chunking your content.
Don't try to create embeddings for entire 10-page articles. Break them into logical chunks: paragraphs, sections, whatever makes sense for your content. Each chunk gets its own embedding. When someone asks a question, you're matching against the specific chunk that's most relevant, not trying to capture the essence of an entire document in a single vector. This will provide sharper context.
-- Example chunked table structure
CREATE TABLE blog_article_chunks (
chunk_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
article_id BIGINT NOT NULL REFERENCES blog_articles,
chunk TEXT,
last_updated TIMESTAMPTZ NOT NULL DEFAULT now()
);
Then you vectorize the chunks table instead of the articles table. Much better semantic matching.
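That step is just the same vectorize.table call pointed at the new table; a sketch using the hypothetical chunks table above:
SELECT vectorize.table(
    job_name => 'rt_chunk_embed',
    "relation" => 'blog_article_chunks',
    primary_key => 'chunk_id',
    update_col => 'last_updated',
    columns => ARRAY['chunk'],
    transformer => 'sentence-transformers/all-MiniLM-L12-v2',
    schedule => 'realtime'
);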
Advanced Technique: Query Rephrasing
Here's a trick I learned from Adam that's incredibly useful: users don't always phrase questions in ways that match well with embeddings. If someone asks "how do window functions work?", a naive embedding search might just return content that contains the words "window" and "function" frequently, not necessarily content that explains what window functions actually do.
However, there’s an elegant solution. Use the LLM to rephrase the query first:
-- Step 1: Get a semantically better query
SELECT vectorize.generate(
'Take this user question: "how do window functions work?" and rephrase it to better match semantic content about SQL window functions, including syntax, partitions, and practical examples.'
);
This might return something like "SQL window function syntax row partitions unbounded practical examples" which is much richer semantic content.
-- Step 2: Use that improved query for your RAG search
SELECT vectorize.rag(
job_name => 'blog_chat',
query => 'SQL window function syntax row partitions unbounded practical examples'
);
This gives you much better semantic matching instead of just literal word matches. The difference in result quality is dramatic.
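And if you want the whole thing in a single round trip, the two steps nest; a sketch assuming vectorize.generate returns plain text that can be passed straight in as the query:
SELECT vectorize.rag(
    job_name => 'blog_chat',
    query => vectorize.generate(
        'Take this user question: "how do window functions work?" and rephrase it to better match semantic content about SQL window functions.'
    ),
    chat_model => 'ollama/llama3.1'
) -> 'chat_response';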
Configuration and Setup
You can configure pg_vectorize to work with external services like OpenAI, or you can run everything locally. Want to keep your data completely internal?
-- Search using Ollama or vLLM instead:
ALTER SYSTEM SET vectorize.openai_service_url
TO 'https://api.myserver.com/v1';
-- Or, use a custom transformer service:
ALTER SYSTEM SET vectorize.embedding_service_url
TO 'https://api.myserver.com/v1';
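One detail worth remembering: ALTER SYSTEM only writes the value to postgresql.auto.conf, so reload the configuration before expecting the new endpoints to take effect:
-- Apply the new settings without a restart
SELECT pg_reload_conf();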
With distributed Postgres systems, you can co-locate your AI processing with your data nodes, reducing latency and keeping everything within your security perimeter. Using this approach, no data leaves your infrastructure.
Why This Matters
The bottom line is this: if you can write SQL queries, you can build sophisticated AI applications. You don't need to become an expert in transformer architectures, manage API tokens across multiple services, handle complex JSON parsing from different AI providers, or write orchestration code to coordinate between embedding services and LLMs.
Your data is already in Postgres. Your application logic can stay close to your data. The AI functionality becomes just another database capability rather than a separate system to manage.
And here's the real advantage: when ChatGPT is having an outage or when your OpenAI bill is getting out of hand, your local setup keeps running. When you need to meet compliance requirements about data not leaving your infrastructure, you've got that covered. When you want to fine-tune models on your specific domain, you control the entire stack.
Getting Started
If you can write queries, you can build AI apps using Postgres. All you have to do is load your content (which you'd have to do anyway), vectorize your tables with a single function call, and start querying. It doesn’t matter whether you're building a customer service system that knows your documentation, a search interface for internal knowledge bases, or any other AI-powered application; it all can be contained within a few SQL functions.
That's the real power of keeping AI close to your data. Not to mention, chances are if you're here, your data is already in Postgres.
Thinking about watching the webinar in full? It's recorded - find it here.