pgEdge Vectorizer and RAG Server: Bringing Semantic Search to PostgreSQL (Part 2)
In my previous blog, I walked through setting up the pgEdge MCP Server with a distributed PostgreSQL cluster, and connecting Claude to live database data through natural language. In this blog I want to look at a different problem: how do you build AI-powered search over your own content, without adding a separate vector database to your infrastructure?
This is where the pgEdge Vectorizer and RAG Server come in. Together, they give you a complete open-source Retrieval-Augmented Generation (RAG) pipeline that runs entirely inside PostgreSQL. In this blog, I'll explain what each component does, how they work together, and walk through working examples that you can follow on your own PostgreSQL instance.
I am following the same pattern in this blog as in my previous posts: explain each component, then provide real-world working examples so the reader can better understand these concepts.
Please note: I am using my Rocky Linux VM for this installation and testing and using the Ollama embedding provider (installed on my VM) to generate the embeddings.
Background: The Problem With Keeping Vector Search In Sync
Most teams building AI-powered search hit the same wall. You set up a vector search pipeline, load your documents, generate embeddings, and everything works. Then someone updates a document or adds a new one - suddenly you need a process to detect the change, re-chunk the content, regenerate the embeddings, and update the index. Teams typically solve this with custom scripts, message queues, or external orchestration tools - all of which need to be built, maintained, and monitored separately from the database.
The pgEdge Vectorizer eliminates that problem entirely. It runs as a PostgreSQL background worker. Once you enable Vectorizer on a table, it monitors the source data through triggers, chunks and embeds new or modified rows automatically, and keeps the search index in sync without any external orchestration. The same transactional guarantees that PostgreSQL gives you for regular data apply here too.
The pgEdge RAG Server sits in front of that data, exposing a simple HTTP API. When a query comes in, it performs a hybrid search that combines vector similarity with BM25 keyword matching to retrieve the most relevant chunks, and passes them to an LLM to generate a grounded answer. The result is accurate, context-aware responses based on your actual data, instead of the model's training set.
How the Pipeline Works
Before getting into setup, it helps to understand how the three components connect:
pgEdge Vectorizer — a PostgreSQL background worker extension that monitors source tables, chunks text content, calls your embedding provider (OpenAI, Voyage AI, or local Ollama), and stores the results in an automatically created chunk table. Triggers keep this in sync as data changes.
pgvector — the open source PostgreSQL extension that adds the vector data type, HNSW and IVFFlat indexes, and cosine/Euclidean/dot-product similarity operators. The vectorizer uses this to store and index embeddings. It is the foundation that everything else builds on.
pgEdge RAG Server — a Go-based API server that handles the retrieval and generation side. It receives a natural language query, embeds it, runs hybrid search against the chunk table, applies a token budget to fit the results into the LLM context window, and calls the LLM to generate a response.
It's important to note that you can use SQL functions to enable vectorization on a table and let the background worker handle the rest. There is no need to write embedding logic or manually keep the embeddings in sync with data changes.
Part 1: The pgEdge Vectorizer
What the Vectorizer Does Under the Hood
One of the recurring challenges with AI-powered applications is keeping your vector search index in sync with your source data. Most pipelines require custom scripts or external orchestration tools to detect changes, re-chunk documents, and regenerate embeddings. The pgEdge Vectorizer eliminates that entirely.
The Vectorizer runs as a PostgreSQL background worker process. When you call enable_vectorization() on a table column, the extension does three things: it creates a companion chunk table to store the generated embeddings, installs triggers on the source table to detect inserts and updates, and enqueues any existing rows for processing. The background workers then pick up items from the queue, split the text into overlapping chunks, call your configured embedding provider for each chunk, and insert the results into the chunk table. When source data changes, only the affected rows are re-processed, instead of the entire table.
This trigger-based approach is what makes it practical for production use. You don't need a separate change data capture system or a scheduled job - the vectorizer is always watching.
Installation
Before installing the vectorizer, make sure pgvector is installed on your PostgreSQL instance — the vectorizer depends on it to store and index the embeddings it generates. If you are running pgEdge Enterprise Postgres, pgvector is already included. For community PostgreSQL, install it from the pgvector GitHub repository or your package manager.
You will also need the PostgreSQL server development headers and libcurl. On Rocky Linux / RHEL / Fedora:
sudo dnf install libcurl-devel

Clone and build the vectorizer:
Please make sure your PG_CONFIG environment variable is set before installing the extension.
git clone https://github.com/pgEdge/pgedge-vectorizer.git
cd pgedge-vectorizer
make
sudo make install

This installation needs to be performed on every pgEdge node in the cluster. I am running two nodes on localhost on my VM, so I performed the installation on both nodes.
Configuration
The vectorizer runs as a background worker, so it must be added to shared_preload_libraries and configured in postgresql.conf before starting PostgreSQL:
# Load the extension at startup. If shared_preload_libraries is already
# set, append this extension to the existing list.
shared_preload_libraries = 'pgedge_vectorizer'
# Embedding provider — using Ollama for local setup (no API key required)
pgedge_vectorizer.provider = 'ollama'
pgedge_vectorizer.api_url = 'http://localhost:11434'
pgedge_vectorizer.model = 'nomic-embed-text:latest'
# Which databases to monitor
pgedge_vectorizer.databases = 'testdb'
# Chunking settings
pgedge_vectorizer.default_chunk_size = 400 # tokens per chunk
pgedge_vectorizer.default_chunk_overlap = 50 # overlap between chunks
pgedge_vectorizer.num_workers = 2 # background worker count

For this blog I am using Ollama to generate embeddings since everything is running locally on my VM. Ollama is a great option for local development and testing — it runs entirely on your machine with no API keys or external calls required. For production deployments, you would typically switch to OpenAI's text-embedding-3-small or Voyage AI's voyage-3 model, both of which offer higher quality embeddings and better performance at scale. The vectorizer and RAG Server support all three providers, so switching is just a simple configuration change.
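To make the chunking settings above concrete, here is a rough Python sketch of token-based chunking with a sliding-window overlap. It is illustrative only: the whitespace "tokenizer" and the function itself are stand-ins, not the vectorizer's actual implementation.

```python
def chunk_tokens(tokens, chunk_size=400, overlap=50):
    """Split a token list into overlapping chunks.

    Each chunk holds up to chunk_size tokens, and consecutive chunks
    share `overlap` tokens so sentences that straddle a chunk boundary
    appear in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already covered the tail
    return chunks

# Naive whitespace "tokenizer", purely for illustration
text = " ".join(f"tok{i}" for i in range(1000))
chunks = chunk_tokens(text.split(), chunk_size=400, overlap=50)
print(len(chunks), len(chunks[0]))
```

A 1000-token document with these settings yields three chunks, and the first 50 tokens of each chunk repeat the last 50 tokens of the previous one, which is what preserves context across boundaries.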
The model parameter tells the vectorizer which embedding model to use when generating vectors. In this case we are using nomic-embed-text:latest — a lightweight 137MB embedding model from Nomic AI that runs efficiently on CPU without requiring a GPU. The :latest tag ensures Ollama uses the most recent version of the model. This same model name must be consistent across both the vectorizer configuration and the RAG Server pipeline configuration — if the models differ, the query embeddings and the stored document embeddings will be generated by different models, making similarity search unreliable.
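Because this consistency requirement is easy to violate when the two configurations live in different files, a small sanity check can help. The helper below is hypothetical (neither tool ships it); it simply compares the provider and model fields from the two configurations shown in this post.

```python
def check_embedding_config(vectorizer_cfg, rag_pipeline_cfg):
    """Return a list of mismatches between the vectorizer's embedding
    settings and a RAG Server pipeline's embedding_llm section.

    The dict keys mirror the config files in this post, but this
    helper itself is a hypothetical illustration.
    """
    problems = []
    if vectorizer_cfg["provider"] != rag_pipeline_cfg["provider"]:
        problems.append("provider mismatch")
    if vectorizer_cfg["model"] != rag_pipeline_cfg["model"]:
        problems.append(
            f"model mismatch: {vectorizer_cfg['model']!r} vs {rag_pipeline_cfg['model']!r}"
        )
    return problems

vectorizer = {"provider": "ollama", "model": "nomic-embed-text:latest"}
pipeline = {"provider": "ollama", "model": "nomic-embed-text:latest"}
print(check_embedding_config(vectorizer, pipeline))  # empty list means consistent
```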
If you haven't already installed Ollama, run the following, then restart PostgreSQL to load the background workers:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Restart PostgreSQL to load the background workers

Once PostgreSQL is running, create the extension in your database:
--pgvector is pulled in automatically as a dependency
CREATE EXTENSION IF NOT EXISTS pgedge_vectorizer;

A Practical Example: Product Support Knowledge Base
Let me walk through a realistic example. Say you have a product support knowledge base — a table of articles that your support team maintains. You want users to be able to search it semantically, so that a question like "how do I reset my password?" finds the right article even if the article uses the phrase "account recovery" rather than "reset password."
First, create the source table:
CREATE TABLE kb_articles (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
category TEXT,
updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
-- Index for general lookups
CREATE INDEX idx_kb_category ON kb_articles(category);

Enable vectorization on the content column — this single call sets up the chunk table, installs the triggers, and enqueues any existing rows:
SELECT pgedge_vectorizer.enable_vectorization(
    source_table := 'kb_articles'::regclass,
    source_column := 'content',
    embedding_dimension := 768
);
NOTICE: Using primary key column: id (bigint)
INFO: DDL statement replicated.
INFO: DDL statement replicated.
INFO: DDL statement replicated.
INFO: DDL statement replicated.
NOTICE: Vectorization enabled: kb_articles -> kb_articles_content_chunks
NOTICE: Strategy: token_based, chunk_size: 400, overlap: 50
NOTICE: Processing existing rows...
NOTICE: Processed 0 existing rows
enable_vectorization
----------------------
(1 row)

The vectorizer automatically creates kb_articles_content_chunks. You can inspect its structure:
\d kb_articles_content_chunks
Table "public.kb_articles_content_chunks"
Column | Type | Description
-------------+--------------------------+----------------------------
id | bigint | chunk primary key
source_id | bigint | FK to kb_articles.id
chunk_index | integer | position within document
content | text | the chunk text
embedding | vector(768) | the generated embedding
created_at | timestamptz |

Inserting Data and Monitoring Progress
Insert a few articles and watch the vectorizer process them:
INSERT INTO kb_articles (title, content, category) VALUES
('Account Recovery', 'To recover access to your account, click
Forgot Password on the login page. Enter your email address
and we will send a reset link. The link expires after 24 hours.',
'authentication'),
('Setting Up Two-Factor Authentication', 'Two-factor authentication
(2FA) adds an extra layer of security to your account. Go to
Settings > Security > Enable 2FA. Scan the QR code with an
authenticator app such as Google Authenticator or Authy.',
'authentication'),
('Billing and Invoices', 'Your invoices are generated on the first
of each month. Download them from the Billing section under
Account Settings. Invoices are retained for 7 years.',
'billing');

Check how many embeddings are still pending:
SELECT * FROM pgedge_vectorizer.pending_count;
pending_count
--------------
3
(1 row)

Wait a few seconds for the background workers to process the queue, then check again:
SELECT * FROM pgedge_vectorizer.pending_count;
pending_count
--------------
0
(1 row)
-- Confirm chunks were created
SELECT source_id, chunk_index,
embedding IS NOT NULL AS has_embedding,
vector_dims(embedding) AS dimensions, LEFT(content, 60) AS preview
FROM kb_articles_content_chunks;
source_id | chunk_index | has_embedding | dimensions | preview
-----------+-------------+---------------+------------+------------------------------------------
1 | 1 | t | 768 | To recover access to your account, click+
| | | | Forgot Password on
2 | 1 | t | 768 | Two-factor authentication +
| | | | (2FA) adds an extra layer of secu
3 | 1 | t | 768 | Your invoices are generated on the first+
| | | | of each month. Dow
(3 rows)

Semantic Search Using generate_embedding()
With embeddings in place, you can run a semantic search directly in SQL. The vectorizer provides a generate_embedding() function that embeds your search query on the fly, so you do not need to pre-compute it in your application.
-- Find articles most relevant to a user query
SELECT
a.title,
a.category,
LEFT(c.content, 120) AS preview,
c.embedding <=> pgedge_vectorizer.generate_embedding(
'how do I get back into my account?'
) AS distance
FROM kb_articles_content_chunks c
JOIN kb_articles a ON a.id = c.source_id
WHERE c.embedding IS NOT NULL
ORDER BY distance
LIMIT 3;
title | category | preview | distance
--------------------------------------+----------------+---------------------------------------------------------------+---------------------
Account Recovery | authentication | To recover access to your account, click +| 0.29047473679590063
| | Forgot Password on the login page. Enter your email address +|
| | and we will send |
Setting Up Two-Factor Authentication | authentication | Two-factor authentication +| 0.3767101077478512
| | (2FA) adds an extra layer of security to your account. Go to+|
| | Settings > Security > Enable 2F |
Billing and Invoices | billing | Your invoices are generated on the first +| 0.519891788487777
| | of each month. Download them from the Billing section under +|
| | Account Settings. |
(3 rows)
The distance value ranges from 0 (identical meaning) to 2 (completely unrelated). The Account Recovery article scores 0.29 - a very strong semantic match - even though the user asked, "how do I get back into my account?" using entirely different wording from the article's title. That is exactly what semantic search is supposed to do.
Another example:
SELECT
a.title,
a.category,
LEFT(c.content, 120) AS preview,
c.embedding <=> pgedge_vectorizer.generate_embedding(
'where is the billing section?'
) AS distance
FROM kb_articles_content_chunks c
JOIN kb_articles a ON a.id = c.source_id
WHERE c.embedding IS NOT NULL
ORDER BY distance
LIMIT 3;
title | category | preview | distance
--------------------------------------+----------------+------------------------------------------------------------------+--------------------
Billing and Invoices | billing | Your invoices are generated on the first +| 0.468704625738971
| | of each month. Download them from the Billing section under +|
| | Account Settings. |
Account Recovery | authentication | To recover access to your account, visit the login page +| 0.5409861638925357
| | and click Forgot Password. Enter your registered email address. |
Setting Up Two-Factor Authentication | authentication | Two-factor authentication +| 0.585547944450854
| | (2FA) adds an extra layer of security to your account. Go to +|
| | Settings > Security > Enable 2F |
(3 rows)

The query uses the <=> cosine distance operator from the pgvector extension to find which stored chunks are semantically closest to your question. A few things worth understanding here:
c.embedding - the stored 768-dimension vector for each chunk in the chunk table
<=> - pgvector's cosine distance operator. Returns a value between 0 (identical meaning) and 2 (completely unrelated). Lower is better
generate_embedding('...') - converts your search query into a vector on the fly by calling the same Ollama embedding model used to generate the stored embeddings
AS distance - labels the score so you can ORDER BY it to get the most relevant results first
The key point is that the model compares meaning, not keywords. A question like "where is the billing section?" will match the Billing and Invoices article even though it shares almost no words with the article content — because both are semantically about the same topic.
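The 0-to-2 range of the <=> operator follows directly from the definition of cosine distance, which is 1 minus cosine similarity. A quick standard-library Python check makes the boundary values concrete:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, matching pgvector's <=> operator:
    0 = same direction, 1 = orthogonal, 2 = opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # identical direction -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 1.0
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> 2.0
```

In practice, embedding models rarely produce near-opposite vectors, which is why even the unrelated billing article above scores around 0.5 rather than near 2.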
Automatic Re-Embedding on Update
One of the most practical aspects of the vectorizer is how it handles content changes. When you update an article, the trigger installed on the source table fires automatically — the old chunks are replaced and new embeddings are generated without any manual intervention:
UPDATE kb_articles
SET content = 'To recover access to your account, visit the login page
and click Forgot Password. Enter your registered email address.
A password reset link will be sent within a few minutes.
The link is valid for 24 hours. If you do not receive the email,
check your spam folder or contact support.'
WHERE title = 'Account Recovery';
-- The vectorizer detects the change and re-processes this row
SELECT * FROM pgedge_vectorizer.pending_count;
pending_count
--------------
1
(1 row)

Part 2: The pgEdge RAG Server
What the RAG Server Does
Direct SQL search with generate_embedding() is useful for development and debugging, but it is not how you expose semantic search to an application. The pgEdge RAG Server is the production-ready layer on top. It exposes a simple HTTP API, handles the embedding of incoming queries, runs hybrid search against the chunk table, manages the token budget for the LLM context window, and returns a generated answer alongside the source chunks used.
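The token-budget step can be sketched as a greedy walk over the ranked chunks: keep each chunk in rank order until adding the next one would exceed the budget. This is a simplified model rather than the server's actual code, and it counts tokens by whitespace words purely for illustration:

```python
def fit_to_budget(ranked_chunks, token_budget=4000,
                  count_tokens=lambda s: len(s.split())):
    """Greedily keep the highest-ranked chunks that fit the budget.

    ranked_chunks must already be ordered best-first; count_tokens is
    a naive word-count stand-in for a real tokenizer.
    """
    kept, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > token_budget:
            break  # stop at the first chunk that would overflow
        kept.append(chunk)
        used += cost
    return kept, used

chunks = ["a b c", "d e", "f g h i"]
kept, used = fit_to_budget(chunks, token_budget=6)
print(kept, used)
```

Stopping at the first overflow (rather than skipping ahead to smaller chunks) keeps the selection contiguous in rank order, so the LLM always sees the most relevant material first.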
The hybrid search approach is worth explaining. Pure vector search finds semantically similar content but can miss exact keyword matches. Pure BM25 (keyword) search finds exact matches but misses paraphrases. The RAG Server combines both using Reciprocal Rank Fusion; this retrieves candidates from each method and merges the ranked lists, giving you the benefits of both approaches in a single query.
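Reciprocal Rank Fusion itself is simple to sketch. Each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in; k = 60 is the constant from the original RRF paper. The RAG Server's exact parameters are not documented here, so treat this as the textbook formula rather than its internal implementation:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists with Reciprocal Rank Fusion.

    Each item's fused score is sum over lists of 1 / (k + rank),
    with rank starting at 1. Items ranked highly in multiple lists
    float to the top of the merged ranking.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic similarity ranking
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # keyword (BM25) ranking
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

Note how doc_b wins the merged ranking: it is not first in either list, but ranking well in both beats ranking first in only one.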
Prerequisites
The RAG Server is written in Go. You need Go version 1.23 or later:
go version
# go version go1.23.0 linux/arm64

Installation
git clone https://github.com/pgEdge/pgedge-rag-server.git
cd pgedge-rag-server
make build
# Verify the binary
./bin/pgedge-rag-server --help

Setting Up API Keys
The RAG Server reads API keys from files rather than environment variables, which is safer for production deployments. Create the key files with restrictive permissions:
# Using Ollama locally for embeddings — no API key needed for this part
ollama pull nomic-embed-text:latest
echo 'sk-ant-your-anthropic-key' > ~/.anthropic-api-key
chmod 600 ~/.anthropic-api-key

To use Claude as the LLM for response generation in the RAG Server, you need an Anthropic API key. This is separate from your claude.ai subscription — the RAG Server calls the Anthropic API directly, which is billed by token usage. Head over to console.anthropic.com, create an account if you don't have one, and generate an API key under Settings > API Keys. You will also need to add a small credit balance under Plans & Billing — $5 is more than enough for testing and development. The RAG queries we are running here cost fractions of a cent each, so your credit will go a long way. Once you have the key, store it in a file and set the correct permissions.
It is worth noting that Claude is not the only option here. The RAG Server supports multiple LLM providers for response generation — you can use OpenAI models like gpt-4o, or run a local model entirely through Ollama, which requires no API key at all.
Configuration
The RAG Server uses a YAML configuration file. A pipeline defines the complete RAG setup, including which database to query, which chunk table to search, and which LLM providers to use for embedding and generation:
server:
listen_address: "0.0.0.0"
port: 8080
api_keys:
# Ollama runs locally — no API key required for embeddings
anthropic: "~/.anthropic-api-key"
pipelines:
- name: "support-kb"
description: "Product support knowledge base"
database:
host: "localhost"
port: 5432
database: "testdb"
username: "ahsan"
password: "pgedge110"
ssl_mode: "disable"
tables:
- table: "kb_articles_content_chunks"
text_column: "content"
vector_column: "embedding"
embedding_llm:
provider: "ollama"
base_url: "http://localhost:11434"
model: "nomic-embed-text:latest"
rag_llm:
provider: "anthropic"
model: "claude-sonnet-4-20250514"
token_budget: 4000
    top_n: 10

Starting and Verifying the Server
Use the following command to start the server and confirm its health:
./bin/pgedge-rag-server -config config.yaml
# Verify with the health endpoint
curl http://localhost:8080/v1/health
{"status": "healthy"}

Querying the RAG API
The main query endpoint is POST /v1/pipelines/{pipeline-name}. The following command uses the support knowledge base we set up above:
curl -X POST http://localhost:8080/v1/pipelines/support-kb \
-H 'Content-Type: application/json' \
-d '{"query": "how do I get back into my account?"}'

The server retrieves the most relevant chunks from the database, applies the token budget, and sends them to the LLM. The response comes back as JSON:
curl -X POST http://localhost:8080/v1/pipelines/support-kb -H 'Content-Type:
application/json' -d '{"query": "how do I get back into my account?"}'
{"answer":"To recover access to your account, follow these steps:\n\n1. Visit the
login page and click \"Forgot Password\"\n2. Enter your registered email
address\n3. A password reset link will be sent to you within a few minutes\n4. Use
the link to reset your password (the link is valid for 24 hours)\n\nIf you don't
receive the email, check your spam folder or contact support for
assistance.","tokens_used":545}

Notice the answer is grounded in your actual knowledge base content — not a generic response from the model's training data. If you add "include_sources": true to the request body, the response will also include the specific chunks that were retrieved, which is useful for building citation-aware interfaces or debugging retrieval quality.
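For completeness, the same request can be made from application code. The sketch below uses only the Python standard library; the payload shape mirrors the curl examples above, and anything beyond the "query" and "include_sources" fields is an assumption on my part:

```python
import json
import urllib.request

def build_request(query, include_sources=False):
    """Build the JSON body for POST /v1/pipelines/{name},
    matching the curl examples in this post."""
    body = {"query": query}
    if include_sources:
        body["include_sources"] = True  # ask the server to return the retrieved chunks
    return body

def ask(base_url, pipeline, query, include_sources=False):
    """Send a query to a running RAG Server and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/pipelines/{pipeline}",
        data=json.dumps(build_request(query, include_sources)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires the server from this post running on localhost:8080
    payload = build_request("how do I get back into my account?", include_sources=True)
    print(json.dumps(payload))
```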
Conclusion
What I find most practical about this approach is that it does not require you to learn a new database system. If you are already running PostgreSQL, the vectorizer and RAG Server layer on top cleanly. Vectorizer handles the operationally painful part, keeping embeddings in sync as data changes, and the RAG Server turns that into a production-ready API with a single configuration file.
Together with the MCP Server I covered in my previous blog, these components give you a complete (100% open-source!) AI toolkit on PostgreSQL: the MCP Server for agents that need to explore and query live database data, and the Vectorizer plus RAG Server for applications that need fast, controlled semantic search over curated content. They solve different problems and together make your stack more cohesive when developing AI applications on PostgreSQL - try it out!



