Building a RAG Server with PostgreSQL - Part 2: Chunking and Embeddings
In Part 1 of this series, we loaded our documentation into PostgreSQL using the pgEdge Document Loader. Our documents are sitting in the database as clean Markdown content, ready for the next step: turning them into something an LLM can search through semantically.
In this post, we'll use pgEdge Vectorizer to chunk those documents and generate vector embeddings. By the end, you'll have a searchable vector database that can find relevant content based on meaning rather than just keywords.
What Are Embeddings and Why Chunk?
Before we dive in, let's quickly cover the concepts.
Vector embeddings are numerical representations of text that capture semantic meaning. Similar concepts end up close together in vector space, so "PostgreSQL replication" and "database synchronisation" would have similar embeddings even though they share few words. This is what makes semantic search possible.
Chunking is necessary because embedding models have token limits (typically 8,000 tokens or so), and more importantly, smaller chunks provide more focused results. If you embed an entire 50-page document as one vector, searching for "how to create an index" might return that whole document when you really want just the relevant paragraph. Breaking documents into smaller pieces gives you more precise retrieval.
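If you want to see what "close together in vector space" means in practice, here's a toy example you can run once the pgvector extension (installed below) is available - the vectors are made up rather than real embeddings, but the <=> cosine distance operator behaves the same way:
-- Toy vectors, not real embeddings: <=> returns cosine distance,
-- so vectors pointing in similar directions produce a smaller value
SELECT '[1, 0.9, 0.1]'::vector <=> '[1, 1, 0]'::vector  AS similar_pair,
       '[1, 0.9, 0.1]'::vector <=> '[-1, 0, 1]'::vector AS dissimilar_pair;
The first distance comes out close to zero and the second is much larger - that's the whole trick behind semantic search: embed the query, embed the content, and rank by distance.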
Why Vectorize in the Database?
You could run a separate service to generate embeddings, but pgEdge Vectorizer takes a different approach: it runs inside PostgreSQL as an extension. When you insert or update documents, triggers automatically chunk the text and queue it for embedding. Background workers process the queue asynchronously, calling your chosen embedding API and storing the results.
This has several advantages:
No external service to deploy and manage
Embeddings stay in sync with your data automatically
Everything lives in one transactional system
You can use standard SQL for similarity search
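To make that concrete, here's a deliberately simplified sketch of the trigger-plus-queue pattern in plain SQL. The table, function, and trigger names are invented for illustration - the extension manages its own schema and you won't create any of this yourself - but it shows the shape of what happens on every insert or update:
-- Illustration only: a minimal queue table and trigger, not the extension's actual schema
CREATE TABLE embedding_queue (
    id          bigserial PRIMARY KEY,
    doc_id      bigint NOT NULL,
    enqueued_at timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION queue_for_embedding() RETURNS trigger AS $$
BEGIN
    -- A background worker later picks this row up, chunks the document,
    -- calls the embedding API, and stores the resulting vectors
    INSERT INTO embedding_queue (doc_id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER queue_documents_for_embedding
    AFTER INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION queue_for_embedding();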
Prerequisites
Before we start, you'll need:
PostgreSQL 14 or later (we set this up in Part 1)
The pgvector extension
An API key from OpenAI, Voyage AI, or a local Ollama installation
For this tutorial, I'll use OpenAI's embedding model, but I'll show the configuration for other providers too.
Installing pgEdge Vectorizer
First, let's build and install the extension. You'll need the PostgreSQL development files and libcurl (I'm running on Debian Trixie with PostgreSQL from the pgEdge repos, which include the development files - you may need to adjust the commands below for your favourite OS and/or PostgreSQL distribution). We'll also need pgvector:
sudo apt-get install pgedge-postgresql-server-dev-18 pgedge-postgresql-18-pgvector libcurl4-openssl-dev
Now clone and build:
git clone https://github.com/pgEdge/pgedge-vectorizer.git
cd pgedge-vectorizer
make
sudo make install
Configuring PostgreSQL
The vectorizer runs as a background worker, so we need to add some configuration to postgresql.conf. Find your config file (usually in your data directory or /etc/postgresql/<version>/main/) and add:
# Load the vectorizer extension
shared_preload_libraries = 'pgedge_vectorizer'
# Embedding provider configuration
pgedge_vectorizer.provider = 'openai'
pgedge_vectorizer.api_key_file = '/var/lib/postgresql/.pgedge-vectorizer-api-key'
pgedge_vectorizer.model = 'text-embedding-3-small'
# Worker settings
pgedge_vectorizer.num_workers = 2
pgedge_vectorizer.batch_size = 10
pgedge_vectorizer.databases = 'ragdb'
# Chunking defaults
pgedge_vectorizer.default_chunk_size = 400
pgedge_vectorizer.default_chunk_overlap = 50
# Queue maintenance
pgedge_vectorizer.auto_cleanup_hours = 24
A few notes on these settings:
api_key_file should contain just your OpenAI API key, nothing else. Make sure the file has restrictive permissions (chmod 600)
num_workers controls how many parallel workers process embeddings. Start with 2 and adjust based on your API rate limits
batch_size is how many chunks are sent per API call. OpenAI handles 10 efficiently
databases tells the workers which databases to monitor
default_chunk_size is in tokens (roughly 4 characters per token for English text)
default_chunk_overlap provides context continuity between chunks
Create your API key file:
echo "sk-your-openai-api-key-here" > /var/lib/postgresql/.pgedge-vectorizer-api-key
chmod 600 /var/lib/postgresql/.pgedge-vectorizer-api-key
Now restart PostgreSQL for the changes to take effect:
sudo systemctl restart postgresql
Creating the Extension
Connect to your database and create the extension:
psql -U postgres -d ragdb
-- Create the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Install the vectorizer
CREATE EXTENSION IF NOT EXISTS pgedge_vectorizer;
You can verify the configuration:
SELECT * FROM pgedge_vectorizer.show_config();
Enabling Vectorization
Now for the good part. We'll enable automatic vectorization on our documents table:
SELECT pgedge_vectorizer.enable_vectorization(
source_table := 'documents',
source_column := 'content',
chunk_strategy := 'token_based',
chunk_size := 400, -- Optional, to override the default
chunk_overlap := 50, -- Optional, to override the default
embedding_dimension := 1536
);
-- Ensure our test role can access the chunk table
GRANT SELECT ON documents_content_chunks TO docuser;
This does several things:
Creates a new table called documents_content_chunks to store the chunks (the GRANT above gives our test user read access to it)
Creates a trigger on the documents table to automatically chunk new or updated content
Creates an HNSW vector index for fast similarity search
Processes all existing documents in the table
Queues the chunks for embedding generation
The embedding_dimension of 1536 matches OpenAI's text-embedding-3-small model. If you're using a different model, adjust accordingly.
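For instance, if you were pointing the vectorizer at Ollama's nomic-embed-text model (covered later in this post), the call would look much the same apart from the dimension - a sketch using the same parameters as above:
-- Same call, sized for a 768-dimensional model such as nomic-embed-text
SELECT pgedge_vectorizer.enable_vectorization(
    source_table := 'documents',
    source_column := 'content',
    chunk_strategy := 'token_based',
    embedding_dimension := 768
);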
Watching It Work
The background workers should now be processing your documents. You can monitor progress with:
-- Overall queue status
SELECT * FROM pgedge_vectorizer.queue_status;
-- How many chunks are waiting?
SELECT * FROM pgedge_vectorizer.pending_count;
-- Check for any failures
SELECT * FROM pgedge_vectorizer.failed_items;
You can also look at the chunks being created:
-- See how documents are being chunked
SELECT
d.title,
c.chunk_index,
c.token_count,
LEFT(c.content, 100) AS content_preview,
CASE WHEN c.embedding IS NULL THEN 'pending' ELSE 'done' END AS status
FROM documents d
JOIN documents_content_chunks c ON d.id = c.source_id
ORDER BY d.id, c.chunk_index
LIMIT 20;
Depending on how many documents you loaded in Part 1, this might take a few minutes. The workers process chunks in batches, making API calls to generate embeddings.
Understanding the Chunk Table
Let's look at what got created:
\d documents_content_chunks
You'll see columns including:
id - Primary key for the chunk
source_id - Foreign key back to the original document
chunk_index - Position of this chunk in the document (1, 2, 3...)
content - The actual chunk text
token_count - Approximate number of tokens
embedding - The vector embedding (1536 dimensions for OpenAI)
The chunk_index lets you reconstruct document order if needed, and source_id lets you join back to get the document title and other metadata.
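As a quick sanity check, you can stitch a document back together from its chunks using just those two columns - bear in mind that the overlapping text will appear twice, and the id = 1 below is just an arbitrary example:
-- Reassemble one document from its chunks, in order
SELECT d.title,
       string_agg(c.content, E'\n' ORDER BY c.chunk_index) AS reassembled
FROM documents d
JOIN documents_content_chunks c ON c.source_id = d.id
WHERE d.id = 1
GROUP BY d.title;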
Testing Semantic Search
Once embeddings are generated (check with SELECT * FROM pgedge_vectorizer.pending_count;), you can try a semantic search. But wait - we need a way to embed our search query too. Conveniently, the vectorizer provides a generate_embedding() function that lets you create embeddings directly in SQL, which is perfect for embedding search queries:
-- Search for chunks about database backups
-- using generate_embedding() to embed the search query
SELECT
d.title,
c.chunk_index,
LEFT(c.content, 200) AS content_preview,
c.embedding <=> (SELECT pgedge_vectorizer.generate_embedding('what is pgAdmin?'::text)) AS distance
FROM documents_content_chunks c
JOIN documents d ON c.source_id = d.id
WHERE c.embedding IS NOT NULL
ORDER BY distance
LIMIT 5;
This finds the chunks semantically closest to the question "what is pgAdmin?". The <=> operator calculates cosine distance between vectors - lower values mean more similar content.
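If you find yourself running this kind of query a lot, it can be handy to wrap it in a small helper function. This is just a convenience sketch built on the same tables and generate_embedding() function - the function name and declared column types are assumptions, so adjust them if your schema differs:
-- Hypothetical helper wrapping the search query above
CREATE OR REPLACE FUNCTION search_docs(query text, max_results int DEFAULT 5)
RETURNS TABLE (title text, chunk_index int, content_preview text, distance float8) AS $$
    SELECT d.title::text,
           c.chunk_index::int,
           LEFT(c.content, 200) AS content_preview,
           c.embedding <=> (SELECT pgedge_vectorizer.generate_embedding(query)) AS distance
    FROM documents_content_chunks c
    JOIN documents d ON c.source_id = d.id
    WHERE c.embedding IS NOT NULL
    ORDER BY distance
    LIMIT max_results;
$$ LANGUAGE sql;

-- Then searching is a one-liner:
-- SELECT * FROM search_docs('how do I take a backup?');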
Alternative Embedding Providers
Not everyone wants to use OpenAI. Here's how to configure other providers:
Voyage AI
Voyage offers high-quality embeddings, often at lower cost than OpenAI:
pgedge_vectorizer.provider = 'voyage'
pgedge_vectorizer.api_url = 'https://api.voyageai.com/v1'
pgedge_vectorizer.api_key_file = '/home/postgres/.voyage-api-key'
pgedge_vectorizer.model = 'voyage-2'
Voyage's voyage-2 model produces 1024-dimensional embeddings, so adjust your embedding_dimension accordingly. If you change the embedding provider, you'll need to recreate the chunks and vectors (see below).
Ollama (Local)
If you want to run embeddings locally without any API calls:
pgedge_vectorizer.provider = 'ollama'
pgedge_vectorizer.api_url = 'http://localhost:11434'
pgedge_vectorizer.model = 'nomic-embed-text'
No API key needed. You'll need to have Ollama installed and running with the model pulled:
# Install Ollama from https://ollama.ai
ollama pull nomic-embed-text
The nomic-embed-text model produces 768-dimensional embeddings. Local embedding may be slower than API calls (depending on your hardware) but gives you complete privacy and no usage costs.
Chunking Strategies
The vectorizer currently supports token-based chunking, which splits text into fixed-size pieces with configurable overlap. The overlap ensures that context isn't lost at chunk boundaries - if a sentence spans two chunks, the overlap means it appears in both.
For most documentation, the defaults work well:
chunk_size: 400 - About 1,600 characters, or roughly a long paragraph
chunk_overlap: 50 - About 200 characters of overlap
If your content is more technical with longer explanations, you might increase the chunk size. For FAQ-style content with short answers, smaller chunks might work better.
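One way to sanity-check the defaults against your own corpus is to look at how the chunks actually came out - for example, something along these lines, using the chunk table created earlier:
-- How the corpus was chunked: chunk counts and token counts
SELECT count(*)                                  AS total_chunks,
       count(DISTINCT source_id)                 AS documents,
       round(avg(token_count))                   AS avg_tokens_per_chunk,
       max(token_count)                          AS max_tokens_per_chunk,
       round(count(*)::numeric / count(DISTINCT source_id), 1) AS avg_chunks_per_doc
FROM documents_content_chunks;
If most documents end up as a single chunk, you can probably afford smaller chunks; if every document is split into dozens, consider whether a larger chunk_size would keep related material together.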
Handling Updates
One nice thing about the trigger-based approach: updates just work. When you re-run the document loader with update: true, the vectorizer detects changed content and automatically:
Deletes the old chunks for that document
Re-chunks the new content
Queues the new chunks for embedding
You don't need to do anything special.
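If you want to see this happen, you can poke a document by hand and watch the chunks get rebuilt - a quick test, with id = 1 standing in for any document you loaded in Part 1 (re-run the loader afterwards if you want the original content back):
-- Touch one document; the trigger re-chunks it and queues new embeddings
UPDATE documents
SET content = content || E'\n\nTesting re-chunking.'
WHERE id = 1;

-- New chunks appear straight away; embeddings show up once the workers catch up
SELECT chunk_index,
       CASE WHEN embedding IS NULL THEN 'pending' ELSE 'done' END AS status
FROM documents_content_chunks
WHERE source_id = 1
ORDER BY chunk_index;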
Maintenance
By default, background workers automatically clean up completed queue entries older than 24 hours (controlled by the auto_cleanup_hours setting). You can also clean up manually if needed:
-- Remove completed items older than 24 hours
SELECT pgedge_vectorizer.clear_completed(older_than_hours := 24);
If you have failures (maybe the API was temporarily down), you can retry them manually. This is typically handled automatically by the background workers' configurable retry mechanism, which uses exponential back-off, so manual retries should rarely be needed:
-- Retry failed items from the last 24 hours
SELECT pgedge_vectorizer.retry_failed(max_age_hours := 24);If you need to completely rebuild chunks (say, after changing chunk_size), you can recreate them:
SELECT pgedge_vectorizer.recreate_chunks(
source_table_name := 'documents',
source_column_name := 'content'
This deletes all existing chunks and re-processes everything from scratch, which may be useful if you change the embedding provider or model.
What We've Built
At this point, you have:
Documents chunked into semantically meaningful pieces
Vector embeddings for each chunk enabling similarity search
Automatic synchronisation when documents change
All running inside PostgreSQL with no external services
Your database now supports semantic search. Given embeddings for any text query, you can find the most relevant chunks of your documentation based on meaning, not just keyword matching.
Next Steps
In Part 3, we'll deploy the pgEdge RAG Server to provide an API for your applications. The RAG server will:
Accept natural language questions
Generate embeddings for the query
Find relevant chunks using vector similarity
Send those chunks as context to an LLM
Return a grounded, accurate response
Stay tuned!