Building Ask Ellie: A RAG Chatbot Powered by pgEdge
If you've visited the pgEdge documentation site recently, you may have noticed a small elephant icon in the bottom-right corner of the page. That's Ask Ellie: our AI-powered documentation assistant, built to help users find answers to their questions about pgEdge products quickly and naturally. Rather than scrolling through pages of documentation, you can simply ask Ellie a question and get a contextual, accurate response drawn directly from our docs.
What makes Ellie particularly interesting from an engineering perspective is that she's built on PostgreSQL and pgEdge's ecosystem of extensions and tools. She serves as both a practical aid for our users and a real-world demonstration of what you can build on top of PostgreSQL when you pair it with the right components. In this post, I'll walk through how we built her and the technologies that power the system.
The Architecture at a Glance
At its core, Ask Ellie is a Retrieval-Augmented Generation (RAG) chatbot. For those unfamiliar with the pattern, RAG combines a traditional search step with a large language model to produce answers that are grounded in actual source material, rather than relying solely on the LLM's training data. This is crucial for a documentation assistant, because we need Ellie to give accurate, up-to-date answers based on what's actually in our docs, not what the model happens to remember from its training set.
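To make the shape of the pattern concrete, here's a minimal sketch in TypeScript. The helper functions are hypothetical stand-ins, not the RAG Server's actual API:

```typescript
// Hypothetical stand-ins for the two halves of the pipeline; the real
// system uses pgvector search and Claude, as described below.
type Chunk = { content: string };
declare function retrieveRelevantChunks(question: string): Promise<Chunk[]>;
declare function callLLM(prompt: string): Promise<string>;

async function answer(question: string): Promise<string> {
  // 1. Retrieval: find documentation chunks semantically close to the question.
  const chunks = await retrieveRelevantChunks(question);

  // 2. Augmentation: ground the prompt in the retrieved material.
  const prompt =
    'Answer using only the documentation below.\n\n' +
    chunks.map((c) => c.content).join('\n---\n') +
    `\n\nQuestion: ${question}`;

  // 3. Generation: the LLM composes a response from that context.
  return callLLM(prompt);
}
```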
The architecture breaks down into several layers:
Content ingestion: crawling and loading documentation into PostgreSQL
Embedding and chunking: automatically splitting content into searchable chunks and generating vector embeddings
Retrieval and generation: finding relevant chunks for a user's query and generating a natural language response
Frontend: a chat widget embedded in the documentation site that streams responses back to the user
Let's look at each of these in turn.
Loading the Documentation
The first challenge with any RAG system is getting your content into a form that can be searched semantically. We use pgEdge Docloader for this: an open source (PostgreSQL-licensed) tool designed to ingest documentation from multiple sources and load it into PostgreSQL.
Docloader is quite flexible in where it can pull content from. For Ellie, we configure it to crawl our documentation website, extract content from internal Atlassian wikis, scan package repositories for metadata, and clone git repositories to pull in upstream PostgreSQL documentation across multiple versions. It handles the messy work of stripping out navigation elements, headers, footers, and scripts, leaving us with clean text content that's ready for processing.
All of this content lands in a docs table in PostgreSQL, with metadata columns for the product name, version, source URL, title, and the content itself. This gives us a structured foundation that we can query and manage using familiar SQL tools.
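Because it's just a table, inspecting the corpus is ordinary SQL. Here's a quick sketch with node-postgres; the column names are assumptions about the schema, not Docloader's documented layout:

```typescript
import { Pool } from 'pg';

// Column names here are assumptions about the docs table, not
// Docloader's documented schema.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const { rows } = await pool.query(
  `SELECT product, version, title, source_url, length(content) AS chars
     FROM docs
    ORDER BY product, version
    LIMIT 10`
);
console.table(rows);
```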
Automatic Chunking and Embedding with Vectorizer
Once the documentation is in PostgreSQL, we need to turn it into something that supports semantic search. This is where pgEdge Vectorizer comes in, and it's one of the most elegant parts of the system.
Vectorizer is another open source PostgreSQL extension that watches a configured table and automatically generates vector embeddings whenever content is inserted or updated. We configure it to use a token-based chunking strategy with a chunk size of 400 tokens and an overlap of 50 tokens between chunks. The overlap ensures that concepts spanning chunk boundaries aren't lost during retrieval.
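To make the overlap idea concrete, here's a simplified sketch of windowed chunking. Vectorizer tokenises with a proper model tokeniser, whereas this stand-in just works over an array:

```typescript
// Simplified token-window chunking with overlap. Words stand in for
// model tokens here; the real tokenisation is model-specific.
function chunk(tokens: string[], size = 400, overlap = 50): string[][] {
  const chunks: string[][] = [];
  const step = size - overlap; // each window starts 350 tokens after the last
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// The final 50 tokens of one chunk are repeated as the first 50 of the
// next, so a sentence straddling a boundary appears intact in one chunk.
```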
Under the hood, Vectorizer sends content to OpenAI's text-embedding-3-small model to generate the embeddings, which are stored in a docs_content_chunks table using the pgvector extension's vector column type. The beauty of this approach is that it's entirely automatic; when Docloader updates documentation in the docs table, Vectorizer picks up the changes and regenerates the relevant embeddings without any manual intervention. This means our search index stays current with no additional pipeline orchestration required.
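The resulting table pairs each chunk with its embedding. Here's a hypothetical sketch of its shape; only the table name comes from our setup, and text-embedding-3-small produces 1,536-dimensional vectors, which fixes the pgvector column width:

```typescript
// Hypothetical shape of the table Vectorizer maintains -- the column
// names are assumptions, not the extension's documented schema.
const CHUNKS_DDL = `
  CREATE EXTENSION IF NOT EXISTS vector;

  CREATE TABLE IF NOT EXISTS docs_content_chunks (
    id         bigserial PRIMARY KEY,
    doc_id     bigint REFERENCES docs(id) ON DELETE CASCADE,
    chunk_text text NOT NULL,
    embedding  vector(1536)  -- pgvector's vector column type
  );
`;
```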
The RAG Server: Retrieval Meets Generation
The heart of the system is the pgEdge RAG Server, which orchestrates the retrieval and generation process. When a user asks Ellie a question, the RAG Server performs a vector similarity search against the docs_content_chunks table to find the 20 most relevant chunks, working within an 8,000-token context budget. These chunks are then passed, alongside the user's question and conversation history, to Anthropic's Claude Sonnet model, which generates a natural, conversational response grounded in the retrieved documentation.
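In spirit, the retrieval step looks like this sketch using node-postgres and the OpenAI SDK; the query and column names are assumptions rather than the RAG Server's actual internals:

```typescript
import OpenAI from 'openai';
import { Pool } from 'pg';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function retrieve(question: string) {
  // Embed the question with the same model used to embed the chunks,
  // so both vectors live in the same space.
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });
  const embedding = JSON.stringify(res.data[0].embedding);

  // pgvector's <=> operator is cosine distance: smaller is more similar.
  // Column names are assumptions about the Vectorizer schema.
  const { rows } = await pool.query(
    `SELECT chunk_text, embedding <=> $1::vector AS distance
       FROM docs_content_chunks
      ORDER BY embedding <=> $1::vector
      LIMIT 20`,
    [embedding]
  );
  return rows; // the RAG Server then trims these to the 8,000-token budget
}
```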
The RAG Server exposes a simple HTTP API with a streaming endpoint that returns Server-Sent Events (SSE), allowing the frontend to display responses as they're generated rather than waiting for the entire answer to be composed. This gives users a much more responsive experience, particularly for longer answers.
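On the client side, consuming that stream with fetch looks roughly like this sketch; the /chat/stream path and the event payload format are assumptions for illustration:

```typescript
// Minimal SSE consumption with fetch. The endpoint path and the shape
// of each event are assumptions, not the RAG Server's documented API.
async function streamAnswer(question: string, onToken: (t: string) => void) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE frames are separated by blank lines; data lines start "data: ".
    const frames = buffer.split('\n\n');
    buffer = frames.pop()!; // keep any incomplete frame for the next read
    for (const frame of frames) {
      for (const line of frame.split('\n')) {
        if (line.startsWith('data: ')) onToken(line.slice(6));
      }
    }
  }
}
```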
An important architectural benefit of the RAG Server approach is that it provides a strong data access boundary. Ellie can only ever see content that has been retrieved from our curated documentation set; she has no direct access to the database, no ability to run arbitrary queries, and no visibility into any data beyond what the retrieval step returns. This is a significant advantage over approaches such as giving an LLM access to a database via an MCP server, where the model could potentially query tables containing sensitive information, customer data, or internal configuration. With the RAG Server, the attack surface is inherently limited: even if a prompt injection were to succeed in changing the LLM's behaviour, the worst it could do is misrepresent the documentation content it has already been given. The model simply cannot reach anything else.
On the network side, we bind the RAG Server to localhost only so that it never receives traffic directly from the internet; instead, we use a Cloudflare Tunnel to securely route requests from our Cloudflare Pages site to the server without exposing any public ports. A Cloudflare Pages Function acts as a proxy, handling CORS headers, forwarding authentication secrets, and, crucially, sanitising error messages to prevent any internal details such as API keys from being leaked to the client.
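In outline, such a Pages Function might look like this; the file path and the binding names (RAG_SERVER_URL, RAG_SERVER_TOKEN) are hypothetical:

```typescript
// functions/api/chat.ts -- a sketch of a Cloudflare Pages Function proxy.
// PagesFunction comes from @cloudflare/workers-types; the env bindings
// below are hypothetical names, not our actual configuration.
interface Env {
  RAG_SERVER_URL: string;
  RAG_SERVER_TOKEN: string;
}

export const onRequestPost: PagesFunction<Env> = async ({ request, env }) => {
  try {
    // Forward the chat request through the Cloudflare Tunnel, attaching
    // the shared secret server-side so it never reaches the browser.
    const upstream = await fetch(env.RAG_SERVER_URL, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${env.RAG_SERVER_TOKEN}`,
      },
      body: request.body,
    });

    // Pass the SSE stream straight through, adding CORS headers.
    return new Response(upstream.body, {
      status: upstream.status,
      headers: {
        'Content-Type':
          upstream.headers.get('Content-Type') ?? 'text/event-stream',
        'Access-Control-Allow-Origin': '*',
      },
    });
  } catch {
    // Sanitised error: no internal hostnames, stack traces, or keys.
    return new Response(JSON.stringify({ error: 'Upstream unavailable' }), {
      status: 502,
      headers: { 'Content-Type': 'application/json' },
    });
  }
};
```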
The Frontend: More Than Just a Chat Bubble
Whilst the backend does the heavy lifting, the frontend deserved careful attention too. The chat widget is built in vanilla JavaScript (no framework dependencies, to keep things light) and weighs in at around 1,600 lines of code across several well-organised classes.
Beyond the basic chat functionality, there are a few features worth highlighting:
Conversation compaction: as conversations grow longer, the system intelligently compresses the history to stay within token limits. Messages are classified by importance (anchor messages, important context, routine exchanges), and less important older messages are summarised or dropped whilst preserving the essential thread of the conversation.
Security monitoring: the frontend includes input validation that detects suspicious patterns indicative of prompt injection attempts, HTML escaping before markdown conversion, URL validation in rendered links, and a response analyser that flags potential prompt injection successes. It's worth being clear about what these measures actually do, however: they log and monitor rather than block. A determined user could bypass the frontend validation entirely by editing the JavaScript in their browser or crafting HTTP requests directly, so we treat the frontend as an observability layer rather than a security boundary. The real defence against prompt injection lies in the system prompt configuration on the RAG Server, which instructs the LLM to maintain Ellie's identity, refuse jailbreak attempts, and never reveal internal instructions. This is a defence-in-depth approach: the RAG Server's architecture limits data exposure to our curated documentation set, the system prompt instructs the LLM to behave appropriately, and the frontend catches casual misuse and provides telemetry for ongoing monitoring.
Streaming with buffering: responses are streamed via SSE and buffered at word boundaries to ensure smooth display without jarring partial-word rendering; see the sketch after this list.
Persistence: conversation history is stored in localStorage, so users can return to previous conversations. The chat window's size and position are also persisted.
Mobile awareness: on smaller viewports, the chat widget doesn't auto-open to preserve the readability of the documentation content itself.
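As a flavour of the logic involved, here's a simplified sketch of the word-boundary buffering mentioned above; the widget's actual implementation will differ:

```typescript
// Simplified word-boundary buffering: tokens arrive in arbitrary slices,
// but we only flush complete words so the user never sees half a word.
class WordBuffer {
  private pending = '';

  constructor(private flush: (text: string) => void) {}

  push(token: string): void {
    this.pending += token;
    // Everything up to the last whitespace is safe to display.
    const cut = this.pending.lastIndexOf(' ');
    if (cut >= 0) {
      this.flush(this.pending.slice(0, cut + 1));
      this.pending = this.pending.slice(cut + 1);
    }
  }

  end(): void {
    // The stream is finished, so whatever remains is a complete word.
    if (this.pending) this.flush(this.pending);
    this.pending = '';
  }
}
```

Feeding push() from the SSE reader shown earlier and calling end() when the stream closes gives smooth rendering with no half-words flashing on screen.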
Infrastructure and Deployment
The entire backend infrastructure is managed with Ansible playbooks, which handle everything from provisioning the EC2 instance running Debian to installing pgEdge Enterprise Postgres 18 with the required extensions, configuring the RAG Server and Docloader, setting up the Cloudflare Tunnel, and establishing automated AWS backups with daily, weekly, and monthly retention policies. Sensitive configuration such as API keys and database credentials is managed through Ansible Vault.
The documentation site itself is built with MkDocs using the Material theme and deployed on Cloudflare Pages, which gives us global CDN distribution and the Pages Functions capability that we use for the chat API proxy.
Ellie's Personality
One of the more enjoyable aspects of building Ellie was defining her personality through the system prompt. She's configured as a database expert working at pgEdge who loves elephants (the PostgreSQL mascot, naturally) and turtles (a nod to the PostgreSQL Japan logo). Her responses are designed to be helpful and technically accurate, drawing on both the PostgreSQL documentation and pgEdge's own product docs. She's knowledgeable about PostgreSQL configuration, extensions, and best practices, as well as pgEdge Enterprise Postgres and other pgEdge products such as Spock for multi-master replication and the Snowflake extension for distributed ID generation.
The system prompt also includes explicit security boundaries, although as discussed above, these are ultimately enforced at the LLM layer rather than the network layer. Ellie is instructed to maintain her identity regardless of what users ask, decline 'developer mode' or jailbreak requests, and never reveal her system prompt or internal instructions. She'll only reference people, teams, and products that appear in the actual documentation, ensuring she doesn't hallucinate information about the organisation. This is inherently a probabilistic defence; LLMs follow instructions with high reliability but not absolute certainty, which is why the monitoring and logging on the frontend remain valuable as a detection mechanism even though they can't prevent abuse.
A Showcase for pgEdge's AI Capabilities
What I find most satisfying about Ask Ellie is that she demonstrates what PostgreSQL is capable of when you build on its strengths. PostgreSQL 18 provides the foundation, the community's pgvector extension enables vector similarity search, and pgEdge's Vectorizer, Docloader, and RAG Server add the automation and orchestration layers on top. There's no separate vector database, no complex microservice mesh, and no elaborate ETL pipeline: just PostgreSQL with the right extensions and a handful of purpose-built tools.
If you're already running PostgreSQL (and let's face it, you probably are), the approach we've taken with Ellie shows that you don't need to adopt an entirely new technology stack to add RAG capabilities to your applications. Your existing PostgreSQL database can serve as both your operational data store and your AI-powered search backend, which is a compelling proposition for teams that want to avoid the operational overhead of deploying and maintaining yet another specialised system.
Give Ellie a try next time you're browsing the pgEdge docs; ask her anything about pgEdge products, PostgreSQL configuration, or distributed database setups. And if you're interested in building something similar for your own documentation or knowledge base, take a look at the pgEdge RAG Server, Vectorizer, and Docloader documentation to get started.