PostgreSQL is dominating the database market, and the monitoring tools haven't noticed.

More teams run Postgres in production every year. More of those deployments are distributed, multi-region, and mission-critical. And the tooling most of those teams rely on was built for a simpler world: a single instance, a handful of threshold alerts, and a senior DBA who can interpret what the graphs mean at 3 AM. That works when you have one cluster and one person who knows where the bodies are buried. It falls apart the moment you scale past either of those constraints.

We built the pgEdge AI DBA Workbench to close that gap, and today it's entering public beta. We think it's the best PostgreSQL monitoring and management platform you've seen, and the rest of this post explains why. It works with any PostgreSQL 14 or later: local installs, self-hosted enterprise estates, Supabase, Amazon RDS - if you can connect to it, you can monitor it.

Three Services, One Platform

The Workbench is a self-hosted platform that combines three services into a single deployment. A collector gathers metrics from every monitored PostgreSQL instance. An alerter evaluates those metrics against threshold rules and a layered anomaly detection system. A server ties everything together through a web UI, a REST API, and a Model Context Protocol (MCP) endpoint that lets AI tools talk directly to your databases.

It runs on your infrastructure, ships under the PostgreSQL license, and stores nothing outside your network.

34 Probes, Zero Agents

The collector is the foundation. Point it at your PostgreSQL instances (any PostgreSQL 14 or later, not just pgEdge) and it starts pulling metrics across 34 built-in probes covering query performance, replication health, active connections, WAL throughput, vacuum activity, checkpoints, database conflicts, IO statistics, system-level CPU and memory, disk usage, and more.

Two things matter about how collection works. First, the collector connects remotely over standard PostgreSQL connections. There are no agents to install on your database servers, no binaries to deploy and maintain on each node, no version compatibility headaches between agent and server. You give the collector a connection string and it handles the rest. Second, data management is automatic. The collector partitions metrics tables by time and enforces retention policies, so you don't end up nursing a monitoring datastore that's grown larger than the databases it's supposed to be watching.
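The retention side of that is worth making concrete. As a rough sketch (the partition naming and daily granularity here are illustrative assumptions, not the collector's actual internals), time-partitioned metrics let old data be dropped a whole partition at a time instead of row by row:

```python
from datetime import date, timedelta

def partition_name(metric_table: str, day: date) -> str:
    # Hypothetical naming scheme: one partition per day, e.g. metrics_2025_05_30
    return f"{metric_table}_{day.strftime('%Y_%m_%d')}"

def partitions_to_drop(existing: list[date], retention_days: int, today: date) -> list[str]:
    # Partitions wholly older than the retention window can be dropped outright,
    # which is far cheaper than row-by-row DELETEs on a large metrics table
    cutoff = today - timedelta(days=retention_days)
    return [partition_name("metrics", d) for d in existing if d < cutoff]

# 30-day retention over daily partitions going back 34 days: the 4 oldest are stale
today = date(2025, 6, 30)
days = [today - timedelta(days=n) for n in range(35)]
stale = partitions_to_drop(days, retention_days=30, today=today)
```

Dropping a partition is a metadata operation, which is why the monitoring datastore stays bounded without a cleanup job competing with collection for I/O.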

For distributed PostgreSQL specifically, the collector tracks replication lag, slot usage, subscription status, and node roles as first-class metrics. If you're running Spock multi-master replication, these aren't bolted-on extras; they're baked into the probe system from the ground up.

Five Levels of Drill-Through

The web client ships with a hierarchical dashboard system that lets you go from fleet-wide overview to individual index statistics without switching tools. At the estate level, you see every cluster in your fleet with aggregate health indicators and AI-generated summaries. The cluster view renders replication topology with color-coded edges showing lag and health, so you can spot a degraded replication link at a glance. Server dashboards break down per-instance metrics: connections, transactions, tuple operations, cache hit ratios. Database and object views take you down to individual table and index statistics, including bloat, sequential scan ratios, and dead tuple counts.

Every dashboard level includes AI-powered summaries that highlight what needs attention. The summaries aren't canned text templates filled with numbers but are generated by an LLM that has access to the actual metrics context, so the summary for a cluster with rising replication lag reads differently from one with connection pool exhaustion.

Three-Tier Anomaly Detection

Traditional monitoring tools give you threshold alerting: if replication lag exceeds 30 seconds, fire an alert. That catches the obvious problems, but it completely misses the subtle ones, like a slow drift in query latency that doesn't cross any single threshold but indicates an index that's quietly degrading, or a change in checkpoint frequency that correlates with a workload shift you haven't noticed yet.

The Workbench's alerter runs a three-tier detection system. The first tier is statistical baselines. The alerter builds rolling baselines for every metric and flags deviations that fall outside expected ranges. This catches the standard stuff efficiently without needing manual threshold configuration for every metric on every server.
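The core idea of the first tier can be shown in a few lines. This is a minimal sketch of a rolling-baseline check, not the alerter's actual implementation (which builds baselines per metric, per server):

```python
from statistics import mean, stdev

def flag_deviation(history: list[float], current: float, z_threshold: float = 3.0) -> bool:
    # Baseline = mean/stddev of recent samples; flag values outside the expected band
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold

# Replication lag (seconds) hovering around 2s
lag_history = [1.8, 2.1, 2.0, 1.9, 2.2, 2.0, 1.9, 2.1]
flag_deviation(lag_history, 2.3)   # within the band: not anomalous
flag_deviation(lag_history, 9.0)   # far outside the band: anomalous
```

Note that nobody had to decide "alert when lag exceeds 30 seconds"; the band comes from the metric's own recent behavior, which is what makes this approach scale across hundreds of metrics and servers.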

The second tier is vector similarity. The alerter generates embeddings of anomalous metric patterns and compares them against a library of known anomaly signatures. If the current pattern looks like something that caused an outage last month, it gets flagged even if the raw numbers haven't crossed a threshold yet, which is pattern recognition rather than just pattern matching.
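Conceptually, this tier is a nearest-neighbor lookup over signature embeddings. A toy sketch with made-up three-dimensional vectors (real embeddings are much higher-dimensional, and the signature library here is hypothetical):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: 1.0 means identical direction, 0.0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical library mapping past-incident labels to pattern embeddings
signatures = {
    "connection_pool_exhaustion": [0.9, 0.1, 0.4],
    "replication_lag_spike":      [0.1, 0.95, 0.2],
}

def closest_signature(pattern: list[float], min_sim: float = 0.9):
    # Return the best-matching known incident, or None if nothing is close enough
    label, vec = max(signatures.items(), key=lambda kv: cosine(pattern, kv[1]))
    return label if cosine(pattern, vec) >= min_sim else None

closest_signature([0.15, 0.9, 0.25])  # matches the lag-spike signature
```

The similarity threshold is what separates "this resembles last month's outage" from noise; raw metric values never enter the comparison, only the shape of the pattern.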

The third tier is LLM classification. When the statistical and embedding tiers surface something ambiguous, the alerter can call an LLM to classify what's happening. The LLM gets the metric context, the anomaly history, and the system state, and provides a classification that goes beyond "this number is higher than usual" to "this looks like a connection pool leak combined with long-running transactions," because it can reason about context rather than just reacting to numbers.
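The value of that tier comes from the context the LLM receives, not the model itself. A hypothetical sketch of how such context might be assembled (the real alerter's prompt construction is internal and will differ):

```python
def build_classification_prompt(metric: str, baseline: float, current: float,
                                recent_anomalies: list[str]) -> str:
    # Hypothetical prompt shape: metric state plus anomaly history, so the model
    # can reason about the situation instead of just seeing one number
    history = "; ".join(recent_anomalies) or "none"
    return (
        f"Metric {metric} is at {current} against a baseline of {baseline}.\n"
        f"Recent related anomalies: {history}.\n"
        "Classify the likely cause and suggest a next diagnostic step."
    )

build_classification_prompt("replication_lag", 2.0, 9.0, ["lag spike at 14:02"])
```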

On top of detection, the alert management system respects the reality of on-call life. Blackout scheduling with cron-based recurrence so your maintenance windows don't generate noise. Hierarchical threshold overrides so production gets tighter bounds than staging. Automatic alert clearing when conditions normalize, because nothing drains on-call morale like manually closing alerts that already fixed themselves. Multi-channel notifications to Slack, email, Mattermost, or webhooks.

Meet Ellie

Ellie is the Workbench's built-in AI assistant, and she's the feature that changes how you actually interact with your monitoring data. Most monitoring tools bolt a chatbot onto their dashboard and call it "AI-powered." Ellie is different because she has access to 21 MCP tools that let her take real action. She can run EXPLAIN ANALYZE on your slow queries and walk you through the execution plan. She can inspect your schema, query historical metrics from the datastore, search the pgEdge documentation knowledge base, and retrieve alert history. When you ask "why is this query slow?", she doesn't give you a generic optimization checklist. She looks at the actual query plan, the actual table statistics, and the actual index usage, and tells you what's happening with your specific data.

Ellie supports multi-step diagnostic workflows with up to 50 iterations of tool use per conversation. That means she can follow a thread: start with a slow query, check the execution plan, look at the table's vacuum statistics, notice the bloat ratio is high, check when the last vacuum ran, and recommend either a manual vacuum or a configuration change. That's the kind of investigation a senior DBA does instinctively, and Ellie does it conversationally, step by step, showing her work the entire way.
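The shape of that workflow is a bounded tool-use loop. A toy sketch (the loop structure and the 50-step cap come from the description above; the function names and policy are illustrative assumptions):

```python
MAX_ITERATIONS = 50  # per-conversation cap on tool calls, as described above

def run_diagnostic(goal: str, pick_tool, call_tool) -> list[str]:
    # Hypothetical agent loop: pick_tool decides the next step (None to stop),
    # call_tool executes it; the cap prevents runaway investigations
    trace = []
    for _ in range(MAX_ITERATIONS):
        tool = pick_tool(goal, trace)
        if tool is None:
            break
        trace.append(call_tool(tool))
    return trace

# Toy policy: explain the query, then check vacuum stats, then stop
steps = iter(["explain_analyze", "vacuum_stats"])
picker = lambda goal, trace: next(steps, None)
trace = run_diagnostic("why is this query slow?", picker, lambda t: f"ran {t}")
```

The trace is what "showing her work" means in practice: every step of the investigation is recorded and visible, not just the final recommendation.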

She also remembers what you tell her. Pin a memory ("our busiest period is 2-4 PM EST," "the orders table is being deprecated in Q3," "always check replication lag before recommending schema changes") and it gets included in every conversation automatically, whether it's facts, preferences, instructions, or operational context. You don't have to re-explain your architecture every session, and no other PostgreSQL monitoring tool has this kind of persistent AI memory.
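Mechanically, pinned memories amount to context that rides along with every conversation. A minimal sketch of the idea (the message format and assembly are illustrative assumptions, not Ellie's internals):

```python
pinned_memories = [
    "our busiest period is 2-4 PM EST",
    "always check replication lag before recommending schema changes",
]

def build_context(user_message: str) -> list[dict]:
    # Hypothetical assembly: pinned memories become a standing system preamble,
    # so they reach every conversation without being re-typed
    preamble = "Operator notes:\n" + "\n".join(f"- {m}" for m in pinned_memories)
    return [
        {"role": "system", "content": preamble},
        {"role": "user", "content": user_message},
    ]
```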

How This Stacks Up

The PostgreSQL monitoring market has options: pganalyze, Datadog, Percona PMM, pgwatch, EDB PEM. Here's where the Workbench sits relative to each.

| Capability | pgEdge AI DBA Workbench | EDB PEM | pganalyze | Datadog DBM | Percona PMM | pgwatch |
|---|---|---|---|---|---|---|
| AI/LLM assistant | Agentic, 21 tools, multi-step | None | Index advisor | ML anomaly detection | Built-in advisors | None |
| MCP protocol | Native | No | Preview | No | No | No |
| Distributed PG awareness | Spock-native, physical + logical replication | Patroni/streaming | Generic | Generic | Generic | Generic |
| Anomaly detection | 3-tier (stats + embeddings + LLM) | Threshold | Threshold | ML-based | ML-based (PMM 3.0+) | Threshold |
| Self-hosted | Yes | Yes (subscription) | Docker | No (SaaS) | Yes | Yes |
| Air-gapped LLM | Yes (Ollama, LM Studio, etc.) | N/A | N/A | N/A | N/A | N/A |
| Persistent AI memory | Yes | No | No | No | No | No |
| Agent required on DB host | No | Yes (per-server) | Collector | Agent | Agent + exporters | No |
| Open source | Yes (PostgreSQL License) | No | No | No | Yes | Yes |

EDB PEM is the established enterprise console, but it requires an agent on every server, sticks to threshold alerting, and ships under a commercial subscription rather than an open source license. pganalyze gives you index recommendations and query insights, but it's a SaaS product where your monitoring data lives on their infrastructure and AI capabilities stop at static recommendations. Datadog offers ML-based anomaly detection and is strong for multi-service observability, but its PostgreSQL monitoring is one piece of a much larger platform and there's no agentic AI assistant that can actually run queries against your databases. Percona PMM is open source and capable, but its AI features are newer and less deeply integrated into the monitoring workflow. pgwatch is lightweight and open source, but it's pure metrics collection with no AI layer at all.

The Workbench's differentiators come down to three things: the agentic AI assistant with real tool access, the three-tier anomaly detection that goes beyond what any of these tools offer, and native support for distributed PostgreSQL. If you're running Spock multi-master replication, nothing else on this list monitors it natively.

Bring Your Own AI…

The Workbench supports Anthropic Claude, OpenAI, Google Gemini, and any OpenAI-compatible local model runner, including Ollama, LM Studio, llama.cpp, and EXO. For teams in regulated industries or air-gapped environments, you can run the entire platform with a local model and nothing leaves your network, and you can switch providers anytime without changing your workflow.

The MCP endpoint (that's Model Context Protocol, the open standard for connecting AI tools to data sources) also means any MCP-compatible client can connect to your Workbench, from Claude Code and Cursor to VS Code with GitHub Copilot and Windsurf. Your AI development tools get the same access to your monitoring data that Ellie has.

…Or Don’t

Not a fan of AI? The Workbench can be used as a traditional monitoring system for PostgreSQL: just don't configure a model provider. You'll get all of the features you would expect from a monitoring system: alerting, dashboards, cluster visualization, and more, just without the AI features.

Try the Beta

The AI DBA Workbench is available today as a public beta. Getting started with Docker Compose takes a few steps: clone the repo, generate a shared secret for encrypting connection passwords, create a datastore password file, and bring up the services.

git clone https://github.com/pgEdge/ai-dba-workbench.git
cd ai-dba-workbench

# Generate the shared secret used to encrypt connection passwords
mkdir -p docker/secret
openssl rand -base64 32 > docker/secret/ai-dba.secret

# Create the datastore password file
echo "postgres" > docker/secret/pg-password

# Bring up the collector, alerter, and server
docker compose up -d

Once the services are running, point your browser at http://localhost:3000, create your admin user, add your PostgreSQL connection strings, and the collector starts gathering metrics immediately. The full documentation (including the Quick Start Guide) walks through every step, and the source code is on GitHub under the PostgreSQL License.

This is a beta, which means we want your feedback and we want it loud. File issues on the GitHub repository to tell us what probes you need that we're missing, what Ellie got wrong, or even what made you grin. We built this because we saw a gap in how PostgreSQL teams monitor and manage their databases, and we believed there was a fundamentally better way to do it. Go try it and tell us if we were right.