Postgres table partitioning is one of those features that feels like a superpower right up until it isn't. Just define a partition key, carve up data into manageable chunks, and everything hums along beautifully. And what's not to love? Partition pruning in query plans, smaller tables, faster maintenance, easy archiving of old data; it's a smorgasbord of convenience.

Then you try to enforce a unique constraint without including the partition key, and Postgres behaves as if you just asked it to divide by zero. Well... about that.

The Rule Nobody Reads Until It's Too Late

The Postgres documentation on partitioning spells it out pretty clearly in the limitations section:

To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint's columns must include all of the partition key columns.

Read that again. The constraint's columns must include all of the partition key columns. Not "should." Not "it would be nice if." Must. And the reasoning is maddeningly justified: each partition maintains its own index, and a local index can only enforce uniqueness within its own partition. Postgres has no concept of a global index that spans all partitions simultaneously, so it has no mechanism to check whether some value in partition A already exists in partition B.

Other database engines (Oracle, for instance) have global indexes that solve this at the storage layer. Postgres does not, and despite occasional discussion on the mailing lists, nothing has landed. So we're left to our own devices.

When Theory Meets the Event Pipeline

Consider a fairly common (if somewhat contrived) scenario: an events table partitioned by range on an identity column. The table includes a deduplication_id that the application uses to prevent duplicate event processing. Naturally, that should be unique across all partitions.

CREATE TABLE events (
  id                BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  deduplication_id  BIGINT NOT NULL,
  payload           TEXT NOT NULL,
  event_date        TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (id);
 
CREATE TABLE events_0m
PARTITION OF events
FOR VALUES FROM (1) TO (1000000);
 
CREATE TABLE events_1m
PARTITION OF events
FOR VALUES FROM (1000000) TO (2000000);
 
CREATE TABLE events_2m
PARTITION OF events
FOR VALUES FROM (2000000) TO (3000000);

Now try adding UNIQUE (deduplication_id) to that table. The partition key is id, and the constraint's columns don't include it, so Postgres rejects the constraint outright. You could make a composite unique constraint on (deduplication_id, id), but that's effectively useless for deduplication since id is already unique. Every row would satisfy the constraint regardless of duplicate deduplication_id values.
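For the record, here's what that rejection looks like. The constraint name is arbitrary, and the exact error wording may vary between Postgres versions:

```sql
-- Attempt a unique constraint that omits the partition key (id):
ALTER TABLE events
  ADD CONSTRAINT events_dedup_uq UNIQUE (deduplication_id);

-- Postgres refuses with an error along the lines of:
-- ERROR:  unique constraint on partitioned table must include all partitioning columns
```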

This is especially painful for date-range partitions, which are probably the most popular partitioning strategy in the wild. It's common to partition by month or week, but there's no universal uniqueness strategy there, just distinct intervals. The partition key is there for data management, not for data integrity, and Postgres can't separate those concepts without assistance.

So what can we do to help it?

Brute Force to the Rescue

If Postgres won't enforce global unique constraints for us, can we do it ourselves? This is Postgres after all, so there are many tools at our disposal. Triggers, for example, exist for exactly this kind of scenario.

The simplest approach is a BEFORE INSERT trigger that scans the entire partitioned table set for duplicates:

CREATE INDEX idx_events_deduplication_id ON events (deduplication_id);
 
CREATE OR REPLACE FUNCTION f_check_dupes()
RETURNS TRIGGER AS
$$
BEGIN
  PERFORM * FROM events WHERE deduplication_id = NEW.deduplication_id;
  IF FOUND THEN
    RAISE UNIQUE_VIOLATION
          USING MESSAGE = 'Duplicate found: ' || NEW.deduplication_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER t_check_dupes_b_i
BEFORE INSERT ON events
   FOR EACH ROW EXECUTE FUNCTION f_check_dupes();

The index on deduplication_id is critical here; without it, every insert triggers a catastrophically slow sequential scan across all partitions. With it, the check becomes a fast index probe into each partition instead. That's still more overhead than ideal, but it could certainly be worse.

Let's throw two million rows at it and see what happens:

INSERT INTO events (deduplication_id, payload)
SELECT id, format('Hey %s', id)
  FROM generate_series(1, 2000000 - 1) a(id);

Without the trigger, this insert completes in about 7 seconds. With the trigger? 25 seconds. That's more than a 3x overhead, and we only have three partitions. The trigger must probe every partition's index for each row to confirm no duplicate exists. As the partition count grows, so does the probe time, because each duplicate check has to touch more and more partition indexes. Fifty partitions, a hundred partitions, three hundred partitions... each one adds another index lookup to every single insert.

Does our duplicate check work, though? Absolutely:

INSERT INTO events (deduplication_id, payload)
VALUES (50, 'Yay!');
 
ERROR:  Duplicate found: 50

That's pretty satisfying, but performance at scale leaves much to be desired. Is that a problem we can solve?

Fake It 'Til You Make It

What if instead of scanning the partitioned table, we maintained a separate, unpartitioned table whose sole job is tracking which deduplication_id values already exist? Then we can leverage the primary key of that table to do the uniqueness check for us, and Postgres handles all the heavy lifting with a single B-tree lookup.

CREATE TABLE event_dedup (
  deduplication_id BIGINT PRIMARY KEY
);

Now the trigger changes from a cross-partition scan to a simple insert into the event_dedup table. If the insert violates the primary key, Postgres immediately catches the duplicate. No scanning, no partition probing, no existential dread:

CREATE OR REPLACE FUNCTION f_check_dupes()
RETURNS TRIGGER AS
$$
BEGIN
  INSERT INTO event_dedup VALUES (NEW.deduplication_id);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER t_check_dupes_b_i
BEFORE INSERT ON events
   FOR EACH ROW EXECUTE FUNCTION f_check_dupes();

But we also need to keep the dedup table honest. If rows get deleted from events, the corresponding dedup entries should be cleaned up. Otherwise we'd reject future inserts for values that no longer exist:

CREATE OR REPLACE FUNCTION f_clean_dupes()
RETURNS TRIGGER AS
$$
BEGIN
  DELETE FROM event_dedup WHERE deduplication_id = OLD.deduplication_id;
  RETURN OLD;  -- NEW is not defined for DELETE triggers
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER t_clean_dupes_a_d
AFTER DELETE ON events
   FOR EACH ROW EXECUTE FUNCTION f_clean_dupes();

Now let's run the same two-million-row insert:

INSERT INTO events (deduplication_id, payload)
SELECT id, format('Hey %s', id)
  FROM generate_series(1, 2000000 - 1) a(id);

This time the inserts completed in 14 seconds. Roughly double the baseline of 7 seconds without any trigger. This makes sense, as every insert now performs one additional B-tree insert into the dedup table. On the other hand, that overhead doesn’t change whether there are three partitions or three hundred. The dedup table is a single unpartitioned table with a single index. The cost is constant regardless of how many partitions exist on the events table.

That trade-off is fairly compelling: pay a fixed 2x overhead per insert in exchange for partition-count independence. For a system that might grow to dozens or hundreds of partitions over its lifetime, the dedup table approach is clearly the more sustainable choice.

And yes, the dupe check still works:

INSERT INTO events (deduplication_id, payload)
VALUES (50, 'Yay!');
 
ERROR:  duplicate key value violates unique constraint "event_dedup_pkey"
DETAIL:  Key (deduplication_id)=(50) already exists.

Even better, we get Postgres's own constraint violation message now, complete with the offending key value. No custom error formatting necessary.

Feedback Loop

There's a natural concern with this approach: the dedup table itself could grow enormous. If the events table accumulates billions of rows over time, the dedup table will have billions of entries as well. That's a lot of B-tree to maintain within a monolithic index.

The solution is almost comically recursive: partition the dedup table:

CREATE TABLE event_dedup (
  deduplication_id BIGINT PRIMARY KEY
) PARTITION BY HASH (deduplication_id);
 
CREATE TABLE event_dedup_h0
PARTITION OF event_dedup
FOR VALUES WITH (modulus 3, remainder 0);
 
CREATE TABLE event_dedup_h1
PARTITION OF event_dedup
FOR VALUES WITH (modulus 3, remainder 1);
 
CREATE TABLE event_dedup_h2
PARTITION OF event_dedup
FOR VALUES WITH (modulus 3, remainder 2);

Hash partitioning is well suited for this because it distributes values evenly and Postgres can prune directly to the correct partition for any given deduplication_id. The primary key constraint works here because, well, the partition key is the column we're constraining. No uniqueness problem on the uniqueness-enforcement table. Very convenient.
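We can sanity-check that pruning claim with EXPLAIN. Using an arbitrary lookup value, the plan should reference only one of the three hash partitions:

```sql
EXPLAIN SELECT 1 FROM event_dedup WHERE deduplication_id = 42;
-- The plan should show a scan on a single event_dedup_h* partition;
-- the other two hash partitions are pruned away at plan time.
```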

The trigger code doesn't change at all. Postgres handles the partition routing transparently. We've effectively built a scalable uniqueness enforcement layer using nothing but declarative partitioning, a trigger function, and a table that exists purely as an optimization.

Going Virtual

What if the deduplication ID is something that we can derive from the row's content rather than storing in the events table? That's worth yet another tweak to our approach. Say the uniqueness guarantee is based on the payload content. We could use a hash function to generate the dedup key on the fly:

CREATE OR REPLACE FUNCTION f_check_dupes()
RETURNS TRIGGER AS
$$
DECLARE
  -- Undocumented hash functions are fun!
  n_dupe_id BIGINT := hashtextextended(NEW.payload, 0);
BEGIN
  INSERT INTO event_dedup VALUES (n_dupe_id);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

The hashtextextended function is one of those perilous corners of Postgres that doesn't show up in casual documentation browsing. It produces a 64-bit hash of the input text, which delivers a collision space of about 18 quintillion values. For most practical purposes, that's unique enough. For deduplication requirements that are truly ironclad, stick with a deterministic identifier rather than a hash.

The beauty of this approach is that the events table stays lean. The dedup table absorbs the storage cost, and since it's just a single BIGINT column, it's remarkably compact.

The Fine Print

No solution is without caveats, and intellectual honesty demands we acknowledge a few.

First, triggers add complexity. Every trigger is a piece of business logic that lives in the database rather than the application. Some teams are fine with that; others treat it like putting ketchup on a steak. Organizations with strong opinions about keeping logic in the application layer may balk at using a trigger in this situation regardless of its technical merits.

Second, concurrency deserves attention. Both trigger approaches rely on seeing committed data (or data visible within the current transaction). Under high concurrency, two transactions could simultaneously check for the same deduplication_id, both find nothing, and both proceed with the insert.

The dedup table approach handles this more gracefully because the INSERT INTO event_dedup will block on the primary key's underlying unique index, causing the second transaction to wait. Once the first commits, the second will get the constraint violation.

The scan-based trigger is more vulnerable here because its PERFORM check cannot see uncommitted rows from other transactions, so two concurrent inserts of the same value can both pass the check. Closing that window takes more sophisticated logic, such as advisory locks.
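As a sketch of what that hardening could look like, a transaction-scoped advisory lock on the deduplication value serializes concurrent inserts of the same ID. Once the first transaction commits and releases the lock, the second transaction's re-check (under READ COMMITTED) sees the newly committed row:

```sql
CREATE OR REPLACE FUNCTION f_check_dupes()
RETURNS TRIGGER AS
$$
BEGIN
  -- Serialize concurrent inserts of the same deduplication_id.
  -- The lock is released automatically when the transaction ends.
  PERFORM pg_advisory_xact_lock(NEW.deduplication_id);

  PERFORM * FROM events WHERE deduplication_id = NEW.deduplication_id;
  IF FOUND THEN
    RAISE UNIQUE_VIOLATION
          USING MESSAGE = 'Duplicate found: ' || NEW.deduplication_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

One caveat: advisory lock keys live in a single shared 64-bit space, so any other advisory lock usage in the application needs to avoid colliding with these values.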

Finally, there's the matter of updates. If it's possible to update deduplication_id, there should be additional trigger logic to handle that use case. This is territory where implementation details tied to business logic proliferate and escape the simple confines of our demonstration case. But you get the idea.
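As a minimal sketch of the dedup-table case, assuming an update should simply move the dedup entry to the new value (the function and trigger names here are illustrative), an AFTER UPDATE trigger could mirror the change:

```sql
CREATE OR REPLACE FUNCTION f_move_dupes()
RETURNS TRIGGER AS
$$
BEGIN
  IF NEW.deduplication_id IS DISTINCT FROM OLD.deduplication_id THEN
    DELETE FROM event_dedup WHERE deduplication_id = OLD.deduplication_id;
    -- Raises a unique violation if the new value already exists.
    INSERT INTO event_dedup VALUES (NEW.deduplication_id);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_move_dupes_a_u
AFTER UPDATE OF deduplication_id ON events
   FOR EACH ROW EXECUTE FUNCTION f_move_dupes();
```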

Closing Thoughts

So where does this leave us? The documentation is refreshingly honest about the constraint limitation, but honesty doesn't enforce uniqueness across partitions.

The brute force trigger works without much fanfare, but it scales poorly with partition count. The dedup table approach trades a small, constant overhead for partition-count independence, which is almost always the right trade-off in production systems. And for those feeling adventurous, the hash-based variant omits the stored column entirely by computing dedup keys on the fly.

Sadly, none of these techniques are as clean as a native UNIQUE constraint. They all involve triggers, which means more moving parts, more things to test, and more things to document. But they work. Sometimes the best engineering isn't about finding the perfect solution, it's about finding the most tolerable compromise.

Keep these tricks in your back pocket. Partitioning is still the best approach for horizontally scaling large data sets, and the uniqueness problem isn't going anywhere until Postgres adds global indexes. Until then, triggers and dedup tables will do just fine.