Looking Forward to Postgres 19: The New REPACK Command

Postgres has had a thorn in its paw for a very long time regarding table size. Every modified tuple leaves an old version in the heap for use by older transactions. While VACUUM locates these old tuples, it only marks them as reusable rather than returning the space to the OS. Tables only ever grow larger in Postgres.

Maybe Postgres 19 can fix that for us.

Options Galore

Historically, the only sanctioned solutions to table bloat have been the VACUUM FULL or CLUSTER commands. Sadly, both of these require an ACCESS EXCLUSIVE lock for the entire duration of the operation. In a world where 1TB tables aren't uncommon, it's a ridiculous imposition that's entirely incompatible with a production system.

So the community turned to third-party utilities like pg_repack and pg_squeeze, which rewrite tables with minimal locking by leveraging logical decoding under the hood. Some DBAs even run these on a regular schedule to keep tables at their ideal size at all times. But more conservative shops have always been wary of handing table rewrites to a project that operates outside the core engine's safety guarantees. I've personally handled support tickets to repair corruption caused by pg_repack. While that may have been several years ago, it's enough to induce extra caution rather than reliance.

Postgres 19 may prove to change things entirely. The new REPACK command brings this functionality into the core engine itself, complete with a CONCURRENTLY option that eliminates the prolonged exclusive lock. And as an added bonus, VACUUM FULL and CLUSTER now use the same repack infrastructure internally, so every path to table compaction benefits from the new code.

An Ounce of Prevention

Before we get too excited about our shiny new tool, let's pull back and talk about not needing it. The best repack is the one you never have to run.

Postgres ships with autovacuum enabled by default, and for many workloads, the defaults are perfectly fine. For tables with high churn, especially those that see large batch deletes or frequent updates, the defaults tend to be insufficient.

The usual fix isn't to schedule periodic repacking, but to tune autovacuum so it keeps pace with the workload. Consider these configuration parameters:

- autovacuum_vacuum_scale_factor: The fraction of the table that must be dead before autovacuum kicks in. The default of 0.2 (20%) is fine for small tables, but on a 100-million-row table, that means 20 million dead tuples can accumulate before cleanup starts. Lowering this default is one option for common use cases. Another trick for very large tables is to reduce this setting to 0 and rely on the next parameter instead.

- autovacuum_vacuum_threshold: The minimum number of dead tuples in addition to the scale factor before autovacuum considers the table. So a table with 1000 rows would need 200 + 50 rows to change before autovacuum kicks in using the defaults. With this in mind, a large table that often sees 50,000 updates per batch might set this to 50,000 in conjunction with setting the scale factor to 0.

- autovacuum_vacuum_cost_limit: How aggressively autovacuum can consume I/O. The default of 200 is conservative. Bumping this up lets autovacuum finish faster, especially on NVMe storage. I tend to recommend starting at 2000.

These can all be set per-table with ALTER TABLE ... SET, which is the right approach for very busy or extremely large tables. When autovacuum is well-tuned, tables reach a stable equilibrium where dead tuples are cleaned up almost as fast as they're created, and bloat never accumulates beyond the scale factor. A few percentage points of extra space is worth avoiding additional maintenance steps.

So tune autovacuum first. But sometimes things go out of control. Perhaps a stale or prepared transaction, or extreme lag while using hot standby replication led to bloat we need to clean up. What then?

The Miracle Cure

Let's see the new REPACK command in action. We'll create a table, fill it with a million rows, delete most of them, and watch what happens:

CREATE TABLE sensor_readings (
  id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  sensor_id   INT NOT NULL,
  reading     NUMERIC(10,4) NOT NULL,
  recorded_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  notes       TEXT
);

CREATE INDEX idx_sensor_readings_sensor
    ON sensor_readings (sensor_id);
CREATE INDEX idx_sensor_readings_time
    ON sensor_readings (recorded_at);

INSERT INTO sensor_readings (sensor_id, reading, recorded_at, notes)
SELECT (random() * 100)::int + 1,
       (random() * 1000)::numeric(10, 4),
       now() - (random() * interval '90 days'),
       'Sensor reading batch ' || (g % 100)::text
  FROM generate_series(1, 1000000) g;

ANALYZE sensor_readings;

One million rows of fake sensor data. Let's check the size:

SELECT pg_size_pretty(pg_relation_size('sensor_readings')) AS table_size,
       pg_size_pretty(pg_indexes_size('sensor_readings')) AS index_size,
       pg_size_pretty(pg_total_relation_size('sensor_readings')) AS total_size;

 table_size | index_size | total_size
------------+------------+------------
 81 MB      | 57 MB      | 138 MB

Now let's simulate a common scenario: a batch job deletes old data, removing about 60% of the rows. We'll run a manual VACUUM afterward, as any well-planned bulk procedure should after such a major purge:

DELETE FROM sensor_readings WHERE id % 5 IN (1, 2, 3);
-- DELETE 600000

VACUUM sensor_readings;

SELECT count(*) AS remaining_rows FROM sensor_readings;

 remaining_rows
----------------
         400000

SELECT pg_size_pretty(pg_relation_size('sensor_readings')) AS table_size,
       pg_size_pretty(pg_indexes_size('sensor_readings')) AS index_size,
       pg_size_pretty(pg_total_relation_size('sensor_readings')) AS total_size;

 table_size | index_size | total_size
------------+------------+------------
 81 MB      | 57 MB      | 138 MB

There it is.

We deleted 600k rows and the table is still 81 MB. The dead tuples are marked as reusable thanks to VACUUM, but the file on disk hasn't budged. We're hauling around 138 MB of total relation size for 400k rows that should be much smaller.

Let's fix that:

REPACK (VERBOSE, ANALYZE) sensor_readings;

INFO:  repacking "public.sensor_readings" in physical order
INFO:  "public.sensor_readings": found 0 removable, 
       400000 nonremovable row versions in 10310 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.07 s, system: 0.04 s, elapsed: 0.13 s.
INFO:  analyzing "public.sensor_readings"
INFO:  "sensor_readings": scanned 4124 of 4124 pages, 
       containing 400000 live rows and 0 dead rows; 30000 rows in sample,
       400000 estimated total rows
INFO:  finished analyzing table "postgres.public.sensor_readings"
       avg read rate: 352.057 MB/s, avg write rate: 0.000 MB/s
       buffer usage: 627 hits, 3560 reads, 0 dirtied
WAL usage: 18 records, 0 full page images, 6353 bytes, 
      0 full page image bytes, 0 buffers full
system usage: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.07 s
REPACK

SELECT pg_size_pretty(pg_relation_size('sensor_readings')) AS table_size,
       pg_size_pretty(pg_indexes_size('sensor_readings')) AS index_size,
       pg_size_pretty(pg_total_relation_size('sensor_readings')) AS total_size;

 table_size | index_size | total_size
------------+------------+------------
 32 MB      | 20 MB      | 52 MB

The total used size went from 138 MB down to 52 MB, with the table and index size experiencing similar reductions. That space is genuinely returned to the operating system, not just marked for reuse.

The VERBOSE option gives us a progress report at the INFO level, and ANALYZE tells Postgres to update the table's statistics once the repack finishes. Both are optional, but there's no reason not to use them.

No Downtime Required

The basic REPACK command acquires an ACCESS EXCLUSIVE lock for its entire duration, just like the old VACUUM FULL. That's fine for a small table, but a 200 GB production table serving live traffic? Not so much.

That's where CONCURRENTLY comes in:

REPACK (CONCURRENTLY) sensor_readings;

When running concurrently, Postgres creates a replication slot behind the scenes to capture any changes that occur while the table is being rewritten. The ACCESS EXCLUSIVE lock is only held during the brief final step when the old and new files are swapped, similar to how TRUNCATE works. The rest of the operation proceeds while reads and writes continue as normal.

There are a few restrictions worth knowing about. REPACK CONCURRENTLY can't be used on UNLOGGED tables, partitioned tables (though individual partitions are fine), or tables without a primary key or replica identity. It also can't run inside a transaction block. Postgres can also only run five of these concurrent repacks simultaneously by default, unless you increase the max_repack_replication_slots parameter.

Getting Sorted

So far we've been repacking in "physical order," which just means the rows are written sequentially into the new file without any particular sort. But REPACK can also replace the CLUSTER command by reordering rows according to an index:

REPACK (VERBOSE, ANALYZE) sensor_readings
 USING INDEX idx_sensor_readings_sensor;

...
INFO:  repacking "public.sensor_readings" using index scan
       on "idx_sensor_readings_sensor"
...

Why does this matter? When rows are scattered randomly across disk pages, a query that fetches all readings for a single sensor has to pull in pages from all over the table. Before clustering, a query for sensor_id = 42 looks like this:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sensor_readings WHERE sensor_id = 42;

Bitmap Heap Scan on sensor_readings
  (actual time=0.981..3.175 rows=4027.00 loops=1)
  Recheck Cond: (sensor_id = 42)
  Heap Blocks: exact=3342
  Buffers: shared hit=3352
  ->  Bitmap Index Scan on idx_sensor_readings_sensor  
        (actual time=0.474..0.474 rows=4027.00 loops=1)
        Index Cond: (sensor_id = 42)
        Index Searches: 1
        Buffers: shared hit=10
Planning Time: 0.083 ms
Execution Time: 3.233 ms

The query touched over 3000 heap blocks to retrieve about 4,000 rows. That's because the rows for sensor 42 were sprinkled across nearly every page in the table. They're fetched in index order, but that's a lot of heap seeks.

Here's what it looks like after repacking with USING INDEX:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sensor_readings WHERE sensor_id = 42;

Index Scan using idx_sensor_readings_sensor on sensor_readings  
  (actual time=0.029..0.522 rows=4027.00 loops=1)
  Index Cond: (sensor_id = 42)
  Index Searches: 1
  Buffers: shared hit=49
Planning Time: 0.087 ms
Execution Time: 0.613 ms

From 3300-ish buffers down to 49. From 3ms down to 0.6ms. The planner even switched from a bitmap scan to a plain index scan because the physical clustering made sequential access so efficient. That's the power of data locality.

One important note: such data reordering is still a one-time operation. Subsequent inserts and updates won't maintain the physical order. Workloads that would benefit from index ordering need periodical REPACK USING INDEX. Alternatively, a lower table fillfactor can leave room in the same page for updated values.

A Word of Caution

There's one caveat that deserves explicit attention: REPACK CONCURRENTLY is not MVCC-safe.

What does that mean in practice? If a concurrent transaction has taken a snapshot before the repack commits, and that transaction hasn't previously accessed the repacked table, it could see the table as empty. This is the same limitation that applies to TRUNCATE and table-rewriting forms of ALTER TABLE. It's a narrow window, and in the TRUNCATE case, getting empty results is essentially the expected outcome anyway. But for REPACK, where you're compacting a table that still has data, it could be surprising.

The good news is that any transaction that accessed the table before the repack started will hold an ACCESS SHARE lock, which blocks the final swap step until that transaction completes. So the risk is limited to transactions that have a snapshot open but haven't touched the table yet. It's rare, but it's worth understanding for long analytical queries that could run alongside maintenance operations. It may be a good idea to check for these before scheduling regular REPACK cycles.

For monitoring, Postgres 19 provides the pg_stat_progress_repack view, which tracks the operation through various operating phases. It even reports tuple counts for scanned, inserted, updated, and deleted rows during the catch-up phase. No more guessing how long a repack will take.

Final Thoughts

For years, the Postgres community's answer to table bloat has been "tune your autovacuum" or perhaps "install pg_repack" for the more adventurous. The first is still the right advice; well-tuned autovacuum prevents bloat from accumulating in the first place, and that should always be the first line of defense. But when reclaiming space is critical, having a first-class REPACK operation in-core is a nice addition.

Postgres 19 is still in development and this feature will go through extensive testing before release, so there's still a chance it could be reverted. But if it survives the grueling validation process, DBAs will have a whole new tool in their kit for keeping operations running smoothly. Sometimes bloat happens, and we could have an easier way out than VACUUM FULL. Keep your fingers crossed!