Looking Forward to Postgres 19: Checksums For All

Data checksums are one of those Postgres features that, when they are doing their job, are easily forgotten. They sit quietly in the header of every data page as a small integer fingerprint, forever waiting to thwart the threat of cosmic rays or errant hardware failures. Most clusters run from cradle to grave and never trip a single one.

For years, the decision to use them was etched in stone at the time of database initialization. It wasn't until version 12 that Postgres introduced the pg_checksums utility to change it. And even then, doing so is a fully offline affair, grinding through every page on disk and incurring a long outage window.

That's a fairly painful ordeal for a basic safeguard that wasn't even enabled by default until version 18. So why go through all the trouble in the first place? Do we really need data checksums in our Postgres cluster? The short answer is "yes". The longer answer explains why Postgres 19 continues to improve the checksum system by adding an online conversion capability.

Bit Rot Never Sleeps

Let's start with what a checksum actually defends against. Postgres is very good at protecting data from itself. Crash recovery, the write-ahead log, full page writes, all of that machinery exists to make sure a power failure mid-write doesn't leave a torn, half-updated page behind. But Postgres can’t help when the hardware itself lies about the data.

Even with ECC RAM, that happens more often than you might expect. Cosmic rays can flip a bit in a memory cell. Failing drives may return stale sectors. Storage controllers could acknowledge writes that never made it to a platter. Any piece of the hardware, including the motherboard, CPU, and RAM, is suspect. In every one of these cases, Postgres asks for a page and the OS cheerfully returns whatever it was handed, with no end-to-end validation. The data is just wrong, and nothing in the normal read path would ever know.

A data checksum closes that gap. When checksums are enabled, every page written to disk carries a 16-bit checksum in the header computed from its contents. When that page is later read back into shared buffers, Postgres recomputes the checksum and compares the two. If the stored value and the computed value disagree, the page has changed underneath the database without the database's knowledge, and Postgres raises an error instead of returning garbage:

ERROR:  invalid page in block 4711 of relation base/16384/24576

That error is the difference between discovering corruption upon interacting with an affected page, and discovering it months later when the rot has propagated into replicas and backups. The full details live in the Data Checksums chapter of the documentation, and the short version is this: checksums hang a cowbell onto previously silent corruption.
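To make the mechanism concrete, here is a toy sketch in Python. It is not the algorithm Postgres uses; it borrows the standard library's 16-bit CRC (binascii.crc_hqx) as a stand-in, and toy_checksum and flip_bit are hypothetical helper names invented for this illustration. The only point is to show how a fingerprint computed at write time exposes a single flipped bit at read time.

```python
import binascii

PAGE_SIZE = 8192  # default Postgres block size


def toy_checksum(page: bytes, block_no: int) -> int:
    """Illustrative 16-bit fingerprint; NOT the real Postgres algorithm.

    Seeding the CRC with the block number means a page that lands in
    the wrong block also fails verification.
    """
    return binascii.crc_hqx(page, block_no & 0xFFFF)


def flip_bit(page: bytes, bit_index: int) -> bytes:
    """Return a copy of the page with exactly one bit inverted."""
    damaged = bytearray(page)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


# "Write" a page and remember its checksum, as the page header would.
page = bytes(range(256)) * (PAGE_SIZE // 256)
stored = toy_checksum(page, block_no=4711)

# Simulate a cosmic ray, then "read" both versions back.
damaged = flip_bit(page, bit_index=12345)
clean_ok = toy_checksum(page, 4711) == stored
damaged_ok = toy_checksum(damaged, 4711) == stored

print(clean_ok, damaged_ok)  # prints: True False
```

One bit out of 65,536 changed, and the fingerprint no longer matches. Without that comparison, the damaged page would have been handed back as if nothing happened.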

But wait, there's more!

Be Kind, Rewind

The second, sneakier motivation for enabling checksums is for the sake of pg_rewind. When running a high-availability Postgres cluster, a switchover or failover event means the old primary node eventually gets repurposed as a replica. The easiest and fastest way to do this is to compare the state of the new and old primary, and "rewind" differences until they're compatible.

But pg_rewind has a prerequisite:

pg_rewind requires that the target server either has the wal_log_hints option enabled in postgresql.conf or data checksums enabled when the cluster was initialized with initdb (the default). full_page_writes must also be set to on, but is enabled by default.

So checksums are one of two ways to satisfy pg_rewind. The other is a parameter called wal_log_hints. As a subject, hint bits are basically boring Postgres bookkeeping meant for optimization. The important part is that, as an "optimization" which can change during a read, they are not logged to the WAL by default.

But as we all know, checksums are highly dependent on page contents. Any change to a page's bytes, including a flipped hint bit, also changes the checksum. So when checksums are on, Postgres has no choice but to WAL-log hint bit updates too, writing a full page image the first time a page is modified after a checkpoint, even if the only change is a humble hint bit. That keeps recovery and replication consistent, and it is exactly what wal_log_hints does. Both roads lead to the same destination, which is why either one satisfies pg_rewind.

It’s still easier to simply enable wal_log_hints if we only want rewind functionality, but checksums get us additional safeguards. With that in mind, how do we actually turn them on? Perhaps that process can explain why more DBAs avoid checksums than we would otherwise expect.

Born at Creation

Checksums have been available since Postgres 9.3, but only as a flag to initdb at the moment of cluster creation. Past versions of Postgres worked like this:

initdb --data-checksums -D /var/lib/postgresql/data

Omit the flag, and the cluster remained checksum-free for life. That initial decision baked checksums into the data directory before we ever wrote a byte. Postgres 18 changed the default for this, but that only helps brand new clusters. Every cluster created before that, and every one created with the old default, was on its own.

A partial reprieve arrived in stages. Postgres 11 shipped pg_verify_checksums, which could at least scan an offline cluster and determine whether existing checksums were intact. Then Postgres 12 renamed the utility to pg_checksums and taught it the two verbs everyone actually wanted: --enable and --disable. Finally, a way to add checksums to a cluster after it was born:

pg_checksums --enable -D /var/lib/postgresql/data

Unfortunately, there was one major catch. Read the very first line of the requirements:

The server must be shut down cleanly before running pg_checksums.

The tool requires the cluster to be completely offline while pg_checksums rewrites every single page of every single relation with its freshly computed checksum. Since the writable primary is the source of all truth, replicas then need to be converted the same way or rebuilt once the process completes. On a modest database that's a mere inconvenience. On a multi-terabyte production system, that could result in hours or even days of downtime.

And that assumes everything goes as expected. If not, there's no partial credit here. An interrupted conversion process cannot resume. The cluster's checksum state simply remains unchanged, and we get to start over from the beginning. DBAs using this utility were probably praying to the Database Gods™ for salvation. Is it any wonder that so many existing Postgres clusters still lack checksums?
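To put rough numbers on that outage window, here is a back-of-the-envelope estimate in Python. It assumes every page is read once and written once, and that the storage sustains a fixed combined throughput; both are simplifications, and the function name and example figures are hypothetical rather than measurements.

```python
def offline_conversion_hours(data_bytes: float, bytes_per_second: float) -> float:
    """Rough outage estimate: each page is read once and written once."""
    seconds = (2 * data_bytes) / bytes_per_second
    return seconds / 3600


# Hypothetical example: 10 TB of relation data on storage sustaining 500 MB/s.
hours = offline_conversion_hours(10e12, 500e6)
print(f"{hours:.1f} hours")  # prints: 11.1 hours
```

That is half a day of downtime before counting the time needed to bring the replicas back, and a single interruption resets the clock.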

The only real workaround is to create a new cluster with checksums enabled, and then use logical replication to move the data over. This procedure is a kind of Swiss Army knife that can also ease major version upgrades, collation changes, and other major conversions where it's simply easier to abandon the old cluster than to modify it. But that's not exactly a simple proposition either.

Thankfully, there's now an alternative.

Behind the Scenes

Let's finally cut to the chase: Postgres 19 makes it possible to toggle checksums on a running cluster. Better yet, it's just a single function call:

SELECT pg_enable_data_checksums();

That single call kicks off the whole process in the background. Under the hood, Postgres spawns a worker for each database in the cluster. Each worker walks through every page, marking buffers dirty so that a valid checksum gets written on the next disk flush. The work is WAL-logged and replicated, so standbys are also implicitly converted. Just start the job and get on with your day.

Processing every page on a busy system is still a lot of I/O, so the function accepts the same kind of throttling knobs that cost-based vacuum uses:

SELECT pg_enable_data_checksums(cost_delay => 10, cost_limit => 500);

The cost_delay and cost_limit arguments tell the background workers to ease off, trading a longer total runtime for less impact on production traffic. If the cluster is quiet, leave them out and it churns through every page as fast as the storage subsystem will allow. It's nice to have options.
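How much longer is "longer"? The sketch below is only a rough guess. It assumes the workers follow the same accounting as cost-based vacuum: accumulate a cost per page, then sleep for cost_delay milliseconds each time the running total reaches cost_limit. The per-page cost of 20 is borrowed from the vacuum_cost_page_dirty default and is an assumption here, as is the throttle_sleep_hours helper itself. The result counts only the time spent sleeping, not the I/O.

```python
PAGE_SIZE = 8192  # default Postgres block size


def throttle_sleep_hours(data_bytes: int, cost_delay_ms: float,
                         cost_limit: int, cost_per_page: int = 20) -> float:
    """Estimate hours spent sleeping under a vacuum-style cost model.

    cost_per_page=20 mirrors the vacuum_cost_page_dirty default; whether
    the checksum workers charge the same amount is an assumption.
    """
    if cost_limit <= 0:
        raise ValueError("cost_limit must be positive")
    pages = data_bytes // PAGE_SIZE
    naps = (pages * cost_per_page) // cost_limit
    return naps * cost_delay_ms / 1000 / 3600


# Hypothetical example: 1 TB of data with cost_delay => 10, cost_limit => 500.
hours = throttle_sleep_hours(10**12, cost_delay_ms=10, cost_limit=500)
print(f"{hours:.1f} hours of throttle sleep")  # prints: 13.6 hours of throttle sleep
```

Under those assumptions, doubling cost_limit halves the sleep time, and doubling cost_delay doubles it. Treat the output as a starting point for testing rather than a prediction.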

But how do we know when it's actually done?

Watching It Happen

This is where a small but incredibly convenient change shows up. The read-only data_checksums parameter has been around forever, but it used to be a plain boolean: checksums were either on or off. Postgres 19 changes that to an enum, exposing the other new states:

SHOW data_checksums;

 data_checksums 
----------------
 inprogress-on

The four possible values are off, inprogress-on, on, and inprogress-off. When being enabled, the cluster moves from off to inprogress-on while the workers do their thing. It only flips to a confident on once every page across every database carries a valid checksum.
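Those states make it easy to script the waiting. Here is a minimal polling sketch in Python. The read_state argument is a hypothetical callable standing in for whatever runs SHOW data_checksums through your driver of choice; it is faked below so the example runs without a server.

```python
import time
from typing import Callable


def wait_until_on(read_state: Callable[[], str],
                  poll_seconds: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until data_checksums reports 'on'; return the number of polls.

    read_state is a hypothetical callable that returns the current value
    of SHOW data_checksums as a string.
    """
    polls = 0
    while True:
        state = read_state()
        polls += 1
        if state == "on":
            return polls
        if state != "inprogress-on":
            # 'off' or 'inprogress-off' means the conversion is not running.
            raise RuntimeError(f"unexpected data_checksums state: {state!r}")
        sleep(poll_seconds)


# Fake server responses so the sketch runs without a database.
states = iter(["inprogress-on", "inprogress-on", "on"])
polls = wait_until_on(lambda: next(states), sleep=lambda _: None)
print(polls)  # prints: 3
```

Raising on anything other than inprogress-on is a deliberate choice: if the state has fallen back to off, the conversion stopped, and a playbook should flag that instead of waiting forever.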

Disabling runs the same play in reverse, and it's even quicker:

SELECT pg_disable_data_checksums();

That moves the cluster from on to inprogress-off, and once every active backend has stopped trying to validate pages, it settles into off. Turning checksums off doesn't require rewriting anything, so it's nearly instant; Postgres just stops checking. But really, why would we ever do that?

Final Thoughts

Looking back, it's kind of hard to believe how long this particular wart persisted. Checksums shipped with 9.3 back in September of 2013! The (offline) conversion tool didn't arrive until version 12, six years later. Postgres 18 finally switched to the safer default. And now version 19 makes it possible to change them on a running system. Until now, the price of admission meant a lot of DBAs simply avoided converting existing clusters, and went without valuable protection due to a questionable default.

Postgres 19 still doesn't make the cost of entry "free", and I wouldn't trust anyone who claimed it did. You still pay in WAL and background I/O during conversion, and the throttle settings aren't just there for decoration. That said, "pay in tunable background I/O" and "pay in a multi-hour outage" are not even in the same universe of impact. One is a mild annoyance, while the other probably condemned countless clusters to remain checksum-free.

So if you've got an older cluster humming away without checksums, worry not! This is the release that removes the final obstruction to embracing full data integrity on such systems. Try the latest Postgres 19 beta on a restored backup in a safe sandbox environment. Check how long the conversion process takes, try it with different I/O tuning knobs, and develop a playbook for your production instance. Postgres 19 is coming very soon now, and you'll be ready when it arrives.