SQLAlchemy versus Distributed Postgres

Wed, 16 Jul 2025 02:55:00 GMT

One of our customers recently asked if they could use their Python application built with SQLAlchemy with pgEdge, and was pleased to learn that they could. But what is SQLAlchemy, and what considerations might there be when working with a distributed multi-master PostgreSQL cluster like pgEdge Distributed Postgres?

SQLAlchemy is “the Python SQL Toolkit and Object Relational Mapper” according to its website. Most famously, it is used for its ORM capabilities, which allow you to define your data model and to manage the database schema and access from Python, without having to worry about inconveniences like SQL. A good example from my world is pgAdmin, the management tool project for PostgreSQL that I started nearly 30(!) years ago; pgAdmin 4 stores most of its runtime configuration in either a SQLite database or, for larger shared installations, PostgreSQL. Most of the database code for that purpose uses SQLAlchemy to handle both schema creation and upgrades (known as migrations), as it makes them trivial to manage.

One of my awesome colleagues, Gil Browdy, took on the task of showing the customer how SQLAlchemy can work with pgEdge in a distributed environment, and started with a simple script. The script shows the very basics of how we might get started working with SQLAlchemy and pgEdge, so let’s take a look at Gil’s example.

Setup

First, we need to get everything set up. We’re going to import the SQLAlchemy library, which we’ll be using with the psycopg PostgreSQL interface for Python, so we need to install both into a virtual environment.

Code

With the environment set up, we can play with our script. It starts with the boilerplate to import the SQLAlchemy functions we need, and then creates connections to each of the three nodes in my pgEdge cluster.
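A minimal sketch of the connection step, not Gil's original code. It assumes SQLAlchemy 2.0 with psycopg 3 (hence the postgresql+psycopg URL scheme; with psycopg2 the scheme would be postgresql+psycopg2). The host names, credentials, and database name are placeholders, and make_engines is a hypothetical helper name.

```python
from sqlalchemy import create_engine

# Hypothetical connection strings, one per cluster node. Hosts, port,
# credentials, and database name are placeholders for illustration only.
NODE_URLS = [
    "postgresql+psycopg://admin:secret@node1.example.com:5432/demo",
    "postgresql+psycopg://admin:secret@node2.example.com:5432/demo",
    "postgresql+psycopg://admin:secret@node3.example.com:5432/demo",
]


def make_engines(urls):
    """Return one SQLAlchemy Engine per connection string.

    create_engine() loads the database driver but does not open a
    connection; connections are made lazily on first use.
    """
    return [create_engine(url) for url in urls]
```

Calling make_engines(NODE_URLS) would then give a list of three Engine objects, one for each node.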
We define an array of connection strings, and then an Engine object for each.

We need a table to work with to demonstrate that replication works, so we define a SQLAlchemy Table object. This is attached to a MetaData object, which is a collection that holds all Table objects. The tables themselves contain Column objects defining each column in which we’ll store data. As this is a test script, we’ll also create a simple function to drop and recreate all of our managed tables each time we run the test.

Some additional helper functions are useful for validating whether a table or a row of data exists on a given node in the cluster.

And last but not least, we need a function to insert some test data. You will note that this does not simply execute an INSERT statement (though we could do that by calling a psycopg function directly), but uses a regular Python method invocation on the table object.

We’ve set everything up and defined all of our helper functions, so now for the main function. Gil has commented this code nicely, but in a nutshell, we create the table on the first database, and then check to ensure that it exists on the other nodes in the cluster.

Note that as we’re using asynchronous replication in pgEdge, this check may actually fail if the script runs it before the Spock replication engine has replicated the DDL statement to the other nodes. That could be solved with the addition of a brief sleep if needed; however, in a typical application you would normally only use one node of the cluster, so this is really only a potential problem for this test.

Assuming the table now exists on all nodes, we insert a row on a node chosen at random and then check that it is replicated to all other nodes.

Now this is a somewhat contrived example, and not overly representative of a real-world application, in which you would almost certainly have affinity to one particular node in the cluster. But it does show how simple it is to set up and use the basics of SQLAlchemy, and proves that it functions as expected with a multi-master replicated cluster.
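The walkthrough above can be sketched roughly as follows. This is not Gil's original script: the table name, its columns, and every function name are hypothetical, and the engines argument is assumed to be a list of SQLAlchemy Engine objects, one per node. In place of a fixed sleep, the sketch polls with a timeout (wait_until), which is a substitution of mine for the brief sleep mentioned above. Row ids are passed explicitly here so that nodes never generate conflicting values on their own.

```python
import random
import time

from sqlalchemy import Column, Integer, MetaData, String, Table, inspect, select

metadata = MetaData()

# Hypothetical test table; the name and columns are illustrative only.
example_table = Table(
    "example_table",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50), nullable=False),
)


def recreate_tables(engine):
    """Drop and recreate every table registered on `metadata`."""
    metadata.drop_all(engine)
    metadata.create_all(engine)


def table_exists(engine, table_name):
    """Return True if the table is visible on the node behind `engine`."""
    return inspect(engine).has_table(table_name)


def row_exists(engine, name):
    """Return True if a row with the given name exists on the node."""
    stmt = select(example_table.c.id).where(example_table.c.name == name)
    with engine.connect() as conn:
        return conn.execute(stmt).first() is not None


def insert_row(engine, row_id, name):
    """Insert one row using the table object's insert() method."""
    with engine.begin() as conn:  # commits when the block exits cleanly
        conn.execute(example_table.insert().values(id=row_id, name=name))


def wait_until(check, timeout=10.0, interval=0.5):
    """Poll `check()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return check()


def run_test(engines):
    """Create the table on the first node, then verify replication."""
    recreate_tables(engines[0])
    for engine in engines[1:]:
        assert wait_until(lambda e=engine: table_exists(e, "example_table"))

    # Write on a randomly chosen node, then look for the row on every node.
    writer = random.choice(engines)
    insert_row(writer, 1, "hello")
    for engine in engines:
        assert wait_until(lambda e=engine: row_exists(e, "hello"))
```

Polling keeps the test fast when replication is quick, while still tolerating some lag; the 10 second timeout is an arbitrary choice for the sketch.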

Snowflakes

One important concept this example does not show is how to handle unique identifiers across the cluster. pgEdge uses a Snowflake Sequence extension as a replacement for standard sequences, designed to ensure that generated values are unique across the cluster. You can learn more about the Snowflake extension in our documentation. In particular, note that it is important to set the snowflake.node configuration parameter (or GUC) for each individual cluster node once the extension has been installed and created in the database.

To use the Snowflake sequence, we must additionally import the Sequence object and text function from SQLAlchemy.

Then, we simply modify the schema to first create a regular sequence which will be used by Snowflake, and then set the server default value for the identifier column in our example table. It’s worth noting that we also need to use the BigInteger (AKA int8) datatype for Snowflake sequences; an Integer (AKA int4) will not be large enough.

With these minor modifications, rows will be identified by values from the Snowflake sequence, thus ensuring that there are no sequence value collisions between different nodes in the cluster.
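As a rough sketch of that schema change, and not the original code: the table, column, and sequence names below are hypothetical, and the server default assumes the Snowflake extension exposes a snowflake.nextval() function that takes the sequence name, so check the extension documentation before relying on it.

```python
from sqlalchemy import BigInteger, Column, MetaData, Sequence, String, Table, text

metadata = MetaData()

# A regular PostgreSQL sequence for Snowflake to build on. Attaching it to
# `metadata` means metadata.create_all() creates it along with the table.
example_id_seq = Sequence("example_table_id_seq", metadata=metadata)

example_table = Table(
    "example_table",
    metadata,
    Column(
        "id",
        BigInteger,  # int8: Snowflake values do not fit in an int4
        primary_key=True,
        # Assumption: snowflake.nextval() is provided by the extension.
        server_default=text("snowflake.nextval('example_table_id_seq')"),
    ),
    Column("name", String(50), nullable=False),
)
```

With a server default in place, an insert that omits id leaves it to the database to assign the value on whichever node receives the write.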
