Using Snowflake Sequences with the Hibernate IDENTITY column
A native PostgreSQL sequence is a great way to generate unique IDs, but there is a risk of duplicate ID generation if you use one in a distributed cluster environment. The pgEdge Snowflake extension is designed for use in a distributed environment, helping you generate truly unique identifiers with ease.
Snowflake functions create unique identifiers that are designed to support parallel execution across multiple nodes. Using a Snowflake sequence ensures that each generated number is consistent and addresses the need for uniqueness across all of the pgEdge cluster nodes.
This is the first in a series of blogs in which I will share different ways to adapt sequences in the pgEdge cluster environment.
Snowflake Sequence Overview
The Snowflake extension creates a replication-friendly identifier that you can use to replace problematic PostgreSQL sequence definitions in your tables that reside in a distributed replication environment. pgEdge PostgreSQL automatically installs and creates the extension in each pgEdge cluster.
Snowflake sequences let you:
Add or modify data in different regions while ensuring a unique transaction sequence.
Preserve unique transaction identifiers without manual/administrative management of a numbering scheme.
Accurately identify the order in which globally distributed transactions are performed.
Each Snowflake ID is a 64-bit value that is composed of multiple parts:
Timestamp: 41 bits containing the number of milliseconds since 2023-01-01
Node number: 10 bits identifying a unique node number (set as a PostgreSQL GUC)
Counter: 12 bits that increment to distinguish multiple IDs generated within the same millisecond
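To make the layout concrete, the following sketch composes and decomposes a Snowflake-style ID in plain Java. The bit positions follow the layout above; the class, method names, and epoch constant are illustrative assumptions, not part of the pgEdge extension.

```java
import java.time.Instant;

// Illustrative sketch of the 64-bit layout described above:
// [1 sign bit][41-bit millisecond timestamp][10-bit node][12-bit counter]
public class SnowflakeLayout {
    // Assumed epoch: 2023-01-01T00:00:00Z, in milliseconds
    static final long EPOCH_MS = Instant.parse("2023-01-01T00:00:00Z").toEpochMilli();

    static long compose(long timestampMs, long node, long counter) {
        return ((timestampMs - EPOCH_MS) << 22)  // timestamp shifted past node + counter bits
             | ((node & 0x3FF) << 12)            // 10 node bits
             | (counter & 0xFFF);                // 12 counter bits
    }

    static long timestampMs(long id) { return (id >>> 22) + EPOCH_MS; }
    static long node(long id)        { return (id >>> 12) & 0x3FF; }
    static long counter(long id)     { return id & 0xFFF; }

    public static void main(String[] args) {
        long id = compose(EPOCH_MS + 123456L, 7, 42);
        System.out.println(timestampMs(id) - EPOCH_MS); // 123456
        System.out.println(node(id));                   // 7
        System.out.println(counter(id));                // 42
    }
}
```

Because the timestamp occupies the high-order bits, IDs generated later always compare greater, which is what makes Snowflake IDs time-ordered across the cluster.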
Hibernate is an object-relational mapping (ORM) tool for Java developers that maps Java objects (classes) to database tables and automatically handles converting data between the two. It's one of the most popular ORM frameworks in the Java ecosystem and implements the Java Persistence API (JPA) standard. In our examples, we'll use a Snowflake sequence in a Hibernate environment.
If you're working in a traditional replication environment (without the benefits of multi-master replication), using a Hibernate identity column as an auto-generated primary key implies that the database will automatically generate IDs. However, in distributed systems like pgEdge multi-master replication, Hibernate IDs can lead to collision issues if multiple nodes try to generate the same ID concurrently.
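The collision risk is easy to see with the bit layout: a plain sequence can hand two nodes the same next value, but a Snowflake-style ID embeds the node number, so IDs generated at the same millisecond with the same counter still differ. A minimal pure-Java illustration (the class, method, and epoch constant are hypothetical, not part of the extension):

```java
// Hypothetical illustration: two nodes generate an ID at the same
// millisecond with the same counter value. A plain sequence replicated
// to both nodes could yield the same value; the 10 node-number bits
// embedded in a Snowflake-style ID keep the results distinct.
public class NodeCollisionDemo {
    static final long EPOCH_MS = 1672531200000L; // assumed 2023-01-01 epoch, ms

    static long snowflakeId(long timestampMs, long node, long counter) {
        return ((timestampMs - EPOCH_MS) << 22) | ((node & 0x3FF) << 12) | (counter & 0xFFF);
    }

    public static void main(String[] args) {
        long now = EPOCH_MS + 1_000_000L;          // same millisecond on both nodes
        long idOnNode1 = snowflakeId(now, 1, 0);
        long idOnNode2 = snowflakeId(now, 2, 0);
        System.out.println(idOnNode1 != idOnNode2); // true: node bits differ
    }
}
```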
The following code snippets show how a typical Hibernate entity, hosted on a pgEdge cluster, can be adapted to use Snowflake sequences. We start with an entity that uses a standard identity column:

```java
@Data
@Entity
@Table(name = "weather")
public class Weather implements Serializable {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String city;

    private Integer temperature;
}
```
GenerationType.IDENTITY relies on PostgreSQL's SERIAL or BIGSERIAL type. Hibernate will create the public.weather table with a 64-bit ID; in SQL, the table would look like:
```sql
CREATE TABLE public.weather
(
    id bigint NOT NULL DEFAULT nextval('weather_id_seq'::regclass),
    city character varying(255),
    temperature integer,
    CONSTRAINT weather_pkey PRIMARY KEY (id)
);
```
In a distributed environment, adding data to this table has the potential to create conflicting identifiers (which in turn cause replication interruptions and maintenance overhead). The collision issue can be avoided with minimal changes by adopting Snowflake sequences. As the following snippets show, replacing a PostgreSQL-style sequence with a Snowflake sequence takes just two steps:
Retrieve the sequence associated with the table column:

```sql
SELECT pg_get_serial_sequence('weather', 'id');
-- returns: public.weather_id_seq
```

Use the snowflake.nextval function to alter the column default:

```sql
ALTER TABLE weather ALTER COLUMN id SET DEFAULT snowflake.nextval('public.weather_id_seq');
```
Automating Snowflake Sequence Implementation
You can use a PostgreSQL event trigger to ensure the use of Snowflake sequences whenever you create a table. The following example defines an event trigger that automatically replaces column sequence defaults with Snowflake sequences.
After creating this trigger in your PostgreSQL database, any time you run a CREATE TABLE command, the trigger fires and looks for any ALTER SEQUENCE sub-command that is part of the CREATE TABLE statement; if it finds one, it replaces the column's default value (usually nextval('...')) with snowflake.nextval('...').
```sql
/*
 * Trigger function to adopt Snowflake sequences
 *
 * On a CREATE TABLE ... event, this function ensures that
 * the related sequence uses the snowflake.nextval function.
 */
CREATE OR REPLACE FUNCTION create_tbl_snowflake_conversion_trg_func()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
    obj record;
    seq record;
BEGIN
    IF tg_tag = 'CREATE TABLE' THEN
        FOR obj IN
            SELECT * FROM pg_event_trigger_ddl_commands()
        LOOP
            IF obj.command_tag = 'ALTER SEQUENCE' THEN
                SELECT ns.nspname AS schemaname,
                       t.relname AS tablename,
                       a.attname AS columnname,
                       s.relname AS sequencename
                  INTO seq
                  FROM pg_class s
                  JOIN pg_depend AS d ON (s.oid = d.objid)
                  JOIN pg_class AS t ON (d.refobjid = t.oid)
                  JOIN pg_attribute AS a ON (a.attrelid = t.oid AND d.refobjsubid = a.attnum)
                  JOIN pg_catalog.pg_namespace AS ns ON t.relnamespace = ns.oid
                 WHERE s.oid = obj.objid
                   AND d.deptype IN ('i', 'a')
                   AND t.relkind IN ('r', 'p')
                   AND s.relkind = 'S';
                IF FOUND THEN
                    EXECUTE format('ALTER TABLE %I.%I ALTER COLUMN %I SET DEFAULT snowflake.nextval(''%s'');',
                                   seq.schemaname, seq.tablename, seq.columnname, seq.sequencename);
                END IF;
            END IF;
        END LOOP;
    END IF;
END
$$;
```
```sql
/*
 * Create trigger for CREATE TABLE event
 */
CREATE EVENT TRIGGER create_tbl_snowflake_conversion_trg
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE')
    EXECUTE FUNCTION create_tbl_snowflake_conversion_trg_func();
```
Conclusion
Snowflake sequences are an easy way to create unique identifiers in distributed clusters. In this blog, we've explained how to adopt Snowflake sequences with minimal changes so that Hibernate (and its IDENTITY strategy) can generate collision-free identifiers on a pgEdge distributed cluster. This allows you to avoid the issues associated with traditional identity column generation on a multi-master, active-active cluster.
For more information about using pgEdge distributed multi-master replication, visit the pgEdge website or see the pgEdge documentation.