Sequences v5
Many applications require that unique surrogate ids be assigned to database entries.
Often the database SEQUENCE
object is used to produce these. In
PostgreSQL, these can be either:
- A manually created sequence using the
CREATE SEQUENCE
command and retrieved by calling thenextval()
function serial
andbigserial
columns or, alternatively,GENERATED BY DEFAULT AS IDENTITY
columns
However, standard sequences in PostgreSQL aren't multi-node aware and
produce values that are unique only on the local node. This is important because
unique ids generated by such sequences cause conflict and data loss (by
means of discarded INSERT
actions) in multi-master replication.
BDR global sequences
For this reason, BDR provides an application-transparent way to generate unique ids using sequences on bigint or bigserial datatypes across the whole BDR group, called global sequences.
BDR global sequences provide an easy way for applications to use the database to generate unique synthetic keys in an asynchronous distributed system that works for most—but not necessarily all—cases.
Using BDR global sequences allows you to avoid the problems with insert
conflicts. If you define a PRIMARY KEY
or UNIQUE
constraint on a column
that's using a global sequence, no node can ever get
the same value as any other node. When BDR synchronizes inserts between the
nodes, they can never conflict.
BDR global sequences extend PostgreSQL sequences, so they are crash-safe. To use
them, you must be granted the bdr_application
role.
There are various possible algorithms for global sequences:
- SnowflakeId sequences
- Globally allocated range sequences
SnowflakeId sequences generate values using an algorithm that doesn't require inter-node communication at any point. It's faster and more robust and has the useful property of recording the timestamp at which the values were created.
SnowflakeId sequences have the restriction that they work only for 64-bit BIGINT datatypes and produce values 19 digits long, which might be too long for use in some host language datatypes such as Javascript Integer types. Globally allocated sequences allocate a local range of values that can be replenished as needed by inter-node consensus, making them suitable for either BIGINT or INTEGER sequences.
You can create a global sequence using the bdr.alter_sequence_set_kind()
function. This function takes a standard PostgreSQL sequence and marks it as
a BDR global sequence. It can also convert the sequence back to the standard
PostgreSQL sequence.
BDR also provides the configuration variable bdr.default_sequence_kind
, which
determines the kind of sequence to create when the CREATE SEQUENCE
command is executed or when a serial
, bigserial
, or
GENERATED BY DEFAULT AS IDENTITY
column is created. Valid settings are:
local
, meaning that newly created sequences are the standard PostgreSQL (local) sequences.galloc
, which always creates globally allocated range sequences.snowflakeid
, which creates global sequences for BIGINT sequences that consist of time, nodeid, and counter components. You can't use it with INTEGER sequences (so you can use it forbigserial
but not forserial
).timeshard
, which is the older version of SnowflakeId sequence and is provided for backward compatibility only. The SnowflakeId is preferred.distributed
(the default), which is a special value that you can use only forbdr.default_sequence_kind
. It selectssnowflakeid
forint8
sequences (i.e.,bigserial
) andgalloc
sequence forint4
(i.e.,serial
) andint2
sequences.
The bdr.sequences
view shows information about individual sequence kinds.
currval()
and lastval()
work correctly for all types of global sequence.
SnowflakeId sequences
The ids generated by SnowflakeId sequences are loosely time ordered so you can use them to get the approximate order of data insertion, like standard PostgreSQL sequences. Values generated within the same millisecond might be out of order, even on one node. The property of loose time ordering means they are suitable for use as range partition keys.
SnowflakeId sequences work on one or more nodes and don't require any inter-node communication after the node join process completes. So you can continue to use them even if there's the risk of extended network partitions. They aren't affected by replication lag or inter-node latency.
SnowflakeId sequences generate unique ids in a different way from standard sequences. The algorithm uses three components for a sequence number. The first component of the sequence is a timestamp at the time of sequence number generation. The second component of the sequence number is the unique id assigned to each BDR node, which ensures that the ids from different nodes are always different. The third component is the number generated by the local sequence.
While adding a unique node id to the sequence number is enough to ensure there are no conflicts, we also want to keep another useful property of sequences. The ordering of the sequence numbers roughly corresponds to the order in which data was inserted into the table. Putting the timestamp first ensures this.
A few limitations and caveats apply to SnowflakeId sequences.
SnowflakeId sequences are 64 bits wide and need a bigint
or bigserial
.
Values generated are at least 19 digits long.
There's no practical 32-bit integer
version, so you can't use it with serial
sequences. Use globally allocated range sequences instead.
For SnowflakeId there's a limit of 4096 sequence values generated per millisecond on any given node (about 4 million sequence values per second). In case the sequence value generation wraps around within a given millisecond, the SnowflakeId sequence waits until the next millisecond and gets a fresh value for that millisecond.
Since SnowflakeId sequences encode timestamps into the sequence value, you can generate new sequence values only within the given time frame (depending on the system clock). The oldest timestamp that you can use is 2016-10-07, which is the epoch time for the SnowflakeId. The values wrap to negative values in the year 2086 and completely run out of numbers by 2156.
Since timestamp is an important part of a SnowflakeId sequence, there's additional protection from generating sequences with a timestamp older than the latest one used in the lifetime of a postgres process (but not between postgres restarts).
The INCREMENT
option on a sequence used as input for SnowflakeId sequences is
effectively ignored. This might be relevant for applications that do sequence
ID caching, like many object-relational mapper (ORM) tools, notably Hibernate.
Because the sequence is time based, this has little practical effect since the
sequence advances to a new noncolliding value by the time the
application can do anything with the cached values.
Similarly, you might change the START
, MINVALUE
, MAXVALUE
, and CACHE
settings
on the underlying sequence, but there's no benefit to doing
so. The sequence's low 14 bits are used and the rest is discarded, so
the value range limits don't affect the function's result. For the same
reason, setval()
isn't useful for SnowflakeId sequences.
Timeshard sequences
Timeshard sequences are provided for backward compatibility with existing installations but aren't recommended for new application use. We recommend using the SnowflakeId sequence instead.
Timeshard is very similar to SnowflakeId but has different limits and fewer protections and slower performance.
The differences between timeshard and SnowflakeId are as following:
- Timeshard can generate up to 16384 per millisecond (about 16 million per
second), which is more than SnowflakeId. However, there's no protection
against wraparound within a given millisecond. Schemas using the timeshard
sequence must protect the use of the
UNIQUE
constraint when using timeshard values for given column. - The timestamp component of timeshard sequence runs out of values in the year 2050 and, if used in combination with bigint, the values wrap to negative numbers in the year 2033. This means that sequences generated after 2033 have negative values. This is a considerably shorter time span than SnowflakeId and is the main reason why SnowflakeId is preferred.
- Timeshard sequences require occasional disk writes (similar to standard local sequences). SnowflakeIds are calculated in memory so the SnowflakeId sequences are in general a little faster than timeshard sequences.
Globally allocated range sequences
The globally allocated range (or galloc
) sequences allocate ranges (chunks)
of values to each node. When the local range is used up, a new range is
allocated globally by consensus amongst the other nodes. This uses the key
space efficiently but requires that the local node be connected to a majority
of the nodes in the cluster for the sequence generator to progress when the
currently assigned local range is used up.
Unlike SnowflakeId sequences, galloc
sequences support all sequence data types
provided by PostgreSQL: smallint
, integer
, and bigint
. This means that
you can use galloc
sequences in environments where 64-bit sequences are
problematic. Examples include using integers in javascript, since that supports only
53-bit values, or when the sequence is displayed on output with limited space.
The range assigned by each voting is currently predetermined based on the datatype the sequence is using:
- smallint — 1 000 numbers
- integer — 1 000 000 numbers
- bigint — 1 000 000 000 numbers
Each node allocates two chunks of seq_chunk_size, one for the current use plus a reserved chunk for future usage, so the values generated from any one node increase monotonically. However, viewed globally, the values generated aren't ordered at all. This might cause a loss of performance due to the effects on b-tree indexes and typically means that generated values aren't useful as range partition keys.
The main downside of the galloc
sequences is that once the assigned range is
used up, the sequence generator has to ask for consensus about the next range
for the local node that requires inter-node communication. This could
lead to delays or operational issues if the majority of the BDR group isn't
accessible. This might be avoided in later releases.
The CACHE
, START
, MINVALUE
, and MAXVALUE
options work correctly
with galloc
sequences. However, you need to set them before transforming
the sequence to the galloc
kind. The INCREMENT BY
option also works
correctly. However, you can't assign an increment value that's equal
to or more than the above ranges assigned for each sequence datatype.
setval()
doesn't reset the global state for galloc
sequences; don't use it.
A few limitations apply to galloc
sequences. BDR tracks galloc
sequences in a
special BDR catalog bdr.sequence_alloc. This
catalog is required to track the currently allocated chunks for the galloc
sequences. The sequence name and namespace is stored in this catalog. Since the
sequence chunk allocation is managed by Raft, whereas any changes to the
sequence name/namespace is managed by the replication stream, BDR currently doesn't
support renaming galloc
sequences or moving them to another namespace or
renaming the namespace that contains a galloc
sequence. Be
mindful of this limitation while designing application schema.
Converting a local sequence to a galloc sequence
Before transforming a local sequence to galloc, you need to take care of several prerequisites.
1. Verify that sequence and column data type match
Check that the sequence's data type matches the data type of the column with
which it will be used. For example, you can create a bigint
sequence
and assign an integer
column's default to the nextval()
returned by that
sequence. With galloc sequences, which for bigint
are allocated in blocks of
1 000 000 000, this quickly results in the values returned by nextval()
exceeding the int4
range if more than two nodes are in use.
The following example shows what can happen:
After executing INSERT INTO seqtest VALUES(DEFAULT)
on two nodes, the table
contains the following values:
However, attempting the same operation on a third node fails with an
integer out of range
error, as the sequence generated the value
4000000002
.
Tip
You can retrieve the current data type of a sequence from the PostgreSQL
pg_sequences
view. You can modify the data type of a sequence with ALTER SEQUENCE ... AS ...
,
for example, ALTER SEQUENCE public.sequence AS integer
, as long as its current
value doesn't exceed the maximum value of the new data type.
2. Set a new start value for the sequence
When the sequence kind is altered to galloc
, it's rewritten and restarts from
the defined start value of the local sequence. If this happens on an existing
sequence in a production database, you need to query the current value and
then set the start value appropriately. To assist with this use case, BDR
allows users to pass a starting value with the function bdr.alter_sequence_set_kind()
.
If you're already using offset and you have writes from multiple nodes, you
need to check what is the greatest used value and restart the sequence at least
to the next value.
Since users can't lock a sequence, you must leave a $MARGIN
value to allow
operations to continue while the max()
value is queried.
The bdr.sequence_alloc
table gives information on the chunk size and the
ranges allocated around the whole cluster.
In this example, we started our sequence from 333
, and we have two nodes in the
cluster. We can see that we have a number of allocation 4, which is 2 per node,
and the chunk size is 1000000 that's related to an integer sequence.
To see the ranges currently assigned to a given sequence on each node, use these queries:
- Node
Node1
is using range from333
to2000333
.
- Node
Node2
is using range from2000004
to4000003
.
You can't combine it to a single query (like WHERE ctid IN ('(0,2)', '(0,3)')
)
as that still shows only the first range.
When a node finishes a chunk, it asks a consensus for a new one and gets the first available. In the example, it's from 4000334 to 5000333. This is the new reserved chunk and starts to consume the old reserved chunk.
UUIDs, KSUUIDs, and other approaches
There are other ways to generate globally unique ids without using the global sequences that can be used with BDR. For example:
- UUIDs and their BDR variant, KSUUIDs
- Local sequences with a different offset per node (i.e., manual)
- An externally coordinated natural key
BDR applications can't use other methods safely:
counter-table-based approaches relying on SELECT ... FOR UPDATE
, UPDATE ... RETURNING ...
or similar for sequence generation doesn't work correctly in BDR because BDR
doesn't take row locks between nodes. The same values are generated on
more than one node. For the same reason, the usual strategies for "gapless"
sequence generation don't work with BDR. In most cases, the application
coordinates generation of sequences that must be gapless from some external
source using two-phase commit. Or it generates them only on one node in
the BDR group.
UUIDs and KSUUIDs
UUID
keys instead avoid sequences entirely and
use 128-bit universal unique identifiers. These are random
or pseudorandom values that are so large that it's nearly
impossible for the same value to be generated twice. There's
no need for nodes to have continuous communication when using UUID
keys.
In the unlikely event of a collision, conflict detection
chooses the newer of the two inserted records to retain. Conflict logging,
if enabled, records such an event. However, it's
exceptionally unlikely to ever occur, since collisions
become practically likely only after about 2^64
keys are generated.
The main downside of UUID
keys is that they're somewhat inefficient in terms of space and
the network. They consume more space not only as a primary key but
also where referenced in foreign keys and when transmitted on the wire.
Also, not all applications cope well with UUID
keys.
BDR provides functions for working with a K-Sortable variant of UUID
data,
known as KSUUID, which generates values that can be stored using the PostgreSQL
standard UUID
data type. A KSUUID
value is similar to UUIDv1
in that
it stores both timestamp and random data, following the UUID
standard.
The difference is that KSUUID
is K-Sortable, meaning that it's weakly
sortable by timestamp. This makes it more useful as a database key as it
produces more compact btree
indexes, which improves
the effectiveness of search, and allows natural time-sorting of result data.
Unlike UUIDv1
,
KSUUID
values don't include the MAC of the computer on which they were
generated, so there are no security concerns from using them.
KSUUID
v2 is now recommended in all cases. You can directly sort values generated
with regular comparison operators.
There are two versions of KSUUID
in BDR: v1 and v2.
The legacy KSUUID
v1 is
deprecated but is kept in order to support existing installations. Don't
use it for new installations.
The internal contents of v1 and v2 aren't compatible. As such, the
functions to manipulate them also aren't compatible. The v2 of KSUUID
also
no longer stores the UUID
version number.
Step and offset sequences
In offset-step sequences, a normal PostgreSQL sequence is used on each node. Each sequence increments by the same amount and starts at differing offsets. For example, with step 1000, node1's sequence generates 1001, 2001, 3001, and so on. node2's sequence generates 1002, 2002, 3002, and so on. This scheme works well even if the nodes can't communicate for extended periods. However, the designer must specify a maximum number of nodes when establishing the schema, and it requires per-node configuration. Mistakes can easily lead to overlapping sequences.
It's relatively simple to configure this approach with BDR by creating the desired sequence on one node, like this:
Then, on each node calling setval()
, give each node a different offset
starting value, for example:
Be sure to allow a large enough INCREMENT
to leave room for all
the nodes you might ever want to add, since changing it in future is difficult
and disruptive.
If you use bigint
values, there's no practical concern about key exhaustion,
even if you use offsets of 10000 or more. It would take hundreds of years,
with hundreds of machines, doing millions of inserts per second, to have any
chance of approaching exhaustion.
BDR doesn't currently offer any automation for configuration of the per-node offsets on such step/offset sequences.
Composite keys
A variant on step/offset sequences is to use a composite key composed of
PRIMARY KEY (node_number, generated_value)
, where the
node number is usually obtained from a function that returns a different
number on each node. You can create such a function by temporarily
disabling DDL replication and creating a constant SQL function. Alternatively, you can use
a one-row table that isn't part of a replication set to store a different
value in each node.
Global sequence management interfaces
BDR provides an interface for converting between a standard PostgreSQL sequence and the BDR global sequence.
The following functions are considered to be DDL
, so DDL replication
and global locking applies to them.
bdr.alter_sequence_set_kind
Allows the owner of a sequence to set the kind of a sequence.
Once set, seqkind
is visible only by way of the bdr.sequences
view.
In all other ways, the sequence appears as a normal sequence.
BDR treats this function as DDL
, so DDL replication and global locking applies,
if it's currently active. See DDL Replication.
Synopsis
Parameters
seqoid
— Name or Oid of the sequence to alter.seqkind
—local
for a standard PostgreSQL sequence,snowflakeid
orgalloc
for globally unique BDR sequences, ortimeshard
for legacy globally unique sequence.
Notes
When changing the sequence kind to galloc
, the first allocated range for that
sequence uses the sequence start value as the starting point. When there are
existing values that were used by the sequence before it was changed to galloc
,
we recommend moving the starting point so that the newly generated
values don't conflict with the existing ones using the following command:
This function uses the same replication mechanism as DDL
statements. This means
that the replication is affected by the ddl filters
configuration.
The function takes a global DDL
lock. It also locks the sequence locally.
This function is transactional. You can roll back the effects with the
ROLLBACK
of the transaction. The changes are visible to the current
transaction.
Only the owner of the sequence can execute the bdr.alter_sequence_set_kind
function
unless bdr.backwards_compatibility
is
set is set to 30618 or lower.
bdr.extract_timestamp_from_snowflakeid
This function extracts the timestamp component of the snowflakeid
sequence.
The return value is of type timestamptz.
Synopsis
Parameters
snowflakeid
— Value of a snowflakeid sequence.
Notes
This function executes only on the local node.
bdr.extract_nodeid_from_snowflakeid
This function extracts the nodeid component of the snowflakeid
sequence.
Synopsis
Parameters
snowflakeid
— Value of a snowflakeid sequence.
Notes
This function executes only on the local node.
bdr.extract_localseqid_from_snowflakeid
This function extracts the local sequence value component of the snowflakeid
sequence.
Synopsis
Parameters
snowflakeid
— Value of a snowflakeid sequence.
Notes
This function executes only on the local node.
bdr.timestamp_to_snowflakeid
This function converts a timestamp value to a dummy snowflakeid sequence value.
This is useful for doing indexed searches or comparisons of values in the snowflakeid column and for a specific timestamp.
For example, given a table foo
with a column id
that's using a snowflakeid
sequence, we can get the number of changes since yesterday midnight like this:
A query formulated this way uses an index scan on the column id
.
Synopsis
Parameters
ts
— Timestamp to use for the snowflakeid sequence generation.
Notes
This function executes only on the local node.
bdr.extract_timestamp_from_timeshard
This function extracts the timestamp component of the timeshard
sequence.
The return value is of type timestamptz.
Synopsis
Parameters
timeshard_seq
— Value of a timeshard sequence.
Notes
This function executes only on the local node.
bdr.extract_nodeid_from_timeshard
This function extracts the nodeid component of the timeshard
sequence.
Synopsis
Parameters
timeshard_seq
— Value of a timeshard sequence.
Notes
This function executes only on the local node.
bdr.extract_localseqid_from_timeshard
This function extracts the local sequence value component of the timeshard
sequence.
Synopsis
Parameters
timeshard_seq
— Value of a timeshard sequence.
Notes
This function executes only on the local node.
bdr.timestamp_to_timeshard
This function converts a timestamp value to a dummy timeshard sequence value.
This is useful for doing indexed searches or comparisons of values in the timeshard column and for a specific timestamp.
For example, given a table foo
with a column id
that's using a timeshard
sequence, we can get the number of changes since yesterday midnight like this:
A query formulated this way uses an index scan on the column id
.
Synopsis
Parameters
ts
— Timestamp to use for the timeshard sequence generation.
Notes
This function executes only on the local node.
KSUUID v2 Functions
Functions for working with KSUUID
v2 data, K-Sortable UUID data.
bdr.gen_ksuuid_v2
This function generates a new KSUUID
v2 value using the value of timestamp passed as an
argument or current system time if NULL is passed.
If you want to generate KSUUID automatically using the system time, pass a NULL argument.
The return value is of type UUID.
Synopsis
Notes
This function executes only on the local node.
bdr.ksuuid_v2_cmp
This function compares the KSUUID
v2 values.
It returns 1 if the first value is newer, -1 if the second value is lower, or zero if they are equal.
Synopsis
Parameters
UUID
—KSUUID
v2 to compare.
Notes
This function executes only on the local node.
bdr.extract_timestamp_from_ksuuid_v2
This function extracts the timestamp component of KSUUID
v2.
The return value is of type timestamptz.
Synopsis
Parameters
UUID
—KSUUID
v2 value to extract timestamp from.
Notes
This function executes only on the local node.
KSUUID v1 functions
Functions for working with KSUUID
v1 data, K-Sortable UUID data(v1).
bdr.gen_ksuuid
This function generates a new KSUUID
v1 value, using the current system time.
The return value is of type UUID.
Synopsis
Notes
This function executes only on the local node.
bdr.uuid_v1_cmp
This function compares the KSUUID
v1 values.
It returns 1 if the first value is newer, -1 if the second value is lower, or zero if they are equal.
Synopsis
Notes
This function executes only on the local node.
Parameters
UUID
—KSUUID
v1 to compare.
bdr.extract_timestamp_from_ksuuid
This function extracts the timestamp component of KSUUID
v1 or UUIDv1
values.
The return value is of type timestamptz.
Synopsis
Parameters
UUID
—KSUUID
v1 value to extract timestamp from.
Notes
This function executes on the local node.