Conflicts
EDB Postgres Distributed is an active/active or multi-master DBMS. If used asynchronously, writes to the same or related rows from multiple different nodes can result in data conflicts when using standard data types.
Conflicts aren't errors. In most cases, they are events that PGD can detect and resolve as they occur. Resolution depends on the nature of the application and the meaning of the data, so it's important that PGD provides the application a range of choices as to how to resolve conflicts.
By default, conflicts are resolved at the row level. When changes from two nodes conflict, either the local or remote tuple is picked and the other is discarded. For example, the commit timestamps might be compared for the two conflicting changes and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster.
Conflict handling is configurable, as described in Conflict resolution. Conflicts can be detected and handled differently for each table using conflict triggers, described in Stream triggers.
Column-level conflict detection and resolution is available with BDR, described in CLCD.
By default, all conflicts are logged to `bdr.conflict_history`. If conflicts are possible, then table owners must monitor for them and analyze how to avoid them or make plans to handle them regularly as an application task. The LiveCompare tool is also available to scan regularly for divergence.
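For example, you can review logged conflicts with an ordinary query (a minimal sketch; the view's exact columns depend on your PGD version):

```sql
-- Inspect the conflict records logged on the local node
SELECT * FROM bdr.conflict_history;
```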
Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when servers are very close to each other but can't support geographically distributed applications where very low latency is critical for acceptable performance.
Distributed locking is essentially a pessimistic approach. PGD advocates an optimistic approach, which is to avoid conflicts where possible but allow some types of conflicts to occur and resolve them when they arise.
How conflicts happen
Inter-node conflicts arise as a result of sequences of events that can't happen if all the involved transactions happen concurrently on the same node. Because the nodes exchange changes only after the transactions commit, each transaction is individually valid on the node it committed on. It isn't valid if applied on another node that did other conflicting work at the same time.
Since BDR replication essentially replays the transaction on the other nodes, the replay operation can fail if there's a conflict between a transaction being applied and a transaction that was committed on the receiving node.
Most conflicts can't happen when all transactions run on a single node because Postgres has inter-transaction communication mechanisms to prevent them, such as `UNIQUE` indexes, `SEQUENCE` operations, row and relation locking, and `SERIALIZABLE` dependency tracking. All of these mechanisms are ways to communicate between ongoing transactions to prevent undesirable concurrency issues.
BDR doesn't have a distributed transaction manager or lock manager. That's part of why it performs well with latency and network partitions. As a result, transactions on different nodes execute entirely independently from each other when using the default, lazy replication. Less independence between nodes can avoid conflicts altogether, which is why BDR also offers Eager Replication for when this is important.
Types of conflict
PRIMARY KEY or UNIQUE conflicts
The most common conflicts are row conflicts, where two operations affect a row with the same key in ways they can't on a single node. BDR can detect most of those and applies the `update_if_newer` conflict resolver.
Row conflicts include:

- `INSERT` versus `INSERT`
- `UPDATE` versus `UPDATE`
- `UPDATE` versus `DELETE`
- `INSERT` versus `UPDATE`
- `INSERT` versus `DELETE`
- `DELETE` versus `DELETE`
The view `bdr.node_conflict_resolvers` provides information on how conflict resolution is currently configured for all known conflict types.
INSERT/INSERT conflicts
The most common conflict, `INSERT`/`INSERT`, arises where `INSERT` operations on two different nodes create a tuple with the same `PRIMARY KEY` values (or, if no `PRIMARY KEY` exists, the same values for a single `UNIQUE` constraint).
BDR handles this situation by retaining the most recently inserted tuple of the two according to the originating node's timestamps, unless this behavior is overridden by a user-defined conflict handler.
This conflict generates the `insert_exists` conflict type, which is by default resolved by choosing the newer (based on commit time) row and keeping only that one (the `update_if_newer` resolver). You can configure other resolvers. See Conflict resolution for details.
To resolve this conflict type, you can also use column-level conflict resolution and user-defined conflict triggers.
You can effectively eliminate this type of conflict by using global sequences.
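As a sketch, assuming the `bdr.alter_sequence_set_kind` function and the `snowflakeid` sequence kind (the table and sequence names are illustrative; check your PGD version for the exact interface and supported kinds):

```sql
-- Hypothetical table; bigserial creates the sequence orders_id_seq,
-- and the id column is bigint as snowflakeid values require
CREATE TABLE orders (
    id   bigserial PRIMARY KEY,
    body text
);

-- Convert the backing sequence to a global kind so that concurrent
-- inserts on different nodes can't allocate the same value
SELECT bdr.alter_sequence_set_kind('orders_id_seq', 'snowflakeid');
```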
INSERT operations that violate multiple UNIQUE constraints
An `INSERT`/`INSERT` conflict can violate more than one `UNIQUE` constraint (of which one might be the `PRIMARY KEY`). If a new row violates more than one `UNIQUE` constraint and that results in a conflict against more than one other row, then the apply of the replication change produces a `multiple_unique_conflicts` conflict.
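As an illustrative sketch (hypothetical table and values), a single incoming row can conflict with two different existing rows through two different constraints:

```sql
CREATE TABLE parts (
    pk  integer PRIMARY KEY,
    sku text UNIQUE,
    val text
);

-- Existing rows on the receiving node:
INSERT INTO parts VALUES (1, 'A-100', 'local row 1');
INSERT INTO parts VALUES (2, 'B-200', 'local row 2');

-- An incoming replicated row such as (1, 'B-200', 'remote row') violates
-- the PRIMARY KEY against row 1 and the UNIQUE constraint against row 2,
-- producing a multiple_unique_conflicts conflict on apply.
```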
In case of such a conflict, you must remove some rows for replication to continue. Depending on the resolver setting for `multiple_unique_conflicts`, the apply process either exits with an error, skips the incoming row, or deletes some of the rows. The deletion tries to preserve the row with the correct `PRIMARY KEY` and delete the others.
Warning
In case of multiple rows conflicting this way, if the result of conflict resolution is to proceed with the insert operation, some of the data is always deleted.
It's also possible to define a different behavior using a conflict trigger.
UPDATE/UPDATE conflicts
Where two concurrent `UPDATE` operations on different nodes change the same tuple (but not its `PRIMARY KEY`), an `UPDATE`/`UPDATE` conflict can occur on replay.
These can generate different conflict kinds based on the configuration and situation. If the table is configured with row version conflict detection, then the original (key) row is compared with the local row. If they're different, the `update_differing` conflict is generated. When using origin conflict detection, the origin of the row is checked (the origin is the node that the current local row came from). If that changed, the `update_origin_change` conflict is generated. In all other cases, the `UPDATE` is normally applied without generating a conflict.
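For example, a sketch of enabling row version conflict detection on a table, assuming the `bdr.alter_table_conflict_detection` interface (the parameter names, the table name, and the added column name here are assumptions; check your PGD version):

```sql
-- Switch a hypothetical table from origin-based detection to
-- row-version detection; this adds a version-tracking column
SELECT bdr.alter_table_conflict_detection(
    relation    := 'inventory'::regclass,
    method      := 'row_version',
    column_name := 'row_version'
);
```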
Both of these conflicts are resolved the same way as `insert_exists`, described in INSERT/INSERT conflicts.
UPDATE conflicts on the PRIMARY KEY
BDR can't currently perform conflict resolution where the `PRIMARY KEY` is changed by an `UPDATE` operation. You can update the primary key, but you must ensure that no conflict with existing values is possible.
Conflicts on the update of the primary key are Divergent conflicts and require manual intervention.
Updating a primary key is possible in Postgres, but there are issues in both Postgres and BDR.
A simple schema provides an example that explains the issues.
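As a minimal sketch, the examples that follow assume a hypothetical `pktest` table (the table name, schema, and values are illustrative):

```sql
CREATE TABLE pktest (pk integer PRIMARY KEY, val integer);
INSERT INTO pktest VALUES (1, 1);
```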
Updating the Primary Key column is possible, so this SQL succeeds:
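```sql
UPDATE pktest SET pk = 2 WHERE pk = 1;
```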
However, suppose there are multiple rows in the table:
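```sql
INSERT INTO pktest VALUES (3, 3);
```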
Some UPDATEs succeed:
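```sql
UPDATE pktest SET pk = 4 WHERE pk = 3;
```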
Other UPDATEs fail with constraint errors:
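```sql
UPDATE pktest SET pk = 4 WHERE pk = 2;
-- ERROR:  duplicate key value violates unique constraint "pktest_pkey"
-- DETAIL: Key (pk)=(4) already exists.
```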
So for Postgres applications that update primary keys, be careful to avoid runtime errors, even without BDR.
With BDR, the situation becomes more complex if UPDATEs are allowed from multiple locations at the same time.
Executing these two changes concurrently works:
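```sql
-- On node 1:
UPDATE pktest SET pk = pk + 1 WHERE pk = 2;   -- 2 becomes 3
-- On node 2, concurrently:
UPDATE pktest SET pk = pk + 1 WHERE pk = 4;   -- 4 becomes 5
```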
Executing these next two changes concurrently causes a divergent error, since both changes are accepted. But applying the changes on the other node results in `update_missing` conflicts.
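```sql
-- On node 1:
UPDATE pktest SET pk = 1 WHERE pk = 3;
-- On node 2, concurrently:
UPDATE pktest SET pk = 2 WHERE pk = 3;
-- Each change succeeds locally; on replay, neither node still has a row
-- with pk = 3, so each incoming UPDATE raises update_missing.
```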
This scenario leaves the data different on each node:
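```sql
-- On node 1 (contents shown are illustrative and depend on the
-- update_missing resolver in effect):
SELECT * FROM pktest;   -- e.g., (1,1) and (5,3)
-- On node 2:
SELECT * FROM pktest;   -- e.g., (2,1) and (5,3)
```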
You can identify and resolve this situation using LiveCompare.
Concurrent conflicts present problems. Executing these two changes concurrently isn't easy to resolve:
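```sql
-- Starting from a row with pk = 5 present on both nodes:
-- On node 1:
UPDATE pktest SET pk = 6, val = 8 WHERE pk = 5;
-- On node 2, concurrently:
UPDATE pktest SET pk = 6, val = 9 WHERE pk = 5;
```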
Both changes are applied locally, causing a divergence between the nodes. But then apply on the target fails on both nodes with a duplicate key-value violation error, which causes the replication to halt and requires manual resolution.
This duplicate key violation error can now be avoided, and replication doesn't break, if you set the conflict_type `update_pkey_exists` to `skip`, `update`, or `update_if_newer`. This can still lead to divergence depending on the nature of the update.

You can avoid divergence in cases where the same old key is being updated by the same new key concurrently by setting `update_pkey_exists` to `update_if_newer`. However, in certain situations, divergence occurs even with `update_if_newer`, namely when two different rows are both updated concurrently to the same new primary key.
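A sketch of changing that resolver, assuming the `bdr.alter_node_set_conflict_resolver` function and a node named `node1` (the node name and the parameter names shown are assumptions; check your PGD version for the exact signature):

```sql
SELECT bdr.alter_node_set_conflict_resolver(
    node_name         := 'node1',
    conflict_type     := 'update_pkey_exists',
    conflict_resolver := 'update_if_newer'
);
```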
As a result, we recommend strongly against allowing primary key UPDATE operations in your applications, especially with BDR. If parts of your application change primary keys, then to avoid concurrent changes, make those changes using Eager Replication.
Warning
If the conflict resolution of an `update_pkey_exists` conflict results in an update, one of the rows is always deleted.
UPDATE operations that violate multiple UNIQUE constraints
As with INSERT operations that violate multiple UNIQUE constraints, when an incoming `UPDATE` violates more than one `UNIQUE` index (or the `PRIMARY KEY`), BDR raises a `multiple_unique_conflicts` conflict.
BDR supports deferred unique constraints. If a transaction can commit on the source, then it applies cleanly on the target, unless it sees conflicts. However, a deferred primary key can't be used as a REPLICA IDENTITY, so the use cases are already limited by that and by the warning about using multiple unique constraints.
UPDATE/DELETE conflicts
It's possible for one node to update a row that another node simultaneously deletes. In this case an `UPDATE`/`DELETE` conflict can occur on replay.

If the deleted row is still detectable (the deleted row wasn't removed by `VACUUM`), the `update_recently_deleted` conflict is generated. By default the `UPDATE` is skipped, but you can configure the resolution for this. See Conflict resolution for details.
The deleted row can be cleaned up from the database by the time the `UPDATE` is received in case the local node is lagging behind in replication. In this case, BDR can't differentiate between `UPDATE`/`DELETE` conflicts and INSERT/UPDATE conflicts and generates the `update_missing` conflict.
Another type of conflicting `DELETE` and `UPDATE` is a `DELETE` that comes after the row was updated locally. In this situation, the outcome depends on the type of conflict detection used. When using the default, origin conflict detection, no conflict is detected at all, leading to the `DELETE` being applied and the row removed. If you enable row version conflict detection, a `delete_recently_updated` conflict is generated. The default resolution for this conflict type is to apply the `DELETE` and remove the row, but you can configure this or handle it with a conflict trigger.
INSERT/UPDATE conflicts
When using the default asynchronous mode of operation, a node might receive an `UPDATE` of a row before the original `INSERT` was received. This can happen only with three or more nodes being active (see Conflicts with three or more nodes).

When this happens, the `update_missing` conflict is generated. The default conflict resolver is `insert_or_skip`, though you can use `insert_or_error` or `skip` instead. Resolvers that do insert-or-action first try to `INSERT` a new row based on data from the `UPDATE` when possible (when the whole row was received). For the reconstruction of the row to be possible, the table either needs to have `REPLICA IDENTITY FULL` or the row must not contain any toasted data. See TOAST support details for more info about toasted data.
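For example, to make the whole row available for this kind of reconstruction, you can use standard Postgres DDL (the table name is illustrative):

```sql
-- Log the entire old row for every UPDATE and DELETE on this table
ALTER TABLE inventory REPLICA IDENTITY FULL;
```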
INSERT/DELETE conflicts
Similar to the `INSERT`/`UPDATE` conflict, the node might also receive a `DELETE` operation on a row for which it didn't yet receive an `INSERT`. This is again possible only with three or more nodes set up (see Conflicts with three or more nodes).

BDR can't currently detect this conflict type. The `INSERT` operation doesn't generate any conflict type, and the `INSERT` is applied.

The `DELETE` operation always generates a `delete_missing` conflict, which is by default resolved by skipping the operation.
DELETE/DELETE conflicts
A `DELETE`/`DELETE` conflict arises when two different nodes concurrently delete the same tuple.

This always generates a `delete_missing` conflict, which is by default resolved by skipping the operation.

This conflict is harmless, since both `DELETE` operations have the same effect. One of them can be safely ignored.
Conflicts with three or more nodes
If one node inserts a row that is then replayed to a second node and updated there, a third node can receive the `UPDATE` from the second node before it receives the `INSERT` from the first node. This scenario is an `INSERT`/`UPDATE` conflict.

These conflicts are handled by discarding the `UPDATE`. This can lead to different data on different nodes. These are [divergent conflicts](#divergent-conflicts).
This conflict type can happen only with three or more masters, of which at least two must be actively writing.
Also, the replication lag from node 1 to node 3 must be high enough to allow the following sequence of actions:
- node 2 receives INSERT from node 1
- node 2 performs UPDATE
- node 3 receives UPDATE from node 2
- node 3 receives INSERT from node 1
Using `insert_or_error` (or in some cases the `insert_or_skip` conflict resolver for the `update_missing` conflict type) is a viable mitigation strategy for these conflicts. However, enabling this option opens the door for `INSERT`/`DELETE` conflicts:
- node 1 performs UPDATE
- node 2 performs DELETE
- node 3 receives DELETE from node 2
- node 3 receives UPDATE from node 1, turning it into an INSERT
If these are problems, we recommend tuning freezing settings for a table or database so that they're correctly detected as `update_recently_deleted`.
Another alternative is to use Eager Replication to prevent these conflicts.
`INSERT`/`DELETE` conflicts can also occur with three or more nodes. Such a conflict is identical to `INSERT`/`UPDATE` except with the `UPDATE` replaced by a `DELETE`. This can result in a `delete_missing` conflict.

BDR could choose to make each `INSERT` into a check-for-recently-deleted, as occurs with an `update_missing` conflict. However, the cost of doing this penalizes the majority of users, so at this time it simply logs `delete_missing`.

Later releases will automatically resolve `INSERT`/`DELETE` anomalies via rechecks using LiveCompare when `delete_missing` conflicts occur. Applications can perform these rechecks manually by checking the `bdr.conflict_history_summary` view.
These conflicts can occur in two main problem use cases:

- `INSERT` followed rapidly by a `DELETE`, as can be used in queuing applications
- Any case where the primary key identifier of a table is reused
Neither of these cases is common. We recommend not replicating the affected tables if these problem use cases occur.
BDR has problems with the latter case because BDR relies on the uniqueness of identifiers to make replication work correctly.
Applications that insert, delete, and then later reuse the same unique identifiers can cause difficulties. This is known as the ABA problem. BDR has no way of knowing whether the rows are the current row, the last row, or much older rows.
Unique identifier reuse is also a business problem, since it prevents unique identification over time, which prevents auditing, traceability, and sensible data quality. Applications don't need to reuse unique identifiers.
Any identifier reuse that occurs in the time interval it takes for changes to pass across the system causes difficulties. Although that time might be short in normal operation, down nodes can extend that interval to hours or days.
We recommend that applications don't reuse unique identifiers, but if they do, take steps to avoid reuse within a period of less than a year.
This problem doesn't occur in applications that use sequences or UUIDs.
Foreign key constraint conflicts
Conflicts between a remote transaction being applied and existing local data can also occur for `FOREIGN KEY` (FK) constraints.

BDR applies changes with `session_replication_role = 'replica'`, so foreign keys aren't rechecked when applying changes.

In an active/active environment, this can result in FK violations if deletes occur to the referenced table at the same time as inserts into the referencing table. This is similar to an `INSERT`/`DELETE` conflict.
In single-master Postgres, any `INSERT`/`UPDATE` that refers to a value in the referenced table must wait for `DELETE` operations to finish before it can gain a row-level lock. If a `DELETE` removes a referenced value, then the `INSERT`/`UPDATE` fails the FK check.
In multi-master BDR, there are no inter-node row-level locks. An `INSERT` on the referencing table doesn't wait behind a `DELETE` on the referenced table, so both actions can occur concurrently. Thus an `INSERT`/`UPDATE` on one node on the referencing table can use a value at the same time as a `DELETE` on the referenced table on another node. This then results in a value in the referencing table that's no longer present in the referenced table.
In practice, this occurs if `DELETE` operations occur on referenced tables in separate transactions from `DELETE` operations on referencing tables. This isn't a common operation.
In a parent-child relationship such as Orders -> OrderItems, it isn't typical to do this. It's more likely to mark an OrderItem as canceled than to remove it completely. For reference/lookup data, it's unusual to completely remove entries at the same time as using those same values for new fact data.
While there's a possibility of dangling FKs, the risk of this in general is very low and so BDR doesn't impose a generic solution to cover this case. Once you understand the situation in which this occurs, two solutions are possible.
The first solution is to restrict the use of FKs to closely related entities that are generally modified from only one node at a time, are infrequently modified, or where the modification's concurrency is application-mediated. This avoids any FK violations at the application level.
The second solution is to add triggers to protect against this case using the BDR-provided functions `bdr.ri_fkey_trigger()` and `bdr.ri_fkey_on_del_trigger()`. When called as `BEFORE` triggers, these functions use `FOREIGN KEY` information to avoid FK anomalies by setting referencing columns to NULL, much as if you had a SET NULL constraint. This rechecks all FKs in one trigger, so you need to add only one trigger per table to prevent FK violation.
As an example, suppose you have two tables: Fact and RefData. Fact has an FK that references RefData. Fact is the referencing table and RefData is the referenced table. You need to add one trigger to each table.
Add a trigger that sets columns to NULL in Fact if the referenced row in RefData was already deleted.
Add a trigger that sets columns to NULL in Fact at the time a DELETE occurs on the RefData table.
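A sketch of both triggers, using illustrative trigger and table names. The `ENABLE REPLICA TRIGGER` step is an assumption based on changes being applied with `session_replication_role = 'replica'`, under which only replica-enabled triggers fire:

```sql
-- Referencing table (fact): on INSERT or UPDATE during apply, set the FK
-- columns to NULL if the referenced row no longer exists in refdata
CREATE TRIGGER fact_ri_trg
    BEFORE INSERT OR UPDATE ON fact
    FOR EACH ROW
    EXECUTE PROCEDURE bdr.ri_fkey_trigger();

ALTER TABLE fact ENABLE REPLICA TRIGGER fact_ri_trg;

-- Referenced table (refdata): when a DELETE is applied, set the matching
-- FK columns in the referencing table to NULL
CREATE TRIGGER refdata_ri_del_trg
    BEFORE DELETE ON refdata
    FOR EACH ROW
    EXECUTE PROCEDURE bdr.ri_fkey_on_del_trigger();

ALTER TABLE refdata ENABLE REPLICA TRIGGER refdata_ri_del_trg;
```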
Adding both triggers avoids dangling foreign keys.
TRUNCATE conflicts
`TRUNCATE` behaves similarly to a `DELETE` of all rows but performs this action by physically removing the table data rather than row-by-row deletion. As a result, row-level conflict handling isn't available, so `TRUNCATE` commands don't generate conflicts with other DML actions, even when there's a clear conflict.

Consequently, the ordering of replay can cause divergent changes if another DML operation is executed concurrently with the `TRUNCATE` on other nodes.
You can take one of the following actions:
- Ensure `TRUNCATE` isn't executed alongside other concurrent DML. Rely on LiveCompare to highlight any such inconsistency.
- Replace `TRUNCATE` with a `DELETE` statement with no `WHERE` clause. This approach is likely to have very poor performance on larger tables.
- Set `bdr.truncate_locking = 'on'` to set the `TRUNCATE` command's locking behavior. This setting determines whether `TRUNCATE` obeys the `bdr.ddl_locking` setting. This isn't the default behavior for `TRUNCATE` since it requires all nodes to be up. This configuration might not be possible or wanted in all cases. (A minimal sketch follows this list.)
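A minimal sketch of the third option, assuming `bdr.truncate_locking` can be set at the session level (the table name is illustrative):

```sql
-- Make TRUNCATE obey bdr.ddl_locking for this session, then truncate
SET bdr.truncate_locking = 'on';
TRUNCATE TABLE orders;
```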
Exclusion constraint conflicts
BDR doesn't support exclusion constraints and prevents their creation.
If an existing standalone database is converted to a BDR database, then drop all exclusion constraints manually.
In a distributed asynchronous system, you can't ensure that no set of rows that violate the constraint exists, because all transactions on different nodes are fully isolated. Exclusion constraints lead to replay deadlocks where replay can't progress from any node to any other node because of exclusion constraint violations.
If you force BDR to create an exclusion constraint, or you don't drop existing ones when converting a standalone database to BDR, expect replication to break. To get it to progress again, remove or alter the local tuples that an incoming remote tuple conflicts with so that the remote transaction can be applied.
Data conflicts for roles and tablespace differences
Conflicts can also arise where nodes have global (Postgres-system-wide)
data, like roles, that differ. This can result in operations—mainly
DDL