Durability and performance options v5
Overview
EDB Postgres Distributed allows you to choose from several replication configurations based on your durability, consistency, availability, and performance needs using Commit Scopes.
In its basic configuration, EDB Postgres Distributed uses asynchronous replication. However, commit scopes can change both the default and the per-transaction behavior. It's also possible to configure the legacy Postgres synchronous replication using standard synchronous_standby_names in the same way as the built-in physical or logical replication. Commit scopes, however, provide much more flexibility and control over the replication behavior.
The different synchronization settings affect three properties of interest to applications that are related but can all be implemented individually:
- Durability: Writing to multiple nodes increases crash resilience and allows you to recover the data after a crash and restart.
- Visibility: With the commit confirmation to the client, the database guarantees immediate visibility of the committed transaction on some set of nodes.
- Conflict handling: Conflicts can be handled optimistically post-commit, with conflicts resolved based on commit timestamps when the transaction is replicated. Alternatively, they can be handled pessimistically pre-commit, where the client can rely on the transaction to eventually be applied on all nodes without further conflicts, or get an abort directly informing the client of an error.
Commit Scopes allow two ways of controlling durability of the transaction:
- Group Commit. This option controls which and how many nodes have to reach a consensus before the transaction is considered committable and at what stage of replication it can be considered committed. This also allows controlling the visibility ordering of the transaction.
  - CAMO. A variant of Group Commit in which the client is part of the consensus.
- Lag Control. This option controls how far behind nodes can be in terms of replication before a commit is allowed to proceed.
Postgres provides Physical Streaming Replication (PSR), which is unidirectional but offers a synchronous variant.
For backward compatibility, BDR still supports configuring synchronous replication with synchronous_commit and synchronous_standby_names. See Legacy synchronous replication, but consider using Group Commit instead.
Terms and definitions
BDR nodes take different roles during the replication of a transaction. These are implicitly assigned per transaction and are unrelated even for concurrent transactions.
The origin is the node that receives the transaction from the client or application. It's the node processing the transaction first, initiating replication to other BDR nodes, and responding back to the client with a confirmation or an error.
A partner node is a BDR node expected to confirm transactions according to Group Commit requirements.
A commit group is the group of all BDR nodes involved in the commit, that is, the origin and all of its partner nodes, which can be just a few or all peer nodes.
Comparison
Most options for synchronous replication available to BDR allow for different levels of synchronization, offering different tradeoffs between performance and protection against node or network outages.
The following table summarizes what a client can expect from a peer node replicated to after having received a COMMIT confirmation from the origin node the transaction was issued to. The Mode column takes on a different meaning depending on the variant. For PSR and legacy synchronous replication with BDR, it refers to the synchronous_commit setting. For Commit Scopes, it refers to the confirmation requirements of the commit scope configuration.
Variant | Mode | Received | Visible | Durable |
---|---|---|---|---|
PSR Async | off (default) | no | no | no |
PGD Async | off (default) | no | no | no |
PGD Lag Control | 'ON received' nodes | no | no | no |
PGD Lag Control | 'ON replicated' nodes | no | no | no |
PGD Lag Control | 'ON durable' nodes | no | no | no |
PGD Lag Control | 'ON visible' nodes | no | no | no |
PSR Sync | remote_write (2) | yes | no | no (1) |
PSR Sync | on (2) | yes | no | yes |
PSR Sync | remote_apply (2) | yes | yes | yes |
PGD Group Commit | 'ON received' nodes | yes | no | no |
PGD Group Commit | 'ON replicated' nodes | yes | no | no |
PGD Group Commit | 'ON durable' nodes | yes | no | yes |
PGD Group Commit | 'ON visible' nodes | yes | yes | yes |
PGD CAMO | 'ON received' nodes | yes | no | no |
PGD CAMO | 'ON replicated' nodes | yes | no | no |
PGD CAMO | 'ON durable' nodes | yes | no | yes |
PGD CAMO | 'ON visible' nodes | yes | yes | yes |
PGD Legacy Sync (3) | remote_write (2) | yes | no | no |
PGD Legacy Sync (3) | on (2) | yes | yes | yes |
PGD Legacy Sync (3) | remote_apply (2) | yes | yes | yes |
(1) Written to the OS, durable if the OS remains running and only Postgres crashes.
(2) Unless switched to local mode (if allowed) by setting synchronous_replication_availability to async, in which case the values for the asynchronous BDR default apply.
(3) Consider using Group Commit instead.
Reception ensures the peer operating normally can eventually apply the transaction without requiring any further communication, even in the face of a full or partial network outage. A crash of a peer node might still require retransmission of the transaction, as this confirmation doesn't involve persistent storage. All modes considered synchronous provide this protection.
Visibility implies the transaction was applied remotely. Immediately after the origin node confirms the commit, all other clients see the results of the transaction on every node that provides this guarantee. Without visibility, other connected clients might not see the results of the transaction and might experience stale reads.
Durability relates to the peer node's storage and provides protection against loss of data after a crash and recovery of the peer node. This can either relate to the reception of the data (as with physical streaming replication) or to visibility (as with Group Commit). The former eliminates the need for retransmissions after a crash, while the latter ensures visibility is maintained across restarts.
Internal timing of operations
For a better understanding of how the different modes work, it's helpful to realize PSR and BDR apply transactions differently.
With physical streaming replication, the order of operations is:
- Origin flushes a commit record to WAL, making the transaction visible locally.
- Peer node receives changes and issues a write.
- Peer flushes the received changes to disk.
- Peer applies changes, making the transaction visible locally.
With PGD, the order of operations is different:
- Origin flushes a commit record to WAL, making the transaction visible locally.
- Peer node receives changes into its apply queue in memory.
- Peer applies changes, making the transaction visible locally.
- Peer persists the transaction by flushing to disk.
For Group Commit, CAMO, and Eager, the origin node waits for a certain number of confirmations prior to making the transaction visible locally. The order of operations is:
- Origin flushes a prepare or precommit record to WAL.
- Peer node receives changes into its apply queue in memory.
- Peer applies changes, making the transaction visible locally.
- Peer persists the transaction by flushing to disk.
- Origin commits and makes the transaction visible locally.
The following table summarizes the differences.
Variant | Order of apply vs persist | Replication before or after commit |
---|---|---|
PSR | persist first | after WAL flush of commit record |
PGD Async | apply first | after WAL flush of commit record |
PGD Lag Control | apply first | after WAL flush of commit record |
PGD Group Commit | apply first | before COMMIT on origin |
PGD CAMO | apply first | before COMMIT on origin |
Configuration
Commit Scopes are configured through SQL functions, just like other administration operations in PGD.
For example, you might define a basic Commit Scope that does Group Commit on a majority of nodes in the example_group BDR group.
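The definition described above can be sketched as follows. This is a minimal sketch, not a definitive reference: it assumes the bdr.add_commit_scope function and the MAJORITY rule keyword as provided by PGD 5, and the scope name example_scope is a placeholder chosen for illustration. Check the Commit Scopes chapter for the exact signature in your version.

```sql
-- Assumption: bdr.add_commit_scope() with these named parameters (PGD 5).
-- 'example_scope' is a placeholder name; 'example_group' comes from the text above.
SELECT bdr.add_commit_scope(
    commit_scope_name := 'example_scope',
    origin_node_group := 'example_group',
    -- Group Commit confirmed by a majority of the nodes in example_group
    rule := 'MAJORITY (example_group) GROUP COMMIT',
    wait_for_ready := true
);
```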
You can then use the commit scope by setting the configuration variable (GUC) bdr.commit_scope to that commit scope, either per transaction or globally.
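As an illustration, the two usages might look like the following. The scope name example_scope is a placeholder for a commit scope that already exists, and the session-level SET is shown as one possible way of applying the setting more broadly than a single transaction.

```sql
-- Per transaction: the setting is reset when the transaction ends.
BEGIN;
SET LOCAL bdr.commit_scope = 'example_scope';  -- placeholder scope name
-- ... statements of the transaction ...
COMMIT;

-- For every later transaction in the current session.
SET bdr.commit_scope = 'example_scope';
```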
You can also set the default commit scope for a given BDR group.
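A minimal sketch of setting the group default, assuming the bdr.alter_node_group_option function available in PGD 5 and a placeholder scope named example_scope. Verify the function name and parameters against the reference for your version.

```sql
-- Assumption: bdr.alter_node_group_option() with these named parameters (PGD 5).
SELECT bdr.alter_node_group_option(
    node_group_name := 'example_group',
    config_key := 'default_commit_scope',
    config_value := 'example_scope'   -- placeholder scope name
);
```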
Note that the default_commit_scope is checked in the group tree that the given origin node belongs to, from the bottom up. The default_commit_scope can't be set this way to the special value local, which means no commit scope. For that, use the bdr.commit_scope configuration parameter.
Full details of the Commit Scope language with all the options are described in the Commit Scopes chapter.
Postgres configuration parameters
The following table provides an overview of the configuration settings that must be set to a non-default value (req) or that are optional (opt) but affect a specific variant.
Setting (GUC) | Group Commit | Lag Control | PSR (1) | Legacy Sync |
---|---|---|---|---|
synchronous_standby_names | n/a | n/a | req | req |
synchronous_commit | n/a | n/a | opt | opt |
synchronous_replication_availability | n/a | n/a | opt | opt |
bdr.commit_scope | opt | opt | n/a | n/a |
Planned shutdown and restarts
When using Group Commit with receive confirmations, take care with planned shutdown or restart. By default, the apply queue is consumed prior to shutting down. However, in the immediate shutdown mode, the queue is discarded at shutdown, leading to the stopped node "forgetting" transactions in the queue. A concurrent failure of the origin node can lead to loss of data, as if both nodes failed.
To ensure the apply queue gets flushed to disk, use either smart or fast shutdown for maintenance tasks. This approach maintains the required synchronization level and prevents loss of data.
Legacy synchronous replication using BDR
Note
Consider using Group Commit instead.
Usage
To enable synchronous replication using BDR, you need to add the application name of the relevant BDR peer nodes to synchronous_standby_names. The use of FIRST x or ANY x offers some flexibility if this doesn't conflict with the requirements of non-BDR standby nodes.
Once you've added it, you can configure the level of synchronization per transaction using synchronous_commit, which defaults to on. This default means that adding to synchronous_standby_names already enables synchronous replication. Setting synchronous_commit to local or off turns off synchronous replication.
Due to BDR applying the transaction before persisting it, the values on and remote_apply are equivalent (for logical replication).
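The legacy setup described in this section can be sketched with standard Postgres commands. The application names node2 and node3 are hypothetical placeholders for the relevant BDR peer nodes, and ANY 1 is only one possible choice of quorum.

```sql
-- Wait for confirmation from any one of two peers (placeholder names).
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (node2, node3)';
SELECT pg_reload_conf();  -- the new value takes effect on reload

-- Choose the synchronization level for a single transaction.
BEGIN;
SET LOCAL synchronous_commit = 'remote_write';
-- ... statements of the transaction ...
COMMIT;

-- Turn off synchronous replication for a single transaction.
BEGIN;
SET LOCAL synchronous_commit = 'local';
-- ... statements of the transaction ...
COMMIT;
```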
Migration to Commit Scopes
The Group Commit feature of BDR is configured independently of synchronous_commit and synchronous_standby_names. Instead, the bdr.commit_scope GUC allows you to select the scope per transaction. And instead of synchronous_standby_names configured on each node individually, Group Commit uses globally synchronized Commit Scopes.
Note
While the grammar for synchronous_standby_names and Commit Scopes looks similar, the former doesn't account for the origin node, but the latter does. Therefore, for example, synchronous_standby_names = 'ANY 1 (...)' is equivalent to a Commit Scope of ANY 2 (...). This choice makes reasoning about majority easier and reflects that the origin node also contributes to the durability and visibility of the transaction.
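To make the off-by-one concrete, here is a hedged sketch of migrating a legacy ANY 1 setting. It assumes the bdr.add_commit_scope function from PGD 5, a placeholder scope name any_two_scope, and that example_group contains the origin together with the nodes previously listed as standbys.

```sql
-- Legacy setting, shown for comparison only: wait for one standby.
--   synchronous_standby_names = 'ANY 1 (...)'

-- Commit scope counterpart: two nodes in total, the origin plus one partner.
SELECT bdr.add_commit_scope(
    commit_scope_name := 'any_two_scope',   -- placeholder scope name
    origin_node_group := 'example_group',
    rule := 'ANY 2 (example_group) GROUP COMMIT',
    wait_for_ready := true
);
```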