Lag control v5

Data throughput of database applications on a BDR origin node can exceed the rate at which committed data can be safely replicated to downstream peer nodes. If this disparity persists beyond a period of time or chronically in high availability applications, then organizational objectives related to disaster recovery or business continuity plans might not be satisfied.

The replication lag control (RLC) feature is designed to regulate this imbalance using a dynamic rate-limiting device so that data flow between BDR group nodes complies with these organizational objectives. It does so by controlling the extent of replication lag between BDR nodes.

Some of these objectives include the following:

  • Recovery point objective (RPO) specifies the maximum tolerated amount of data that can be lost due to unplanned events, usually expressed as an amount of time. In non-replicated systems, RPO is used to set backup intervals to limit the risk of lost committed data due to failure. For replicated systems, RPO determines the acceptable amount of committed data that hasn't been safely applied to one or more peer nodes.

  • Resource constraint objective (RCO) acknowledges that there are finite storage constraints. This storage includes database files, WAL, and temporary or intermediate files needed for continued operation. For replicated systems, as lag increases the demands on these storage resources also increase.

  • Group elasticity objective (GEO) ensures that any node isn't originating new data at a clip that can't be acceptably saved to its peer nodes. When that is the case then the detection of that condition can be used as one metric in the decision to expand the number of database nodes. Similarly, when that condition abates then it might influence the decision to contract the number of database nodes.

Lag control manages replication lag by controlling the rate at which client connections can commit READ WRITE transactions. Replication lag is measured either as lag time or lag size, depending on the objectives to meet. Transaction commit rate is regulated using a configured BDR commit-delay time.

Requirements

To get started using lag control:

  • Determine the maximum acceptable commit delay time max_commit_delay that can be tolerated for all database applications.

  • Decide on the lag measure to use. Choose either lag size max_lag_size or lag time max_lag_time.

  • Decide on the groups or subgroups involved and the minimum number of nodes in each collection required to satisfy confirmation. This will form the basis for the definition of a commit scope rule.

Configuration

Lag control is specified within a commit scope, which allows consistent and coordinated parameter settings across the nodes spanned by the commmit scope rule. A Lag control specification can be included in the default commit scope of a top group or part of an Origin group commit scope.

Using the sample node groups from the Commit Scope chapter, this example shows lag control rules for two datacenters.

-- create a Lag Control commit scope with individual rules
-- for each sub-group
SELECT bdr.add_commit_scope(
    commit_scope_name := 'example_scope',
    origin_node_group := 'left_dc',
    rule := 'ALL (left_dc) LAG CONTROL (max_commit_delay=500ms, max_lag_time=30s) AND ANY 1 (right_dc) LAG CONTROL (max_commit_delay=500ms, max_lag_time=30s)',
    wait_for_ready := true
);
SELECT bdr.add_commit_scope(
    commit_scope_name := 'example_scope',
    origin_node_group := 'right_dc',
    rule := 'ANY 1 (left_dc) LAG CONTROL (max_commit_delay=0.250ms, max_lag_size=100MB) AND ALL (right_dc) LAG CONTROL (max_commit_delay=0.250ms, max_lag_size=100MB)',
    wait_for_ready := true
);

Note the parameter values admit unit specification that is compatible with GUC parameter conventions.

A Lag control commit scope rule can be added to existings commit scope rules that also include Group Commit and CAMO rule specifications.

max_commit_delay parameter permits and encourages a specification of milliseconds with a fractional part, including a sub-millisecond setting if appropriate.

Overview

Lag control is a dynamic TPS rate-limiting mechanism that operates at the client connection level. It's designed to be as unobtrusive as possible while satisfying configured lag-control constraints. This means that if enough BDR nodes can replicate changes fast enough to remain below configured lag measure thresholds, then the BDR runtime commit delay stays fixed at 0 milliseconds.

If this isn't the case, minimally adjust the BDR runtime commit delay as high as needed, but no higher, until the number of conforming nodes returns to the minimum threshold.

Even after the minimum node threshold is reached, lag control continues to attempt to drive the BDR runtime commit delay back to zero. The BDR commit delay might rise and fall around an equilibrium level most of the time, but if data throughput or lag-apply rates improve then the commit delay decreases over time.

The BDR commit delay is a post-commit delay. It occurs after the transaction has committed and after all Postgres resources locked or acquired by the transaction are released. Therefore, the delay doesn't prevent concurrent active transactions from observing or modifying its values or acquiring its resources. The same guarantee can't be made for external resources managed by Postgres extensions. Regardless of extension dependencies, the same guarantee can be made if the BDR extension is listed before extension-based resource managers in postgresql.conf.

Strictly speaking, the BDR commit delay is not a per-transaction delay. It is the mean value of commit delays over a stream of transactions for a particular client connection. This technique allows the commit delay and fine-grained adjustments of the value to escape the coarse granularity of OS schedulers, clock interrupts, and variation due to system load. It also allows the BDR runtime commit delay to settle within microseconds of the lowest duration possible to maintain a lag measure threshold.

Note

Don't conflate the BDR commit delay with the Postgres commit delay. They are unrelated and perform different functions. Don't substitute one for the other.

Transaction application

The BDR commit delay is applied to all READ WRITE transactions that modify data for user applications. This implies that any transaction that doesn't modify data, including declared READ WRITE transactions, is exempt from the commit delay.

Asynchronous transaction commit also executes a BDR commit delay. This might appear counterintuitive, but asynchronous commit, by virtue of its performance, can be one of the greatest sources of replication lag.

Postgres and BDR auxillary processes don't delay at transaction commit. Most notably, BDR writers don't execute a commit delay when applying remote transactions on the local node. This is by design as BDR writers contribute nothing to outgoing replication lag and can reduce incoming replication lag the most by not having their transaction commits throttled by a delay.

Limitations

The maximum commit delay is a ceiling value representing a hard limit. This means that a commit delay never exceeds the configured value. Conversely, the maximum lag measures both by size and time and are soft limits that can be exceeded. When the maximum commit delay is reached, there's no additional back pressure on the lag measures to prevent their continued increase.

There's no way to exempt origin transactions that don't modify BDR replication sets from the commit delay. For these transactions, it can be useful to SET LOCAL the maximum transaction delay to 0.

Caveats

Application TPS is one of many factors that can affect replication lag. Other factors include the average size of transactions for which BDR commit delay can be less effective. In particular, bulk load operations can cause replication lag to rise, which can trigger a concomitant rise in the BDR runtime commit delay beyond the level reasonably expected by normal applications, although still under the maximum allowed delay.

Similarly, an application with a very high OLTP requirement and modest data changes can be unduly restrained by the acceptable BDR commit delay setting.

In these cases, it can be useful to use the SET [SESSION|LOCAL] command to custom configure lag control settings for those applications or modify those applications. For example, bulk load operations are sometimes split into multiple, smaller transactions to limit transaction snapshot duration and WAL retention size or establish a restart point if the bulk load fails. In deference to lag control, those transaction commits can also schedule very long BDR commit delays to allow digestion of the lag contributed by the prior partial bulk load.

Meeting organizational objectives

In the example objectives list earlier:

  • RPO can be met by setting an appropriate maximum lag time.
  • RCO can be met by setting an appropriate maximum lag size.
  • GEO can be met by monitoring the BDR runtime commit delay and the BDR runtime lag measures,

As mentioned, when the maximum BDR runtime commit delay is pegged at the BDR configured commit-delay limit and the lag measures consistently exceed their BDR-configured maximum levels, this scenario can be a marker for BDR group expansion.

export const _frontmatter = {"title":"Lag control","redirects":["/pgd/latest/bdr/lag-control/"]}