OpenTelemetry Integration v5

EDB Postgres Distributed can be configured to report monitoring information (metrics) and traces to an OpenTelemetry collector.

The EDB Postgres Distributed OTEL collector fills in several resource attributes. These are attached to all metrics and traces:

  • The service.name is configurable via the bdr.otel_service_name configuration setting.
  • The service.namespace is always set to edb_postgres_distributed.
  • The service.instance.id is always set to the system identifier of the Postgres instance.
  • The service.version is set to the current version of the BDR extension loaded in the Postgres instance.

Metrics collection

Metrics collection is enabled automatically when the configuration option bdr.metrics_otel_http_url is set to a non-empty URL.
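
As a minimal sketch of the setting described above, a postgresql.conf fragment might look like the following. The collector host name and the service name are hypothetical placeholders. Port 4318 and the /v1/metrics path are the OTLP/HTTP defaults from the OpenTelemetry specification; whether the URL must include that path for your collector is an assumption to verify against your deployment.

```ini
# postgresql.conf fragment (illustrative values only)

# Optional: value reported as the service.name resource attribute.
# 'pgd-node-1' is a hypothetical name.
bdr.otel_service_name = 'pgd-node-1'

# Setting a non-empty URL enables metrics collection.
# Host, port, and path are placeholders for your collector endpoint.
bdr.metrics_otel_http_url = 'http://otel-collector.example.com:4318/v1/metrics'
```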

Several kinds of metrics are collected, as listed below.

Generic metrics

| Metric name | Type | Labels | Description |
|---|---|---|---|
| pg_backends_by_state | gauge | conn_state - idle, active, idle in transaction, fastpath functioncall, idle in transaction (aborted), disabled, undefined | Number of backends in a given state |
| pg_oldest_xact_start | gauge | | Oldest transaction start time |
| pg_oldest_activity_start | gauge | | Oldest query start time |
| pg_waiting_backends | gauge | wait_type - LWLock, Lock, BufferPin, Activity, Client, Extension, IPC, Timeout, IO, ??? (for unknown) | Number of currently waiting backends by wait type |
| pg_start_time | gauge | | Timestamp at which the server has started |
| pg_reload_time | gauge | | Timestamp at which the server has last reloaded configuration |

Replication metrics

| Metric name | Type | Labels | Description |
|---|---|---|---|
| bdr_slot_sent_lag | gauge | slot_name - name of a slot | Current sent lag in bytes for each replication slot |
| bdr_slot_write_lag | gauge | slot_name - name of a slot | Current write lag in bytes for each replication slot |
| bdr_slot_flush_lag | gauge | slot_name - name of a slot | Current flush lag in bytes for each replication slot |
| bdr_slot_apply_lag | gauge | slot_name - name of a slot | Current apply lag in bytes for each replication slot |
| bdr_subscription_receive_lsn | gauge | sub_name - name of subscription | Current received LSN for each subscription |
| bdr_subscription_flush_lsn | gauge | sub_name - name of subscription | Current flushed LSN for each subscription |
| bdr_subscription_apply_lsn | gauge | sub_name - name of subscription | Current applied LSN for each subscription |
| bdr_subscription_receiver | gauge | sub_name - name of subscription | Whether subscription receiver is currently running (1) or not (0) |

Consensus metrics

| Metric name | Type | Labels | Description |
|---|---|---|---|
| bdr_raft_state | gauge | state_str - RAFT_FOLLOWER, RAFT_CANDIDATE, RAFT_LEADER, RAFT_STOPPED | Raft state of the consensus on this node |
| bdr_raft_protocol_version | gauge | | Consensus protocol version used by this node |
| bdr_raft_leader_node | gauge | | ID of the node that this node considers to be the current leader |
| bdr_raft_nodes | gauge | | Total number of nodes that participate in consensus (includes learner/non-voting nodes) |
| bdr_raft_voting_nodes | gauge | | Number of actual voting nodes in consensus |
| bdr_raft_term | gauge | | Current Raft term this node is on |
| bdr_raft_commit_index | gauge | | Raft commit index committed by this node |
| bdr_raft_apply_index | gauge | | Raft commit index applied by this node |

Tracing

Tracing collection to OpenTelemetry requires bdr.trace_otel_http_url to be configured and tracing itself to be enabled using bdr.trace_enable.
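
The two tracing settings named above could be combined as in the following sketch. The collector host is a hypothetical placeholder, the /v1/traces path is the OTLP/HTTP default from the OpenTelemetry specification, and treating bdr.trace_enable as an on/off boolean is an assumption to check against the settings reference for your version.

```ini
# postgresql.conf fragment (illustrative values only)

# Endpoint that receives trace spans; host, port, and path are placeholders.
bdr.trace_otel_http_url = 'http://otel-collector.example.com:4318/v1/traces'

# Tracing must also be switched on explicitly.
# Assumed here to be a boolean setting.
bdr.trace_enable = on
```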

Tracing is currently limited to a few subsystems, primarily cluster management functionality. The following spans can appear in traces:

| Span name | Description |
|---|---|
| create_node_group | Group creation |
| alter_node_group_config | Change of group config option(s) |
| alter_node_config | Change of node config option |
| join_node_group | Node joining a group |
| join_send_remote_request | Join source sending the join request on behalf of the joining node |
| add_camo_pair | Add CAMO pair |
| alter_camo_pair | Change CAMO pair |
| remove_camo_pair | Delete CAMO pair |
| alter_commit_scope | Change commit scope definition (either create new or update existing) |
| alter_proxy_config | Change config for PGD-Proxy instance (either create new or update existing) |
| walmsg_global_lock_send | Send global locking WAL message |
| walmsg_global_lock_recv | Received global locking WAL message |
| ddl_epoch_apply | Global locking epoch apply (ensure cluster is synchronized enough for new epoch to start) |
| walmsg_catchup | Catchup during node removal WAL message |
| raft_send_appendentries | Internal Raft bookkeeping message |
| raft_recv_appendentries | Internal Raft bookkeeping message |
| raft_request | Raft request execution |
| raft_query | Raft query execution |
| msgb_send | Consensus messaging layer message |
| msgb_recv_receive | Consensus messaging layer message |
| msgb_recv_deliver | Consensus messaging layer message delivery |
| preprocess_ddl | DDL command preprocessing |

TLS support

The metrics and tracing endpoints can be either HTTP or HTTPS. It's possible to configure paths to the CA bundle, client key, and client certificate using bdr.otel_https_ca_path, bdr.otel_https_key_path, and bdr.otel_https_cert_path configuration options.
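
A sketch of an HTTPS setup using the options above might look like this. The collector host and all file paths are hypothetical placeholders. That the client key and certificate are only needed when the collector requires client authentication is an assumption based on common TLS practice, not something this page states.

```ini
# postgresql.conf fragment (illustrative values only)

# Use https:// URLs for the endpoints; host, port, and paths are placeholders.
bdr.metrics_otel_http_url = 'https://otel-collector.example.com:4318/v1/metrics'
bdr.trace_otel_http_url = 'https://otel-collector.example.com:4318/v1/traces'

# CA bundle used to verify the collector's certificate (hypothetical path).
bdr.otel_https_ca_path = '/etc/pgd/otel/ca.pem'

# Client key and certificate (hypothetical paths), assumed to be needed
# only if the collector requires client authentication.
bdr.otel_https_key_path = '/etc/pgd/otel/client.key'
bdr.otel_https_cert_path = '/etc/pgd/otel/client.crt'
```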