# Property Background Repair Observability
Based on the Property Background Repair Strategy documentation, this article explains how to visualize and monitor each synchronization cycle to enhance observability and debugging.
This feature is enabled by default. You can control whether the data is recorded through the `--property-repair-obs-enabled` option.
## Tracing
Tracing is used to record the operation flow at each node during gossip-based Property repair. This allows for fast and accurate diagnosis in the event of issues or inconsistencies during the repair process.
All trace data is written to the `_property_repair_spans` stream group and can be queried for inspection and analysis.
| Tag Name | Type | Description |
|---|---|---|
| `tracing_id` | string | Unique trace ID for the entire repair task, typically `sender_node_id + start_time`. |
| `span_id` | int | Unique span ID within the trace. |
| `parent_span_id` | int | ID of the parent span (0 if root span). |
| `sender_node_id` | string | ID of the sender node where the repair was executed. |
| `current_node_id` | string | ID of the node where the span was executed. |
| `start_time` | int64 | Unix timestamp of span start (in milliseconds). |
| `end_time` | int64 | Unix timestamp of span end (in milliseconds). |
| `duration` | int64 | Total duration of the span, in milliseconds. |
| `message` | string | Descriptive log of the action performed in this span. |
| `tags_json` | string | JSON-formatted key-value tags attached to the span (e.g. `group`, `shard_id`). |
| `is_error` | bool | Whether this span encountered an error. |
| `error_reason` | string | Error message or failure reason if `is_error` is true. |
| `sequence_number` | int | The round number of the gossip propagation. |
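As a quick illustration of how these fields relate, the sketch below rebuilds the span tree of a single repair task and prints each span's duration. It assumes the spans have already been queried from the `_property_repair_spans` stream and decoded into plain dictionaries keyed by the tag names above; the `spans` sample data is hypothetical.

```python
from collections import defaultdict

# Hypothetical span records, as they might look after being decoded from
# the _property_repair_spans stream; field names follow the table above.
spans = [
    {"tracing_id": "node-1+1718000000000", "span_id": 1, "parent_span_id": 0,
     "current_node_id": "node-1", "duration": 120, "message": "repair round started",
     "is_error": False},
    {"tracing_id": "node-1+1718000000000", "span_id": 2, "parent_span_id": 1,
     "current_node_id": "node-2", "duration": 45, "message": "compare tree summary",
     "is_error": False},
    {"tracing_id": "node-1+1718000000000", "span_id": 3, "parent_span_id": 1,
     "current_node_id": "node-2", "duration": 30, "message": "update property",
     "is_error": True},
]

def print_trace_tree(spans):
    """Group spans by parent_span_id and print them as an indented tree."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    def walk(parent_id, depth):
        for span in sorted(children[parent_id], key=lambda s: s["span_id"]):
            flag = " [ERROR]" if span["is_error"] else ""
            print(f"{'  ' * depth}- {span['message']} "
                  f"(node={span['current_node_id']}, {span['duration']} ms){flag}")
            walk(span["span_id"], depth + 1)

    walk(0, 0)  # root spans have parent_span_id == 0

print_trace_tree(spans)
```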
All Property-related metadata is encapsulated within the `tags_json` field of each trace span.
This allows for flexible and structured logging of contextual information. The following are key fields commonly included in `tags_json`:
| Tag Name | Description |
|---|---|
| `target_node` | The ID of the target node involved in the current operation. |
| `group_name` | The name of the Property group being synchronized. |
| `shard_id` | The shard identifier being processed. |
| `operate_type` | The type of operation being performed, such as `"send_summary"`, `"compare_leaf"`, or `"update_property"`. |
| `property_id` | The identifier of the specific Property being updated or compared. |
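Because `tags_json` is stored as a JSON string, it must be decoded before its fields can be filtered on. The sketch below shows one possible way to group spans by operation type and collect failed operations per Property group; the `spans` input is a hypothetical list of decoded span records like the one shown earlier.

```python
import json
from collections import Counter

def summarize_operations(spans):
    """Decode tags_json on each span and count spans per (group_name, operate_type)."""
    op_counts = Counter()
    failures = []
    for span in spans:
        tags = json.loads(span.get("tags_json", "{}"))
        key = (tags.get("group_name", "unknown"), tags.get("operate_type", "unknown"))
        op_counts[key] += 1
        if span.get("is_error"):
            failures.append((tags.get("group_name"), tags.get("property_id"),
                             span.get("error_reason")))
    return op_counts, failures

# Hypothetical usage with spans queried from _property_repair_spans:
# op_counts, failures = summarize_operations(spans)
# for (group, op), n in op_counts.items():
#     print(f"{group}/{op}: {n} spans")
```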
## Data TTL
By default, the system automatically retains all background repair records for three days.
This retention period can be configured via the `--property-repair-history-days` option.
## Metrics
All metrics are reported through the internal self-observation system and can be queried using standard tools as described in the Observability documentation.
In the context of Property background repair, the following key metrics are exposed:
| Metric Name | Type | Description |
|---|---|---|
| `property_repair_success_count` | Counter | Total number of Properties successfully repaired across all nodes. |
| `property_repair_failure_count` | Counter | Total number of Properties that failed to repair due to validation, write errors, or version conflicts. |
| `property_repair_gossip_abort_count` | Counter | Total number of gossip repair sessions that were forcefully aborted due to unrecoverable errors or unavailable peers. |
| `property_repair_finished_count` | Counter | Total number of times the repair process was triggered (either scheduled or event-based). |
| `property_repair_finished_latency` | Counter | Total latency of the whole Property repair process. |
| `property_repair_per_node_sync_finished` | Counter | The number of completed synchronization operations executed by each node during Property repair. |
| `property_repair_per_node_sync_latency` | Counter | The latency, in seconds, of completed synchronization operations executed by each node during Property repair. |
| `property_repair_total_propagation_count` | Counter | Total number of successful propagations across all nodes. |
| `property_repair_total_propagation_percent` | Histogram | The percentage of propagation achieved in each round, calculated as `total_propagation_count / max_propagation_count`. |