# Backup Data Using BanyanDB Backup Tool
The backup command-line tool backs up BanyanDB snapshots to remote storage. It supports both one-time and scheduled backups.
## Overview
The backup tool performs the following operations:
- Connects to a BanyanDB data node via gRPC to request the latest snapshots.
- Determines the local snapshot directories based on the catalog type:
  - Stream Catalog: uses the `stream-root-path`.
  - Measure Catalog: uses the `measure-root-path`.
  - Property Catalog: uses the `property-root-path`.
- Computes a time-based directory name (formatted as `daily` or `hourly`).
- Uploads files that are not found in the remote storage to the specified destination.
- Deletes orphaned files in the remote destination that no longer exist locally.
- Optionally schedules periodic backups using cron-style expressions.
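Conceptually, the upload/delete step is a one-way sync of the time-named snapshot directory to the remote destination. The following Go sketch illustrates that logic; `timeDir`, `syncSnapshot`, and the time layout strings are illustrative assumptions, not the tool's actual source:

```go
package main

import (
	"fmt"
	"time"
)

// timeDir computes a time-based directory name as described above.
// The layout strings are assumptions for illustration only.
func timeDir(style string, now time.Time) string {
	if style == "hourly" {
		return now.Format("2006-01-02-15")
	}
	return now.Format("2006-01-02") // daily
}

// syncSnapshot mirrors the upload/delete behavior: files missing from the
// remote side are uploaded, and remote files with no local counterpart are
// deleted. Both sets map relative paths to presence.
func syncSnapshot(local, remote map[string]bool,
	upload, del func(path string) error) error {
	for p := range local {
		if !remote[p] { // not found remotely: upload it
			if err := upload(p); err != nil {
				return fmt.Errorf("upload %s: %w", p, err)
			}
		}
	}
	for p := range remote {
		if !local[p] { // orphaned remotely: delete it
			if err := del(p); err != nil {
				return fmt.Errorf("delete %s: %w", p, err)
			}
		}
	}
	return nil
}

func main() {
	local := map[string]bool{"part-1/data": true, "part-2/data": true}
	remote := map[string]bool{"part-1/data": true, "stale/data": true}
	dir := timeDir("daily", time.Now())
	_ = syncSnapshot(local, remote,
		func(p string) error { fmt.Println("upload", dir+"/"+p); return nil },
		func(p string) error { fmt.Println("delete", dir+"/"+p); return nil })
}
```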
## Prerequisites
Before running the backup tool, ensure you have:
- A running BanyanDB data node exposing a gRPC service (by default at `127.0.0.1:17912`).
- Network connectivity to the data node.
- A valid destination URL using one of the supported schemes:
  - Local filesystem: `file:///` scheme
  - AWS S3: S3-compatible URLs (`s3://`) with appropriate credentials
  - Azure Blob Storage: Azure URLs (`azure://`) with account credentials
  - Google Cloud Storage: GCS URLs (`gs://`) with a service account
- Necessary access rights for writing to the destination.
- Sufficient permissions to access the snapshot directories for the Stream, Measure, and Property catalogs on the data node.
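Before setting up schedules, it can be worth confirming that the gRPC endpoint is reachable. A minimal probe using grpc-go, assuming the default address from above and a plain-text connection (adapt if TLS is enabled):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Dial the data node's gRPC endpoint and block until the connection
	// is ready or the timeout expires.
	conn, err := grpc.DialContext(ctx, "127.0.0.1:17912",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock())
	if err != nil {
		log.Fatalf("data node unreachable: %v", err)
	}
	defer conn.Close()
	log.Println("gRPC connection established")
}
```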
## Command-Line Usage
### One-Time Backup
To perform a one-time backup, run the backup command with the required flags. At a minimum, you must specify the destination URL.
Example Command:
```shell
./backup --dest "file:///backups"
```
### Scheduled Backup
To enable periodic backups, provide the `--schedule` flag with a schedule style (e.g., `@yearly`, `@monthly`, `@weekly`, `@daily`, `@hourly`, or `@every <duration>`) together with a matching `--time-style`. The supported schedule expressions, based on the tool's internal map, are:
- Hourly: schedules at `5 * * * *` (runs on the 5th minute of every hour).
- Daily: schedules at `5 0 * * *` (runs at 00:05 every day).
Example Command:
```shell
./backup --dest "file:///backups" --schedule @daily --time-style daily
```
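The two styles therefore correspond to a tiny lookup table. A hypothetical sketch of such a map (the names are illustrative, not the tool's actual source):

```go
package main

import "fmt"

// cronFor sketches the internal mapping described above: each schedule
// style resolves to a fixed cron expression.
var cronFor = map[string]string{
	"@hourly": "5 * * * *", // 5th minute of every hour
	"@daily":  "5 0 * * *", // 00:05 every day
}

func main() {
	fmt.Println(cronFor["@daily"]) // "5 0 * * *"
}
```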
### Cloud Storage Examples
Quickly back up to common cloud object stores using the same syntax:
```shell
# AWS S3
# Using credential file
./banyand-backup --dest "s3://my-bucket/backups" \
  --s3-credential-file "/path/to/credentials" \
  --s3-profile "my-profile"

# Using config file
./banyand-backup --dest "s3://my-bucket/backups" \
  --s3-config-file "/path/to/config" \
  --s3-storage-class "GLACIER" \
  --s3-checksum-algorithm "SHA256"

# Google Cloud Storage
./banyand-backup --dest "gs://my-bucket/backups" \
  --gcp-service-account-file "/path/to/service-account.json"

# Azure Blob Storage
# Using account key
./banyand-backup --dest "azure://mycontainer/backups" \
  --azure-account-name "myaccount" \
  --azure-account-key "mykey" \
  --azure-endpoint "https://myaccount.blob.core.windows.net"

# Using SAS token
./banyand-backup --dest "azure://mycontainer/backups" \
  --azure-account-name "myaccount" \
  --azure-sas-token "mysastoken"
```
When a schedule is provided, the tool:
- Registers a cron job using an internal scheduler.
- Runs the backup action periodically according to the schedule expression.
- Waits for termination signals (`SIGINT` or `SIGTERM`) to shut down gracefully.
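This loop is straightforward to reproduce with a cron library. A minimal sketch using robfig/cron/v3, where `runBackup` is a stand-in for the tool's real backup action:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/robfig/cron/v3"
)

func main() {
	c := cron.New()
	// Register the periodic backup action; "5 0 * * *" is the daily
	// expression from the table above.
	if _, err := c.AddFunc("5 0 * * *", runBackup); err != nil {
		log.Fatalf("invalid schedule: %v", err)
	}
	c.Start()

	// Block until SIGINT or SIGTERM, then stop the scheduler and wait
	// for any in-flight run to finish.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	<-sig
	<-c.Stop().Done()
	log.Println("scheduler stopped")
}

func runBackup() { log.Println("running backup") }
```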
## Detailed Options
| Flag | Description | Default Value |
|---|---|---|
| `--grpc-addr` | gRPC address of the data node. | `127.0.0.1:17912` |
| `--enable-tls` | Enable TLS for the gRPC connection. | `false` |
| `--insecure` | Skip server certificate verification. | `false` |
| `--cert` | Path to the gRPC server certificate. | empty |
| `--dest` | Destination URL for backup data (`file:///`, `s3://`, `azure://`, `gs://`). | required |
| `--stream-root-path` | Root directory for the stream catalog snapshots. | `/tmp` |
| `--measure-root-path` | Root directory for the measure catalog snapshots. | `/tmp` |
| `--property-root-path` | Root directory for the property catalog snapshots. | `/tmp` |
| `--time-style` | Directory naming style based on time (`daily` or `hourly`). | `daily` |
| `--schedule` | Schedule expression for periodic backup. Options: `@yearly`, `@monthly`, `@weekly`, `@daily`, `@hourly`, `@every <duration>`. | empty |
| `--logging-level` | Root logging level (`debug`, `info`, `warn`, `error`). | `info` |
| `--logging-env` | Logging environment (`dev` or `prod`). | `prod` |
| **AWS S3 specific** | | |
| `--s3-profile` | AWS profile name. | empty |
| `--s3-config-file` | Path to the AWS config file. | empty |
| `--s3-credential-file` | Path to the AWS credential file. | empty |
| `--s3-storage-class` | AWS S3 storage class for uploaded objects. | empty |
| `--s3-checksum-algorithm` | Checksum algorithm when uploading to S3. | empty |
| **Azure Blob specific** | | |
| `--azure-account-name` | Azure storage account name. | empty |
| `--azure-account-key` | Azure storage account key. | empty |
| `--azure-sas-token` | Azure SAS token (alternative to account key). | empty |
| `--azure-endpoint` | Azure blob service endpoint. | empty |
| **Google Cloud Storage specific** | | |
| `--gcp-service-account-file` | Path to the GCP service account JSON file. | empty |
This guide covers the steps and options you need to run the backup tool effectively for your data backup operations.