Backing up analytical data

Scouter Analytics uses DuckLake to store events. Internally, DuckLake manages a directory of Parquet files, which can be backed up with any filesystem-level backup tool, and tracks those files in a catalog backed by a traditional relational database.

Catalog backups

DuckLake supports several databases as the lakehouse catalog; Scouter Analytics is configured to use SQLite, so any SQLite backup tool will work.

Consult the operations backup manual for instructions on configuring Litestream.
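As a sketch only (the replica URL is a placeholder; the operations backup manual remains authoritative), a minimal Litestream configuration that continuously replicates the catalog to S3 looks like:

```yaml
# /etc/litestream.yml
dbs:
  - path: /var/lib/scouter/analytics/lakehouse/catalog.db
    replicas:
      - url: s3://my-backup-bucket/lakehouse/catalog
```

Litestream replicates SQLite WAL segments as they are written, so a restore can recover to within seconds of the last committed write.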

The default location for the catalog is /var/lib/scouter/analytics/lakehouse/catalog.db.
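If continuous replication is not required, a one-off snapshot can be taken with SQLite's online .backup command, which produces a consistent copy even while the database is being written. The sketch below demonstrates it against a scratch database; in production, point it at the catalog path above.

```shell
# Scratch database standing in for the catalog.
db=$(mktemp /tmp/catalog.XXXXXX)
sqlite3 "$db" "CREATE TABLE events(id INTEGER); INSERT INTO events VALUES (1);"

# Take a consistent point-in-time snapshot with the online-backup command.
dest="${db}.backup"
sqlite3 "$db" ".backup '$dest'"

# The snapshot is a complete, standalone database file.
sqlite3 "$dest" "SELECT count(*) FROM events;"  # prints: 1
```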

Data backups

Events are stored as industry-standard Parquet files and can be backed up with any filesystem-level backup tool. One such option is rclone, invoked here from a systemd service:

# /etc/systemd/system/scouter-analytics-backups-lakehouse-data.service
[Unit]
Description=Scouter Analytics Instance Lakehouse Data Backup
[Service]
Restart=on-failure
DynamicUser=yes
ProtectSystem=full
NoNewPrivileges=true
PrivateDevices=true
StateDirectory=scouter/analytics
Type=oneshot
Environment=AWS_ACCESS_KEY_ID=...
Environment=AWS_SECRET_ACCESS_KEY=...
Environment=BUCKET=...
ExecStart=/usr/bin/rclone \
copy \
--s3-env-auth=true \
--s3-endpoint=... \
--s3-provider=aws \
${STATE_DIRECTORY}/lakehouse/data/ \
:s3:${BUCKET}/lakehouse/data/

To trigger the service on a schedule, use a systemd timer unit:

# /etc/systemd/system/scouter-analytics-backups-lakehouse-data.timer
[Unit]
Description=Scouter Analytics Lakehouse Data Backup Timer
[Timer]
OnActiveSec=0s
OnUnitActiveSec=24h
[Install]
WantedBy=timers.target
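With both unit files in place (unit names assumed to match the file paths above), reload systemd and start the timer:

```shell
systemctl daemon-reload
systemctl start scouter-analytics-backups-lakehouse-data.timer

# Inspect the schedule and the last/next activation times.
systemctl list-timers scouter-analytics-backups-lakehouse-data.timer
```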