Backing up analytical data
Scouter Analytics uses DuckLake to store events. Internally, DuckLake manages a directory of Parquet files, which can be backed up with any filesystem-level backup tool, and tracks those files in a catalog held in a traditional relational database.
Catalog backups
DuckLake supports several databases as the lakehouse catalog; Scouter Analytics is configured to use SQLite, so any SQLite backup tool will work.
Consult the operations backup manual for configuring Litestream.
The default location for the catalog is
/var/lib/scouter/analytics/lakehouse/catalog.db
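The operations manual covers Litestream configuration in detail. As a sketch only, continuous replication of the catalog could be run as a systemd service alongside the data backup unit below; the unit name, bucket, and replica path here are illustrative assumptions, not the documented setup:

```
# /etc/systemd/system/scouter-analytics-backups-lakehouse-catalog.service
[Unit]
Description=Scouter Analytics Lakehouse Catalog Backup
[Service]
Restart=always
DynamicUser=yes
StateDirectory=scouter/analytics
Environment=AWS_ACCESS_KEY_ID=...
Environment=AWS_SECRET_ACCESS_KEY=...
# litestream replicate accepts a database path and a replica URL directly,
# as an alternative to a configuration file.
ExecStart=/usr/bin/litestream replicate \
    ${STATE_DIRECTORY}/lakehouse/catalog.db \
    s3://example-bucket/lakehouse/catalog
```

Because Litestream replicates continuously, this service runs long-lived (Restart=always) rather than as a oneshot on a timer.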
Data backups
Events are stored as industry-standard Parquet files and can be backed up with any filesystem-level backup tool, such as rclone.
# /etc/systemd/system/scouter-analytics-backups-lakehouse-data.service
[Unit]
Description=Scouter Analytics Instance Lakehouse Data Backup
[Service]
Restart=on-failure
DynamicUser=yes
ProtectSystem=full
NoNewPrivileges=true
PrivateDevices=true
StateDirectory=scouter/analytics
Type=oneshot
Environment=AWS_ACCESS_KEY_ID=...
Environment=AWS_SECRET_ACCESS_KEY=...
Environment=BUCKET=...
ExecStart=/usr/bin/rclone \
copy \
--s3-env-auth=true \
--s3-endpoint=... \
--s3-provider=aws \
${STATE_DIRECTORY}/lakehouse/data/ \
:s3:${BUCKET}/lakehouse/data/
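rclone copies whatever is on disk when the service runs; after restoring from the bucket, it can be useful to confirm the restored tree matches the original. A minimal sketch (a hypothetical helper, not part of Scouter Analytics) that compares two directory trees by relative path and file size:

```python
from pathlib import Path


def tree_manifest(root: Path) -> dict[str, int]:
    """Map each regular file, keyed by its path relative to root, to its size."""
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def missing_or_mismatched(source: Path, backup: Path) -> list[str]:
    """Relative paths present in source but absent (or size-mismatched) in backup."""
    src, dst = tree_manifest(source), tree_manifest(backup)
    return sorted(rel for rel, size in src.items() if dst.get(rel) != size)
```

rclone itself also ships a `check` subcommand that compares a local tree against the remote, which avoids restoring first.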
To run the service on a schedule, use a systemd.timer unit:
# /etc/systemd/system/scouter-analytics-backups-lakehouse-data.timer
[Unit]
Description=Scouter Analytics Lakehouse Data Backup Timer
[Timer]
OnActiveSec=0s
OnUnitActiveSec=24h
[Install]
WantedBy=timers.target