Today's topic: Backups
I'll start with a friendly reminder:
DO YOUR FRICKIN BACKUPS!
No, I didn't lose any data. But I realized that my new databases don't have any backup strategy at the moment and that made me very nervous.
As it should make you nervous too if you are lacking backups.
So, I'll share what I did and how.
Prepare a Backup Target
I use a Hetzner storage box. It has enough space (5 TB) for all of my side project backups, as well as my personal backups from my local NAS.
I connect to it via SSH; more specifically, I mount it as a filesystem via sshfs.
Costs: € 11 per month.
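One thing worth guarding against with a network mount like this: if the sshfs mount has dropped, a backup job may quietly write to the local disk instead. Below is a minimal sketch of a pre-flight check. The function name ensure_mounted and the path /mnt/storagebox are hypothetical, not taken from my actual setup.

```python
import os


def ensure_mounted(path: str) -> None:
    """Raise if `path` is not currently a mount point.

    Meant to run before a backup so that a dropped sshfs mount
    fails loudly instead of filling the local disk.
    """
    if not os.path.ismount(path):
        raise RuntimeError(f"{path} is not mounted, refusing to run backup")


# Example call (hypothetical mount point, adjust to your setup):
# ensure_mounted("/mnt/storagebox")
```

Calling this at the top of a backup script makes the cron job exit with an error, which then shows up in the log file.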
Main database: CrateDB
CrateDB has a very well integrated backup solution. It has its own snapshots API which you just trigger via:
CREATE SNAPSHOT "goodwatch-backup"."{snapshot_name}" ALL WITH (wait_for_completion=true)
I mounted the snapshots folder to my Hetzner storage via docker compose.
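As a rough illustration, the statement above can be sent to CrateDB's HTTP endpoint (/_sql) from Python. This is a sketch, not my actual backup.py: the URL, the timestamp-based snapshot naming, and all function names are assumptions, and it presumes the "goodwatch-backup" repository was already created with CREATE REPOSITORY.

```python
import json
import urllib.request
from datetime import datetime, timezone

CRATE_URL = "http://localhost:4200/_sql"  # assumption: default HTTP port, no auth
REPOSITORY = "goodwatch-backup"           # repository name from the statement above


def snapshot_name(now: datetime) -> str:
    """Build a sortable snapshot name such as 'snapshot_20250101_0005'."""
    return now.strftime("snapshot_%Y%m%d_%H%M")


def build_statement(name: str) -> str:
    """Return the CREATE SNAPSHOT statement for the given snapshot name."""
    return (
        f'CREATE SNAPSHOT "{REPOSITORY}"."{name}" ALL '
        "WITH (wait_for_completion=true)"
    )


def run_sql(stmt: str) -> dict:
    """POST a statement to CrateDB's HTTP endpoint and return the parsed reply."""
    request = urllib.request.Request(
        CRATE_URL,
        data=json.dumps({"stmt": stmt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises an HTTPError if CrateDB answers with an error status.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def create_snapshot() -> dict:
    """Create a snapshot named after the current UTC time."""
    name = snapshot_name(datetime.now(timezone.utc))
    return run_sql(build_statement(name))
```

Because wait_for_completion=true is set, the call only returns once the snapshot has been written, so a failure surfaces in the same run.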
The DB has 50 million rows and 60 GB of data, and the initial snapshot took five minutes.
5 minutes!
And the best thing: backups are INCREMENTAL. That means from now on, each new snapshot only stores what has changed.
Oh yeah, and the snapshots folder is distributed. That means every single one of my crate nodes mounts the same remote folder on the storage box, and crate handles the complexity of syncing snapshots across all shards.
I love this project so much. Can't imagine PostgreSQL ever coming even close to this DX.
Vector database: Qdrant
For Qdrant, backups are easy to create as well. Just fire up your favorite language and make an API request:
requests.post(f"{QDRANT_HOST}/collections/{collection_name}/snapshots?wait=true", headers=HEADERS)
Again, the snapshot folder is mounted to the storage box via docker compose.
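If you have more than one collection, the same request can be looped over all of them. Here's a sketch using only the standard library (so it runs without installing requests). The host, the API key header value, and the function names are assumptions; the endpoints are Qdrant's collection listing and snapshot creation routes.

```python
import json
import urllib.request
from urllib.parse import quote

QDRANT_HOST = "http://localhost:6333"  # assumption: default Qdrant HTTP port
HEADERS = {"api-key": "changeme"}      # assumption: API-key auth; use {} if disabled


def snapshot_url(host: str, collection_name: str) -> str:
    """URL that creates a snapshot and waits for it to finish."""
    return f"{host}/collections/{quote(collection_name, safe='')}/snapshots?wait=true"


def call(method: str, url: str) -> dict:
    """Send a request to Qdrant and return the decoded JSON body."""
    request = urllib.request.Request(url, method=method, headers=HEADERS)
    # urlopen raises an HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def snapshot_all_collections() -> list[str]:
    """Create one snapshot per collection and return the snapshot names."""
    collections = call("GET", f"{QDRANT_HOST}/collections")["result"]["collections"]
    names = []
    for collection in collections:
        reply = call("POST", snapshot_url(QDRANT_HOST, collection["name"]))
        names.append(reply["result"]["name"])
    return names
```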
10 GB of vectors, takes 1 minute.
These are not incremental, therefore we need a good ...
Backup Retention Strategy
Make sure that you have enough backups, without overflowing your storage.
My strategy is to keep the last 3 hourly, 3 daily, and 3 weekly backups.
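One possible way to turn that rule into code is sketched below. It is one reading of the rule, not necessarily what my script does exactly: keep the newest snapshot in each of the 3 most recent hours, days, and ISO weeks that contain a snapshot. The function names are hypothetical, and the input is assumed to be the snapshot creation times.

```python
from datetime import datetime


def select_keep(timestamps: list[datetime], per_tier: int = 3) -> set[datetime]:
    """Return the timestamps to keep: the newest snapshot in each of the
    `per_tier` most recent hours, days, and ISO weeks that contain one."""
    bucket_keys = (
        lambda t: (t.year, t.month, t.day, t.hour),  # hourly bucket
        lambda t: (t.year, t.month, t.day),          # daily bucket
        lambda t: t.isocalendar()[:2],               # weekly bucket (ISO year, week)
    )
    keep: set[datetime] = set()
    newest_first = sorted(timestamps, reverse=True)
    for key in bucket_keys:
        seen = []
        for ts in newest_first:
            bucket = key(ts)
            if bucket not in seen:
                seen.append(bucket)
                keep.add(ts)  # first hit is the newest snapshot in this bucket
            if len(seen) == per_tier:
                break
    return keep


def select_delete(timestamps: list[datetime], per_tier: int = 3) -> set[datetime]:
    """Return every timestamp that is not covered by the retention rule."""
    return set(timestamps) - select_keep(timestamps, per_tier)
```

The tiers overlap (the newest snapshot counts as hourly, daily, and weekly at once), so with hourly backups this keeps 7 snapshots rather than 9. The result of select_delete can then be fed into DROP SNAPSHOT for CrateDB or the snapshot DELETE endpoint for Qdrant.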
Cronjobs
I run the backups hourly. For CrateDB that's a breeze and I could even do it more often because it does not cause any downtime. For Qdrant it's maybe even too frequent, but it doesn't hurt.
I set up the cron jobs via Ansible:
- name: Configure Backup Cron Job for CrateDB
  hosts: db1
  become: yes
  tasks:
    - name: "Cron: CrateDB Backup (Hourly at :05)"
      cron:
        name: "CrateDB Backup"
        minute: "5"
        hour: "*"
        job: "/root/.local/bin/uv run /root/goodwatch/goodwatch-monorepo/goodwatch-crate/backup.py >> /var/log/backup_crate.log 2>&1"
Oh, and one more thing...
DO YOUR BACKUPS. NOW.
If that inspired you, leave an emoji worthy of a backup and subscribe