Differential Backups using Git Bundles

There are a lot of self-hosted file and data backup solutions out there, most of which are clunky to set up and configure correctly. Many simply tar your whole directory and let you download a huge archive. Others will store snapshots “in the cloud”.

I like minimal, self-contained solutions. One excellent tool is, of course, rsync, which offers incremental file transfers and saves space by storing only the files that changed since the last checkpoint. This type of backup is usually referred to as an incremental backup. For your media collection or user file uploads this is great. But even more space can be saved when most of the changes happen inside the files. This is where differential backups come in: instead of whole changed files, only the differences are stored. rsync doesn’t do differential backups. Moreover, it offers no straightforward access to history, diffs, etc.

Differential Backups using Git

“history”? “diffs”? Sounds like version control…

Exactly! There are a couple of git-based backup utilities out there, gup for example. The problem with these is that you still need an incremental file transfer to send or fetch the changes. Git does this by default, but how do you git push to something like Google Drive? You could perhaps use rsync and a “cloud”-synced local directory (grive for Google Drive, for example). The idea is to incrementally back up a bare (and compressed) git repository from one state to the next.

The problem? It isn’t self-contained: too many files (objects, refs). What if we could archive only the changed objects and refs into one neat package and send it over to wherever we want?

It’s our lucky day – meet git-bundle, a built-in utility to move objects and refs around. git bundle produces a single compressed bundle file that contains the necessary data to rebuild part of the backup history.

The basic flow would be the following:

  1. create an empty repository
  2. fill it up with backup data, dumps, etc.
  3. run git add -A and git commit -m "backup"
  4. then git bundle create backup.bundle $LAST..master (the very first bundle has no $LAST yet, so it is just master)
  5. save the bundle in backup storage
  6. go to 2

Simple, right? You just need to remember the $LAST ref that was bundled.
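
The loop above can be sketched as a single shell function. This is only a rough illustration, not bundleup's actual code: the function name backup_cycle, the last-bundled tag used to remember $LAST, the file naming scheme and the throwaway commit identity are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# One backup cycle: commit whatever changed, then bundle only the new commits.
# backup_cycle, the "last-bundled" tag and the file naming are illustrative
# assumptions, not taken from bundleup.
set -eu

backup_cycle() {
    local repo="$1" out branch seq bundle range
    out=$(cd "$2" && pwd)   # absolute path, because git runs inside $repo
    branch=$(git -C "$repo" symbolic-ref --short HEAD)

    git -C "$repo" add -A
    # Nothing staged means nothing changed since the last run.
    if [ -z "$(git -C "$repo" status --porcelain)" ]; then
        return 0
    fi
    # Assumed commit identity, so the sketch works without global git config.
    git -C "$repo" -c user.name=backup -c user.email=backup@localhost \
        commit -q -m "backup"

    # Zero-padded commit count first, so a plain sort gives creation order;
    # the date is only there for humans.
    seq=$(printf '%06d' "$(git -C "$repo" rev-list --count "$branch")")
    bundle="$out/backup-$seq-$(date -u +%Y%m%dT%H%M%SZ).bundle"

    # The tag plays the role of $LAST; the first bundle carries everything.
    if git -C "$repo" rev-parse -q --verify refs/tags/last-bundled >/dev/null; then
        range="last-bundled..$branch"
    else
        range="$branch"
    fi
    git -C "$repo" bundle create "$bundle" "$range" >&2
    git -C "$repo" tag -f last-bundled "$branch" >/dev/null
    echo "$bundle"
}
```

Calling backup_cycle /path/to/backup-repo /my/backups after each dump would print the path of the new bundle, or nothing when there was nothing to commit. Keeping $LAST as a tag inside the repository means there is no separate state file to lose.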

We eventually end up with a lot of bundles (hopefully dated and sortable). Putting them back together is quite simple as well. Create a new repository with git init, then, from inside it, run find /my/backups/ -name '*.bundle' | sort | xargs -I'{}' git pull '{}' master. This pulls in each bundle, one git pull per file. NOTE: the file names have to sort by date (or some other counter), because the bundles must be pulled in order of creation.
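
The same restore can be written as a small function, which makes it easier to stop at the first bundle that does not fit. Again, this is a sketch under assumptions: restore_bundles and its arguments are made-up names, all bundles are assumed to sit directly in one directory, and the branch inside the bundles is assumed to be master unless a third argument says otherwise.

```shell
#!/usr/bin/env bash
# Rebuild a repository from a directory of bundles, oldest first.
# restore_bundles and its arguments are illustrative assumptions.
set -eu

restore_bundles() {
    local dir target="$2" branch="${3:-master}" bundle
    dir=$(cd "$1" && pwd)   # absolute path, because git runs inside $target
    git init -q "$target"
    # The shell expands the glob in sorted order, which is why the file
    # names have to sort in order of creation.
    for bundle in "$dir"/*.bundle; do
        [ -e "$bundle" ] || continue   # directory contains no bundles
        # --ff-only: each bundle should simply extend the history. A bundle
        # whose prerequisite commits are missing makes the pull fail.
        git -C "$target" pull -q --ff-only "$bundle" "$branch" || return 1
    done
}
```

For example, restore_bundles /my/backups /tmp/restored would leave a working tree with the latest backup state in /tmp/restored (a hypothetical target path), with the full history available to git log.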

Now these bundles can be backed up quite easily. Of course, if you lose a bundle, everything in history after the lost bundle can no longer be restored. Fault tolerance can be added by recreating the repository from scratch every so often to start a new history. And since this is git, you have all the tools git ships with to move through history, analyze diffs and changesets, and even bisect.
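
A cheap sanity check before trusting a set of bundles is git bundle verify, which checks that a bundle file is well-formed and that the commits it depends on exist in the repository it is checked against. The wrapper below is a minimal sketch; verify_bundles and its output format are invented for illustration, and it assumes all bundles sit directly in one directory.

```shell
#!/usr/bin/env bash
# Check every bundle in a directory against a repository.
# verify_bundles and its "ok"/"BROKEN" output are illustrative assumptions.
set -eu

verify_bundles() {
    local repo="$1" dir bundle failed=0
    dir=$(cd "$2" && pwd)   # absolute path, because git runs inside $repo
    for bundle in "$dir"/*.bundle; do
        [ -e "$bundle" ] || continue   # directory contains no bundles
        if git -C "$repo" bundle verify "$bundle" >/dev/null 2>&1; then
            echo "ok      $bundle"
        else
            echo "BROKEN  $bundle"
            failed=1
        fi
    done
    # Non-zero exit status if at least one bundle failed the check.
    return "$failed"
}
```

Run against the backup repository itself, this catches truncated or corrupted files. It does not prove that the chain is complete, since the repository already has all the history; only a trial restore into a fresh repository shows that.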

So, where’s the code? Right here: https://github.com/soulseekah/bundleup.

I’ve written a simple utility in Bash. It’s fairly straightforward to deploy and use. Configuring backup logic is done using shell commands. Check out the README and try it out. Pull requests and improvements are welcome.

Again, differential backups are meant for code, database dumps, configs and other non-binary data. Backing up your media library is best done using other solutions, not git bundles.

How do you automate your backups?