The Gitlab database incident and the Backup Checker project

The Gitlab.com database incident of 2017/01/31 and the resulting data loss reminded everyone (at least for the next few days) how easy it is to lose data, even when you think all your systems are safe.

Being really interested in the process of backing up data, I read the report with interest (kudos to the Gitlab company for being so transparent about it) and I was soooo excited to find the following passage:

Regular backups seem to also only be taken once per 24 hours, though team-member-1 has not yet been able to figure out where they are stored. According to team-member-2 these don’t appear to be working, producing files only a few bytes in size.

Whoa, guys! I’m really sorry about the data loss, but I was also excited to see a big FOSS company publicly admitting and communicating about a perfect use case for the Backup Checker project, a piece of Free Software I’ve been writing for the last few years.

Data loss: nobody cares before, everybody cries after

Usually people don’t care about backups. They are serious business for web hosting companies and for the backup teams of big companies, but everywhere else nobody cares.

Everybody usually agrees that backups are important, but few people make them or set up an automated system to create them, and before the day they are needed, nobody verifies that they are usable. The reason is obvious: it’s totally boring and, in some cases, e.g. for large archives, difficult.

Because verifying backups is boring for humans, I launched the Backup Checker project to automate this task.

Backup Checker offers a wide range of features: it checks lots of different archive types (tar.{gz,bz2,xz}, zip, trees of files) and offers lots of different tests (hash sum, size {equal, smaller/greater than}, Unix permissions, …). Have a look at the official documentation for an exhaustive list of features and possible tests.

Automate the checks of your backups with Backup Checker

Checking your backups means describing in a configuration file what a backup should look like, e.g. a gzipped database dump. You usually know roughly what size the archive is going to be, and what the owner and the group owner should be.

Even easier, with Backup Checker you can generate this list of criteria from an actual archive, then remove the unneeded criteria to create a template you can reuse for different kinds of archives.

OK, 2 minutes of your time for a real-world example. I use an existing SQL database dump in a tar.gz archive to automatically create the list describing this backup:

$ backupchecker -G database-dump.tar.gz
$ cat database-dump.list
[archive]
mtime| 1486480274.2923253

[files]
database.sql| =7854803 uid|1000 gid|1000 owner|chaica group|chaica mode|644 type|f mtime|1486480253.0

Now, just remove the parameters that are too precise from this list to get a backup template. Here is a possible result:

[files]
database.sql| >6m uid|1000 gid|1000 mode|644 type|f

This defines a template for the archive: the database.sql file in the archive should have a size greater than 6 megabytes, be owned by the user with uid 1000 and the group with gid 1000, have the mode 644 and be a regular file. In order to use a template instead of the complete list, you also need to remove the sha512 from the .conf file, since a hash can only match one exact archive.
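To make these criteria concrete, here is a minimal Python sketch of the same kind of check, written with the standard tarfile module. It is not Backup Checker’s actual code: the TEMPLATE dict and the check_archive and build_archive functions are names made up for this illustration, and the demo archive is built in memory so the snippet runs on its own.

```python
import io
import tarfile

# Criteria mirroring the template line above. The dict layout and key names
# are assumptions of this sketch, not Backup Checker's internal format.
TEMPLATE = {
    "database.sql": {
        "min_size": 6 * 1024 * 1024,  # ">6m": strictly greater than 6291456 bytes
        "uid": 1000,
        "gid": 1000,
        "mode": 0o644,
        "type": "f",  # regular file
    },
}


def check_archive(archive, template):
    """Compare the members of an open tarfile.TarFile with the template.

    Returns a list of human-readable problems; an empty list means that
    every criterion was met.
    """
    problems = []
    members = {member.name: member for member in archive.getmembers()}
    for name, expected in template.items():
        member = members.get(name)
        if member is None:
            problems.append(f"{name} is missing from the archive")
            continue
        if member.size <= expected["min_size"]:
            problems.append(
                f"{name} size is {member.size}. "
                f"Should have been bigger than {expected['min_size']}."
            )
        if member.uid != expected["uid"]:
            problems.append(f"{name} uid is {member.uid}, expected {expected['uid']}")
        if member.gid != expected["gid"]:
            problems.append(f"{name} gid is {member.gid}, expected {expected['gid']}")
        mode = member.mode & 0o7777  # keep only the permission bits
        if mode != expected["mode"]:
            problems.append(f"{name} mode is {mode:o}, expected {expected['mode']:o}")
        if expected["type"] == "f" and not member.isfile():
            problems.append(f"{name} is not a regular file")
    return problems


def build_archive(name, payload, uid=1000, gid=1000, mode=0o644):
    """Build an in-memory .tar.gz holding one regular file (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        info.uid, info.gid, info.mode = uid, gid, mode
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


# An archive holding an empty dump fails the size criterion only.
with tarfile.open(fileobj=build_archive("database.sql", b""), mode="r:gz") as archive:
    for problem in check_archive(archive, TEMPLATE):
        print(problem)
```

To point the sketch at a real backup, open it with tarfile.open("database-dump.tar.gz") instead of the in-memory demo archive.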

Pretty easy, hmm? OK, just for fun, let’s replicate the part of the Gitlab.com database incident mentioned above and create an archive containing an empty SQL dump:

$ touch /tmp/database.sql && \
tar zcvf /tmp/database-dump.tar.gz -C /tmp database.sql && \
cp /tmp/database-dump.tar.gz .

Now we launch Backup Checker with the previously created template. If you didn’t change the name of the database-dump.list file, the command is simply:

$ backupchecker -C database-dump.conf
$ cat a.out 
WARNING:root:1 file smaller than expected while checking /tmp/article-backup-checker/database-dump.tar.gz: 
WARNING:root:database.sql size is 0. Should have been bigger than 6291456.

The automated checks of Backup Checker trigger a warning in the log file: the empty SQL dump has been identified inside the archive.
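The 6291456 in the warning is simply 6 × 1024 × 1024, so the “m” suffix of the template counts in units of 1024² bytes. Here is a small Python sketch of how a size criterion such as >6m can be turned into a comparison. It is only an illustration, not Backup Checker’s parser: the function names are made up, and the “k” and “g” suffixes and the “<” operator syntax are assumptions, since only “=”, “>” and “m” appear in this article.

```python
import re

# Only the "m" suffix is confirmed by the warning above (6m -> 6291456 bytes).
# The "k" and "g" suffixes are assumptions of this sketch.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# "=" and ">" appear in the lists above; "<" is assumed for "smaller than".
COMPARISONS = {
    "=": lambda size, limit: size == limit,
    ">": lambda size, limit: size > limit,
    "<": lambda size, limit: size < limit,
}


def parse_size_criterion(text):
    """Split a criterion such as '>6m' into a comparison sign and a byte count."""
    match = re.fullmatch(r"([=<>])(\d+)([kmg]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size criterion: {text!r}")
    sign, number, unit = match.groups()
    return sign, int(number) * UNITS[unit]


def size_matches(size, criterion):
    """Tell whether a file size in bytes satisfies the criterion."""
    sign, limit = parse_size_criterion(criterion)
    return COMPARISONS[sign](size, limit)


print(parse_size_criterion(">6m"))        # ('>', 6291456)
print(size_matches(0, ">6m"))             # False: the empty dump is too small
print(size_matches(7854803, "=7854803"))  # True: the original dump, exact size
```

The original 7854803-byte dump also satisfies >6m, which is why the looser template still accepts a healthy backup while rejecting the empty one.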

A step further

As you have seen in this article, verifying your backups is not a time-consuming task once you have a FOSS project dedicated to it, with an easy way to create a template of your backups and to use it.

This article only showed a really simple use case; Backup Checker has lots of other features to offer when verifying your backups. Read the official documentation for more complete descriptions of the available possibilities.

Data loss, especially for projects storing user data, is always a terrible event in the life of an organization. Let’s try to learn from mistakes which could happen to anyone and build better backup systems.

More information about the Backup Checker project

2 thoughts on “The Gitlab database incident and the Backup Checker project”

  1. I’ve been thinking about writing something like Backup Checker, but perhaps you’ve obviated the need.

    Is there any chance Backup Checker could reasonably be made to check http://stromberg.dnsalias.org/~strombrg/backshift/ ?

    Or is it too different? http://stromberg.dnsalias.org/~strombrg/backshift/documentation/for-all/restoring.html
    Restoration with backshift is a matter of asking the program to assemble a tar archive on the fly, and piping to tar.
