https://www.henrik.org/


Monday, July 17, 2023

Why I created Underscore Backup

I started running a server for storing all my projects as well as various multimedia artifacts in 1999, with a small desktop computer and a 20 GB HDD. As the size and personal importance of this server grew, within a few years I started running RAID5 and then RAID6 to make sure data was not lost from single drive failures. Despite this, in 2006 the then-current incarnation of this server encountered a catastrophic 3-drive failure, which I only managed to recover from after a tremendous amount of work and a fair amount of luck. Among other things, this involved manually patching the Linux RAID kernel code to remove certain fail-safes as I pulled data off the partially assembled RAID.


"The Server" in its current iteration.

This episode led me to look for ways to safeguard against this ever happening again. Looking through the options available to me, I found Crashplan, which addressed all my needs at a reasonable price. My initial backup to Crashplan took several years to complete over my 20 Mbit/s broadband uplink, as my server had at this point grown to several TB.

A few years after I started using Crashplan, they stopped offering consumer backups, and the only way to keep using them was to migrate to their business plan, which I did. However, Crashplan only allowed you to migrate a few TB per computer at the time, which meant that I had to re-upload most of my backup. Fortunately, at this point I had gotten a fiber internet connection with a reasonable uplink that allowed me to re-upload this data in less than a year. As my backup of this server grew, Crashplan also started showing its flaws: it required several GB of memory to be able to back up my server. But it did work, and it gave me reasonable peace of mind about the contents of my server.

This went on for a few years, after which I was contacted by Crashplan (now called Code42) and told that unless I reduced the size of my backup to under 10 TB, they would terminate my account, since they considered me to be in violation of their terms of service by keeping too large a backup.

From: Support Ops (Code42 Small Business Support) 
Date: Feb 6 2020, 10:38 AM CST 

Hello Administrator,

Thank you for being a CrashPlan® for Small Business subscriber. We appreciate the
trust that you have placed in CrashPlan - that relationship is important to us.
Unfortunately, we write to you today to notify you that your account has
accumulated excessive storage, which will result in degraded performance. You 
have one of the largest archives in the history of CrashPlan. It is so large, we
cannot guarantee the performance of our service. Due to the size of your
archive, full restores of your backup archive, and even selectively restoring 
specific files, may not be possible.

As a result, we are notifying you, per our Master Service Agreement and
Documentation, to reduce your storage utilization for each device to less than
10TB by June 1, 2020. Note that we have extended your subscription to June 1, 
2020 to give you ample time to make changes. If you do not do so by June 1, 
2020, your subscription will not be renewed, and your account will be closed at
the end of your current subscription term.

…

Thank you, 
Eric Wansong, Chief Customer Officer, Code42

The server I was using was Linux based, and as far as I could tell Crashplan was the only vendor on the market providing cloud-based backup solutions for that OS. This was when I decided to start working on Underscore Backup as a means for me to continue making backups of my server, as I couldn’t find any existing alternatives that fulfilled my needs. The first version was command line only and very primitive, even though it did support point-in-time recovery and backup sets, and obviously handled my very large backup efficiently. Another feature that was built in from the beginning was a strong focus on encrypting everything as much as possible, so that any medium could be used for backups even if it was not properly secured from prying eyes. Creating the initial backup of my server using Underscore Backup used a more or less sustained 600 Mbit/s (compared with the 60 Mbit/s, impressive at the time, that I experienced using Crashplan on the same connection).

At the same time, I also started using the iDrive service for backing up my laptops and various other smaller Windows and macOS based machines. I did this because I didn’t think the CLI-only (command-line interface) implementation of Underscore Backup was convenient enough to be used on these machines. This situation continued for a few years, during which the CLI-only version of Underscore Backup backed up my server data to cloud block storage and my other machines were backed up by the iDrive service. This all came crashing down when my main development laptop of several years had a catastrophic SSD failure and I had to restore my data from iDrive. I found out two things about how the iDrive service works.

The first is that even though iDrive keeps track of versions of your files, it does not keep track of directory contents or deletions of files. This is critical to any developer, and I was restoring a large development repository with files that I had been working on while iDrive was running in the background. For those of you who are not developers, we rename files a lot. And every one of the old names of all my renamed files came back when I did a full restore of the contents of my laptop’s hard drive. That meant that basically any repository of code that I had worked on since I started using iDrive was no longer in a buildable state without a considerable amount of work.
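To illustrate why this matters, here is a minimal Python sketch of the difference. It is only a toy model under my own assumptions, not how iDrive or Underscore Backup are actually implemented, and the `Backup` class and its method names are hypothetical and made up for this example. One restore keeps only per-file version history, the other also records which paths existed at each backup run.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Toy backup store (hypothetical, for illustration only)."""

    # Every version of every path ever uploaded: path -> [(time, content), ...]
    versions: dict[str, list[tuple[int, str]]] = field(default_factory=dict)
    # Directory contents recorded at each backup run: [(time, paths present), ...]
    snapshots: list[tuple[int, frozenset[str]]] = field(default_factory=list)

    def run(self, time: int, files: dict[str, str]) -> None:
        """Record one backup run of a path -> content mapping."""
        for path, content in files.items():
            history = self.versions.setdefault(path, [])
            # Only store a new version when the content actually changed.
            if not history or history[-1][1] != content:
                history.append((time, content))
        self.snapshots.append((time, frozenset(files)))

    def restore_versions_only(self, time: int) -> dict[str, str]:
        """Restore the newest version of every path ever seen, ignoring deletions."""
        restored = {}
        for path, history in self.versions.items():
            older = [content for t, content in history if t <= time]
            if older:
                restored[path] = older[-1]
        return restored

    def restore_point_in_time(self, time: int) -> dict[str, str]:
        """Restore only the paths present in the last run at or before `time`."""
        present = [paths for t, paths in self.snapshots if t <= time]
        if not present:
            return {}
        everything = self.restore_versions_only(time)
        return {path: everything[path] for path in present[-1]}


backup = Backup()
backup.run(1, {"src/old_name.py": "print('hello')"})
backup.run(2, {"src/new_name.py": "print('hello')"})  # the file was renamed

stale = sorted(backup.restore_versions_only(2))   # both the old and new name
exact = sorted(backup.restore_point_in_time(2))   # only the new name
```

In this toy model, the version-only restore brings back `src/old_name.py` next to `src/new_name.py`, which is the kind of leftover that breaks a build, while the point-in-time restore returns just the files that existed at the last backup run.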

The second surprise to me was that even though the iDrive backup of my laptop was relatively small, only around 50 GB in size, it took almost 2 weeks to restore. Granted, it contained a large number of files (around 3 million, mostly small), but I was shocked at how slow it was. I also opened several support cases with iDrive about this, but there was nothing they could do to help me. For comparison, on the same network and with a backup of roughly the same size in both file count and total storage, Underscore Backup would complete a similar restore in about 5 minutes (and it would do it properly, keeping track of deleted files).

At this point, I evaluated the other solutions available but could not find any that would be suitable for my needs. Carbonite does not allow you to specify what files should be backed up but instead, in the interest of simplicity, tries to be smart about it. When I tried it on my development files, it decided to back up almost none of them even though I specifically told it to include the directory. Backblaze is a very solid solution but, same as iDrive, does not keep track of deleted files for a true point-in-time recovery. In the end, I decided that I would put in the effort needed to create an easy-to-use user interface for Underscore Backup so that it would be suitable for use on things other than servers. The end result of these efforts was the first stable release of Underscore Backup in the summer of 2022, at which point it graduated to being the only backup solution I used on all my computers.

The problem at this point, though, was that even though I had a backup solution that fulfilled all my needs, it was still very tricky to set up for most users, since to use it you generally had to supply your own cloud storage such as Amazon S3. It was also quite tricky to access data from other sources you had backed up, since every source had to be set up individually on each client you wanted to restore it on. The sharing functionality, even though present, was also so complicated that I am relatively certain nobody except me managed to set it up. To solve all of these problems, I decided to leave behind the service-less nature of the software that I had followed up until that point and create a service that would both remove the need to provide separate cloud storage and help manage multiple sources and set up shares. This was a relatively large undertaking, but it eventually led to the launch of Underscore Backup 2.0 in the first half of 2023.

The latest version as of this writing is the upcoming 2.2 release, which makes it very easy to set up backups of multiple computers of any size while staying true to the original guiding principles of security, durability, efficiency, and flexibility.

This post was cross-posted from the Underscore Backup blog.
