Tip for handling an unplanned outage: Protect your family jewels

In a previous post, I took a little dig at Tekpub.  Someone thought I was ‘cheering’ this misfortune.  This is silly.

Site outages are something that should be part of any revenue-generating site’s operational plan/business model.  I’ve probably dealt with this at least a dozen times (I have personally caused three of them).  Now that Rob Conery has transitioned from a (very successful, by all accounts) career as an ‘architecture astronaut’ to running a (very successful, by all accounts) actual business, it is nice to see that he is getting some experience dealing with the real world.  Hopefully he will learn from it, and perhaps learn that he is not beyond all criticism.

It was perfectly clear that, other than the panic of dealing with the outage, Tekpub and its customers would suffer very little harm from this entirely foreseeable event.

Having said that, if you are the owner or operator of a revenue generating site, there is something that can help you plan for the entirely foreseeable event of a site outage.

Identify your family jewels

In many, but not all, cases, it will be your database (it could be your source code).  What is your strategy for protecting your database from catastrophe and/or making it available at another site?  Back in the day, when I was managing a then-important dot-com, we had various strategies for dealing with the possibility of our data center becoming unavailable, even permanently.

Since we used SQL Server, we implemented, among other things, log shipping to an external site.  This meant that we had a ‘within 5 minutes’ copy of our database, including customer information, order information, and all that important stuff.
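
A ‘within 5 minutes’ promise is only worth something if somebody checks it.  Here is a minimal sketch of such a check in Python.  It is not how our setup worked; it simply assumes the shipped transaction log backups land in a directory at the external site as *.trn files (a common convention, but an assumption here), and the function names and the five-minute threshold are illustrative.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Assumed target: the standby should never be more than 5 minutes behind.
MAX_LAG = timedelta(minutes=5)


def newest_log_age(log_dir: Path, now: Optional[datetime] = None) -> Optional[timedelta]:
    """Return the age of the most recently modified *.trn file, or None if there are none."""
    now = now or datetime.now(timezone.utc)
    mtimes = [p.stat().st_mtime for p in log_dir.glob("*.trn")]
    if not mtimes:
        return None
    newest = datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
    return now - newest


def is_standby_fresh(log_dir: Path, max_lag: timedelta = MAX_LAG,
                     now: Optional[datetime] = None) -> bool:
    """True only if at least one shipped log exists and the newest is within max_lag."""
    age = newest_log_age(log_dir, now)
    return age is not None and age <= max_lag
```

Run something like this on a schedule at the external site and alert when it returns False.  An empty directory counts as stale on purpose: no shipped logs at all is the worst case, not a pass.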

The trend towards cloud computing doesn’t mean you don’t have to plan for this.  All of the major providers of cloud computing have had site outages.  More importantly, the bankruptcy of a company that you would never expect to go under (think Enron or Arthur Andersen or Lehman Brothers) could render your non-locally hosted data unavailable, perhaps permanently.  You really have to plan for this.  It’s hard to imagine an Amazon or a GitHub suddenly disappearing, but it could happen (though because of the inherently distributed nature of git, it would not be that big of a deal), and it is incumbent on a business owner to plan for it when it is relatively easy to do.

Protect your family jewels

One way to handle this is to back up locally (so you now have a cloud-hosted version of your data, and a slightly lagging local version of your data).  Since cloud computing is so cheap, you can then back up the slightly lagging local version of your data to a second cloud.  You don’t have to be an Amazon or Google to afford this.
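
The copy-it-twice idea can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the destinations are plain directories (say, a local disk and a folder that is mounted from or synced to a second provider), and the names sha256_of and replicate_backup are made up for this example.  A real upload to a cloud provider would go through that provider’s own tools or SDK instead of a file copy.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(source: Path, destinations: List[Path]) -> Dict[Path, bool]:
    """Copy one backup file into each destination directory and verify every copy.

    Returns a mapping of destination directory -> True if the copy's checksum
    matches the source, False otherwise.
    """
    expected = sha256_of(source)
    results: Dict[Path, bool] = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied = Path(shutil.copy2(source, dest_dir / source.name))
        results[dest_dir] = sha256_of(copied) == expected
    return results
```

The checksum comparison is the point of the sketch: a backup you have never verified is a hope, not a backup.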

This won’t necessarily bring your site back online quickly.  Recreating a production infrastructure in multiple places is far more expensive.  But you should still plan for it as part of your business model.  If your site is down for, say, three days, how do you handle it?  Think of it in terms of compensating actions a la CQRS.

This is something that should be part of what you do when running an online business.  Actually, if you are the sort of person who cares about, say, your family photo history, you should probably plan for that as well.

posted on Friday, April 22, 2011 9:11 PM