April 2011 Blog Posts
Some more on the AWS Outage

SitePoint had an interesting take in its newsletter called “Two Important Lessons from the AWS Failure.” 

The first involves communication:

“The lesson here is clear—when you have any kind of crisis, communication with those affected is extremely important. In emergency mode, it may not be possible to pick up the phone to talk to a client or customer, but updating your website or changing the voicemail message can have a major impact.”

Tekpub was able to communicate through Rob’s blog (I assume that since he rejoined Twitter, he also used that medium, but since I don’t use Twitter, that’s only an assumption.  I would also not be surprised if he sent an email blast to his customer base once he recognized the seriousness of the issue).  I assume their customers knew about these communication channels; if not, I have little doubt that they will in the future.  Having been bitten by these sorts of things in the past, I can say that when you sit down and figure out the lessons learned, you really do learn a lot.

The second point, which I talked about previously here, involves having a contingency plan:

“As I was hearing of some extremely large websites being completely down due to the AWS outage, I couldn't help wondering why they built their systems without any redundancy or backup plan. Cloud computing is a relatively young industry, and although Amazon Web Services has been very reliable, failures happen.”

and

“One of the biggest advantages of cloud computing is its rapid scalability. It is entirely possible to setup two completely separate cloud environments, one at AWS and one at Rackspace for instance, and simply have one be a backup ready to be scaled up to production when a failure occurs (either manually or automatically).”

This wouldn’t address any potential data loss (which is what is scary about the AWS outage: some data was simply unrecoverable.  Ouch.), but it would allow you to get back online quickly.  As always, though, cost is a factor.

Ars Technica (among many others) gives a good summary of what happened:

“One factor contributing to the problems was that when nodes could not find any further storage to replicate onto, they kept searching, over and over again. Though they were designed to search less frequently in the face of persistent problems, they were still far too aggressive. This resulted, effectively, in Amazon performing a denial-of-service attack against its own systems and services. The company says that it has adjusted its software to back off in this situation, in an attempt to prevent similar issues in the future. But the proof of the pudding is in the eating—the company won't know for certain if the problem is solved unless it suffers a similar failure in the future, and even if this particular problem is solved there may well be similar issues lying latent. Amazon's description of the 2008 downtime had a similar characteristic: far more network traffic than expected was generated as a result of an error, and this flood of traffic caused significant and unforeseen problems.

Such issues are the nature of the beast. Due to their scale, cloud systems must be designed to be in many ways self-monitoring and self-repairing. During normal circumstances, this is a good thing—an EBS disk might fail, but the node will automatically ensure that it's properly replicated onto a new system so that data integrity is not jeopardized—but the behavior when things go wrong can be hard to predict, and in this case, detrimental to the overall health and stability of the platform. Testing the correct handling of failures is notoriously difficult, but as this problem shows, it's absolutely essential to the reliable running of cloud systems.”

One of the biggest selling points of cloud computing is the promise of cheap and scalable hosting.  When I was part of the group that supported eCommerce sites for the NBA, NASCAR and NHL, our infrastructure was easily six figures for hardware alone.  I don’t know about you, but I don’t have that kind of coin.  The problem is you have no control over failures.

For a comprehensive list of links related to the outage, check out HighScalability.

posted @ Friday, April 29, 2011 6:44 PM | Feedback (0)
cqrs for dummies – example – comments on an incredibly simple implementation of the command part of cqrs

In a previous post, I laid out an incredibly simple implementation of the command part of cqrs.  I specifically left out any comments, so consider this the planned addendum to that post.

Why a code sample/example and why now?

As a general rule, on my blog, I tend to explicitly refrain from posting code.  Why?

It varies from day to day, but I tend to read between 50 and 100 blog posts a day (note to self: there’s a reason why you don’t get enough done in a day.  Address.).  Among the bloggers I read daily is Ayende.  Now, as I have mentioned before, my ego is large enough to be seen from space.  Regardless of that, it is no threat to my ego to recognize that I will never be as good a developer (however that might be determined) as Ayende.  I think I’m pretty good, but I have, at least once, had to email him when he posted a code challenge to ask which user comment had the right solution, because, well, the dude’s really good, and I don’t always get which one is correct. 

If I dropped everything and decided to become as good a developer as Ayende (however that might be determined)….yeah, it’s still not going to happen.  I accept my limitations here.

Everyone who’s a developer (and I’m willing to bet Ayende has done it at least once) has faced a challenge with some programming task and gone online to search Google (sorry Bing, love your commercials though).  I don’t do it every day, or even every week, but I bet I have done this at least once a month (on average) since….1998?  Did Google exist back then?

If you post code, there’s the chance that someone will Google your code before you die, and then ask why their kitten died because you didn’t foresee some random case that has almost no relation to what you were posting.  That’s your problem, protect your kittens.

The single reason I posted this particular sample is that, after reviewing every sample CQRS framework that I could find, none of them included the one thing I wanted to do.

I didn’t include every single part of the implementation.  Why?

I didn’t include implementations of the commands or the handlers or the simple SqlCommandStore.  Why?

Nothing in those pieces is remotely interesting.  And laziness.

I have ICommand<TReturnValue> but it is never used.  Why?

I haven’t determined to my 100% satisfaction what should be responsible for determining if a command should have a return value.  Is it the command itself, or the command handler?

I spent a considerable amount of time having the command be responsible.  Problem is, I could never get it to work to my satisfaction.  Combine the rational explanation that it is properly the responsibility of the handler with the fact that it works, and there you go.  But, I left it in to highlight the fact that this is a concern and I’m willing to change my mind.

Also, see above about why I don’t normally post code examples.

Everybody knows commands shouldn’t have a return value.  Isn’t this a mistake?

Technically, it is the command handlers that can have a return value, and the command bus that allows it.  Whatever.  I assume the reader can figure it out.

Okay, this is the main reason why I posted the code.  There are multiple reasons why I did this, which intersect in various ways, and so as such, are hard to describe in a linear fashion.  I could explain it through interpretive dance, but no one wants to see that.  So, let me try.

The typical way (from what I’ve seen, anyway) in which CQRS is implemented involves creating queries that always have a return value and creating commands that never have a return value. 

As a result of having commands with no return value, you need some way to determine whether a command succeeded.  This is complicated.  There are many ways to do it, none of which I find satisfactory at the moment for the applications that I deal with.  Moreover, it is a capability that the applications I deal with usually don’t need.  I don’t want to commit to a requirement I don’t need unless I have to.  If I ever do need to commit to it, I can strip the return value out of the command interface and refactor accordingly.

Additionally, Derick Bailey had a post about Request/Reply that made a lot of sense to me (the careful reader will note what I said above about using Google).  I wanted that functionality in my code.  However, Request/Reply is ambiguous when it comes to ‘strict’ CQRS.  Is it a command or a query that wants a reply?  Obviously a query does, but if your code says ‘Request/Reply’, it might not be obvious that it is a query.  So, why not just make it explicit that when you have a command that you want a return value, that’s exactly what it is?

So, that’s what I did.  I made it explicit.  A command still tells the system that it wants to change something (as opposed to a query, which tells the system you only want data back); it additionally requests a reply with whatever it needs.
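To make that concrete, here is a minimal sketch of an explicit command-with-reply, using the same ICommand and Handles<T, TReturnValue> shapes as the sample implementation; the command, the handler, and the id logic are all invented for illustration.

```csharp
using System;

public interface ICommand {}

public interface Handles<T, TReturnValue> where T : ICommand
{
    TReturnValue Handle(T command);
}

// A command still asks the system to change something...
public class CreateUserCommand : ICommand
{
    public string UserName;
}

// ...and the handler fulfills the request, replying with whatever is needed
// (here, the id of the newly created user).
public class CreateUserCommandHandler : Handles<CreateUserCommand, int>
{
    private static int _nextId = 1;

    public int Handle(CreateUserCommand command)
    {
        // persist the user here, then reply
        return _nextId++;
    }
}

public static class Program
{
    public static void Main()
    {
        var handler = new CreateUserCommandHandler();
        int newUserId = handler.Handle(new CreateUserCommand { UserName = "someuser" });
        Console.WriteLine(newUserId);
    }
}
```

In the full sample, the call would go through the bus (bus.Publish<CreateUserCommand, int>(command)), but the shape of the request and the reply is the same.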

It isn’t CQRS if you do that

Okay.  I’m fine with that.

Everyone knows that while an event bus might publish, a command bus sends.  Isn’t this a mistake?

Though I haven’t implemented the functionality behind this semantic difference, no, it isn’t a mistake.

Since I am implementing a command store, this means that I might have a generic command handler that handles all commands (in order to add them to the command store, for instance) and then the specific handler that ‘fulfills the request’ of the command (which will normally produce the events that result from it, etc.).
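As a sketch of that idea (the names are invented, and an in-memory stand-in replaces the SqlCommandStore), the generic ‘store everything’ handler might look like:

```csharp
using System;
using System.Collections.Generic;

public interface ICommand {}

public interface Handles<T> where T : ICommand
{
    void Handle(T command);
}

public interface ICommandStore
{
    void Store(ICommand command);
}

// A generic handler whose only job is to append the command to the store;
// a specific handler would then 'fulfill the request' of the command.
public class StoreCommandHandler<T> : Handles<T> where T : ICommand
{
    private readonly ICommandStore _store;

    public StoreCommandHandler(ICommandStore store)
    {
        _store = store;
    }

    public void Handle(T command)
    {
        _store.Store(command);
    }
}

// In-memory stand-in for the SqlCommandStore mentioned in the post.
public class InMemoryCommandStore : ICommandStore
{
    public readonly List<ICommand> Stored = new List<ICommand>();

    public void Store(ICommand command)
    {
        Stored.Add(command);
    }
}

public class DeleteExistingUserCommand : ICommand {}

public static class Program
{
    public static void Main()
    {
        var store = new InMemoryCommandStore();
        var audit = new StoreCommandHandler<DeleteExistingUserCommand>(store);
        audit.Handle(new DeleteExistingUserCommand());
        Console.WriteLine(store.Stored.Count);
    }
}
```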

You wouldn’t want to allow anything to subscribe to a command as if it were an event.  That would be a mistake.

Anything else?

I don’t think so.

posted @ Tuesday, April 26, 2011 7:18 PM | Feedback (0)
cqrs for dummies – example – an incredibly simple implementation of the command part of cqrs

Without comment, the interfaces:

namespace Infrastructure.Command
{
    public interface ICommand {}

    public interface ICommand<TReturnValue> {}

    public interface Handles<T> where T : ICommand
    {
        void Handle(T command);
    }

    public interface Handles<T,TReturnValue> where T: ICommand
    {
        TReturnValue Handle(T command);
    }

    public interface ICommandBus
    {
        void Register<TCommand>(Handles<TCommand> handler) where TCommand : ICommand;
        void Publish<TCommand>(TCommand command) where TCommand : ICommand;
        void Register<TCommand, TReturnValue>(Handles<TCommand, TReturnValue> handler) where TCommand : ICommand;
        TReturnValue Publish<TCommand, TReturnValue>(TCommand command) where TCommand : ICommand;
    }

    public interface ICommandStore
    {
        void Store(ICommand command);
    }
}

Without comment, an implementation:

using System;
using System.Collections.Generic;
using Infrastructure.Command;

namespace AdminLoader.Commands
{
    public class InProcessCommandBus : ICommandBus
    {
        private Dictionary<Type, object> _handlers = new Dictionary<Type, object>();
        private ICommandStore _store;

        public InProcessCommandBus()
        {
            _store = new SqlCommandStore();
        }

        public void Register<TCommand>(Handles<TCommand> handler) where TCommand : ICommand
        {
            IList<Handles<TCommand>> handlers = GetHandlers<TCommand>();
            handlers.Add(handler);
        }

        public void Publish<TCommand>(TCommand command) where TCommand : ICommand
        {
            _store.Store(command);
            IList<Handles<TCommand>> handlers = GetHandlers<TCommand>();
            if (handlers.Count == 0) throw new ApplicationException("No handlers exist for command of type " + command.GetType());
            foreach (var commandHandler in handlers)
            {
                commandHandler.Handle(command);
            }
        }

        private IList<Handles<TCommand>> GetHandlers<TCommand>() where TCommand : ICommand
        {
            Type commandType = typeof(TCommand);
            object untypedValue;
            if (!_handlers.TryGetValue(commandType, out untypedValue))
            {
                untypedValue = new List<Handles<TCommand>>();
                _handlers.Add(commandType, untypedValue);
            }
            return (IList<Handles<TCommand>>)untypedValue;
        }

        private IList<Handles<TCommand, TReturnValue>> GetHandlers<TCommand, TReturnValue>() where TCommand : ICommand
        {
            Type commandType = typeof(TCommand);
            object untypedValue;
            if (!_handlers.TryGetValue(commandType, out untypedValue))
            {
                untypedValue = new List<Handles<TCommand, TReturnValue>>();
                _handlers.Add(commandType, untypedValue);
            }
            return (IList<Handles<TCommand, TReturnValue>>)untypedValue;
        }

        public void Register<TCommand, TReturnValue>(Handles<TCommand, TReturnValue> handler) where TCommand : ICommand
        {
            IList<Handles<TCommand, TReturnValue>> handlers = GetHandlers<TCommand, TReturnValue>();
            handlers.Add(handler);
        }

        public TReturnValue Publish<TCommand, TReturnValue>(TCommand command) where TCommand : ICommand
        {
            _store.Store(command);
            IList<Handles<TCommand, TReturnValue>> handlers = GetHandlers<TCommand, TReturnValue>();
            if (handlers.Count == 0) throw new ApplicationException("No handlers exist for command of type " + command.GetType());
            Handles<TCommand, TReturnValue> handler = handlers[0];
            return handler.Handle(command);
        }
    }
}

And:

namespace AdminLoader.Commands
{
    public class InProcessAdminLoaderCommandBus : InProcessCommandBus
    {
        public InProcessAdminLoaderCommandBus()
        {
            Register(new CreateNewUserWithAppAndRoleCommandHandler());
            Register(new DeleteExistingUserCommandHandler());
        }
    }
}

posted @ Monday, April 25, 2011 7:28 PM | Feedback (1)
Broken Category Show All Links

A reader contacted me to let me know that the show all functionality for my blog categories is broken.  Thus:

http://www.blogcoward.com/category/15.aspx?Show=All

returns zero posts.

There’s a reason for this.  That page produces an “unbounded result set.”  Since my blog is on a shared, publicly hosted infrastructure, in order to prevent traffic from my blog from taking down other sites, I decided to make the settings “safe by default” and return nothing.

HAHAHAHAHAHAHA!!!!!!  Try the veal, tip your waitresses.

Oy.  Seriously, I’ll have to check to see what is wrong in the source code.  Because of the odd implementation of my SubText engine, I probably don’t have a proper version of some code or sql.

posted @ Monday, April 25, 2011 6:05 PM | Feedback (2)
cqrs for dummies – an interlude – has it all been a waste of time?

Now that I have completed the long project that sucked up a huge amount of time (which was necessary because it was important to the client, and because it pays the bills), I have returned much of my attention to one of my own projects, which will hopefully help pay some of the bills down the road, and which involves an oddly opinionated version of CQRS.

As it happens, Udi Dahan has written a post entitled “When to avoid CQRS” that suggests I shouldn’t be putting in the effort.  Some of the key snippets here:

“It looks like that CQRS has finally “made it” as a full blown “best practice”.  Please accept my apologies for my part in the overly-complex software being created because of it.”

“Most people using CQRS (and Event Sourcing too) shouldn’t have done so.”

“Therefore, I’m sorry to say that most sample application you’ll see online that show CQRS are architecturally wrong.”

“So, when should you avoid CQRS?  The answer is most of the time.”

Given all of that, it would seem that, for the most part, putting in all this effort has been a mistake.

Rinat Abdullin then posted a response that suggests perhaps things aren’t so grim.  Some of the key snippets here:

“In essence, synergies within CQRS work, whenever you need to:

  • tackle complexity;
  • distribute teams;
  • scale out under massive loads.”

“Yet, for some strange reason, the mental model of CQRS provided much friendlier and faster implementation route here.”

So, what should we make of this?

My response is going to be that both Udi and Rinat are correct.  To see why seemingly disparate viewpoints could both be correct, you have to consider context.

Isn’t CQRS supposed to be about simplicity?

That’s a bit of an overstatement, but think about what Greg Young has talked about in terms of what I call ‘strict CQRS’.  It’s about separating your commands from your queries in your code.  That’s it.  In particular, CQRS doesn’t mean event sourcing.

Think about the whole concept of a thin view layer as Udi has described.  No translating through your domain model, no mapping of Domain objects to DTOs, or any of that stuff, just reading a simple view model (where your queries don’t even have to go through a layer of any kind, just straight from your denormalized data source).  That’s it.

A lot of the complexity in CQRS frameworks comes in when you try to implement Event Sourcing and re-create an AR off of the event store.  But what if you take some of the simple principles of CQRS and apply (pun intended) only those, before you embrace full-blown Event Sourcing architectural principles?

An EventLog doesn’t have to be gold-plating

Udi says:

“Architectural gold-plating / stealing from the business.

Who put you in a position to decide that development time and resources should be diverted from short-term business-value-adding features to support a non-functional requirement that the business didn’t ask for?

If you sat down with them, explaining the long-term value of having an archive of all actions in the system, and they said OK, build this into the system from the beginning, that would be fine. Most people who ask me about CQRS and/or Event Sourcing skip this step.”

Imagine if you built a system that has some of the architectural principles of CQRS but doesn’t try to implement full-blown Event Sourcing.  You can still have your commands being handled (and saved to a command store) and generating events (which are saved to an event store), which are published, subscribed to, and dealt with accordingly, while the state of your domain objects is saved in a traditional, boring database.

The infrastructure for such a system can be pretty simple (more or less).  Your event store still gives you a (mostly) free log that can give you insights into the history of how your application is used.  It doesn’t, obviously, let you replay events to rebuild the system from some particular point in time; you need to build additional infrastructure for that.

But then is it still CQRS?

Yes and no.  You can argue semantics forever, and if you want to do that, you can.  But the idea I’m expressing here is that you can gain a lot of benefits from applying some of the basic/strict CQRS principles in a low cost manner.

I don’t know that anyone has ever claimed this, but in my mind, the kind of domain where full-blown CQRS + Event Sourcing makes sense is a full-blown, real-time trading system.  A lot of what I consider to be basic/strict CQRS is probably just coming to terms with understanding basic messaging concepts.

CQRS is a helpful mental model

Rinat says:

“Yet, for some strange reason, the mental model of CQRS provided much friendlier and faster implementation route here (despite the fact that there is not a single book published on the subject, yet). Diverse solutions being delivered to production, share similar architecture, development principles, reduced complexity levels and teams. They just work.”

Some of what I’ve taken away from learning what people have been talking about with CQRS is probably really just getting down to brass tacks and learning how to apply SRP.

Every action in my application that changes state is a command, and that command completely consolidates all of the information needed for the state to be changed.  It doesn’t include any other information.

Every request in my application for data is a query, and that query completely consolidates all of the information needed to get the data that I need.  It doesn’t include any other information.

From just thinking about these two things, I learn quite a lot about what to do with the infrastructure of my application.  I don’t need to create a domain object that then gets transformed for every command or query I need to produce.  Once I have simple commands and queries, I can do a lot of pretty simple things without having to build a really complicated infrastructure.

In the long project I just completed, I did something like that. I had commands and I had queries. I didn’t have events and I didn’t even implement command handlers, but instead had each ICommand implementation implement an Execute method. One could argue that what I did was simply implement ‘bad’ Transaction Script procedural code, but you know what, I’m fine with that. The system (which was an entirely back-end system with no UI) is now such that, if there is a failure, it occurs within the single Execute method of a single ICommand implementation.
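A hedged sketch of that shape (the command name and its work are invented here; the real system did back-end loading work):

```csharp
using System;

// The 'bad' Transaction Script variant: no handlers, no events; each command
// carries its own Execute method, so any failure is contained in one place.
public interface ICommand
{
    void Execute();
}

public class LoadNightlyFeedCommand : ICommand
{
    public bool Executed;

    public void Execute()
    {
        // all of the work (and any failure) happens inside this one method
        Executed = true;
    }
}

public static class Program
{
    public static void Main()
    {
        ICommand command = new LoadNightlyFeedCommand();
        command.Execute();
        Console.WriteLine(((LoadNightlyFeedCommand)command).Executed);
    }
}
```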

Is this really CQRS?  You could very easily argue that, no, it is not.  But I’m fine with that.  It used principles and concepts from CQRS that helped make the existing application better, and it met the need that I had with the application in question.

Should you avoid CQRS?

It is completely unfair, and, technically speaking, rather inaccurate, but I would recast this question in large part to asking whether you should avoid Event Sourcing.  The answer to that is, usually yes.  Think of commands (and the events they produce) and queries as messages, and then learn better ways of dealing with messaging.

To take another route, if you do try to go the more ‘advanced’ CQRS framework route, ask yourself a basic question.  Suppose you want to avoid some of the intricate problems you get when you submit commands that you don’t know for sure will succeed, and so don’t know how to let the caller (be it a UI element or not) handle a failure down the road (so to speak).  Why not make the whole process synchronous?  A command is created, sent, and handled, and the resulting events from that command are then published.  Technically, you don’t know for sure that once an event is published, the subscriber you care about picks it up and processes it.  So, as a thought experiment, why not have the calling code that creates and sends the command subscribe to the event publisher that tells you it worked? 

If that’s too complicated, as it could be, why not set up your command handler infrastructure to allow you to create command handlers that have a return value?  This is something that ‘breaks orthodoxy’ that I’ve had some success with.

It hasn’t been a waste of time

As with everything else in software development, you have to use judgment.  I am very ‘unorthodox’ in the sense that I don’t think it is a requirement that you be a software craftsman, or use TDD, or even have unit tests, in order to create successful software. 

When it comes to CQRS, though I attempt to understand all of the most sophisticated concepts involved in the most sophisticated expressions of it, I pick and choose the pieces that I think are the easiest and most productive.  Part of this is simply understanding my own limitations, where I know I’ll never be the greatest software developer in the world.  Part of this is simply understanding the limitations of the applications that I deal with, where I know that e.g. Event Sourcing is something that is genuinely overkill or gold-plating.

I still think that learning CQRS in toto can help you greatly as a developer to find ways to make your applications better.

And I don’t think you need any justification other than that.

posted @ Sunday, April 24, 2011 11:17 PM | Feedback (11)
Steak

For Lent, I normally give up lima beans, but since I generally hate the little fuckers, it has never really fit into the whole idea of sacrifice.  So, this year, I decided to actually give up a few things, which led, among other things, to becoming a pescetarian for 40 days.  And no, I had no idea the term existed, either.

When I stopped being broke (one of the ‘advantages’ of getting a Ph.D. in Philosophy is that when you graduate and try to be an adjunct professor, your income tends to top out in the $20k range, at least that is what it was back in the day), one of the first things I did was start to eat better, and by better, I mean more expensively.  Not much in terms of going out per se, since I’m an avid home cook, but definitely in terms of better and more expensive product.  And my number one product of choice was steak (though in the ensuing years I’ve had to cut down because of that damn cholesterol thing).

And let’s face it, there isn’t a much better way to end a day celebrating the Resurrection than with a nice dry-aged, spice-rubbed New York Strip and some homemade chimichurri sauce.  And a glass or two of a nice Cabernet.

posted @ Sunday, April 24, 2011 5:10 PM | Feedback (0)
Why did I tweak Tekpub?

In a previous post, I tweaked Rob about Tekpub being affected by the Amazon outage.  Why?

People have suggested a couple of reasons:

- I’m a dick.

- I’m jealous.

- I didn’t do my research to see that other sites were affected.

The first suggestion is undoubtedly true.  The others are false.

Rob is INFLUENTIAL.  He should use his power for good

Rob seemed to drop the ball in not expecting an Amazon failure/outage.  It left him in a state that could have had (in theory) pretty negative ramifications for his business and his customers.  As far as can be seen, it didn’t.  That is a good thing.

Now that Rob has experienced the sort of problem that many of us have experienced over the years more often than we wanted to, it is precisely because he is well read and influential that he can produce a series on site operations, either on Tekpub or on his blog, that will greatly benefit his readers.

Creating successful software involves many different things.

- software needs to be well written

- software needs to be easily deployable (and potentially, undeployable)

- software needs to be able to handle operational ‘necessities’

Rob has almost always focused on the first item.  Although his BDD presentation wasn’t technically the greatest in the world, it’s probably one of the most influential ones that I’ve ever watched, because it completely changed how I did software development.

The topic of deployability is pretty complex, and depends so highly on your environment, that Tekpub’s (seemingly) simplistic environment might not be a great base for much discussion.

However, now that Rob has had to deal with the outage of a revenue-generating online site, he’s in a perfect position to explain how he handled it and provide guidance for his readers.

- although his particular implementation is tied to Amazon, he should be able to provide general guidance on how to handle a site outage.

- since Tekpub appears to accept credit card payments (I’ve never needed to sign up myself, so I don’t know for sure), he should be able to provide general guidance about how to implement PCI compliance.  This is a huge topic.

- since Tekpub is a publicly available site, he should be able to provide general guidance about how to avoid common Internet attacks.

Summary

Imagine a “Mastering Site Ownership featuring Tekpub” subscription Rob could produce.  He could do so without revealing any secrets that are business specific.  In doing so, he would provide influential guidance that would benefit a lot of people.

I would offer to assist myself, but we can imagine how well that offer would go down.

With his influence, Rob could even further his assistance to the community if he looked at this as an option.

Just a thought.

posted @ Saturday, April 23, 2011 10:35 PM | Feedback (0)
Go Pens: BOOM-LAY BOOM-LAY BOOM!

As with every team in the NHL, right before the home team Penguins come out at the beginning of the game at CONSOL Energy Center, they play the obligatory video to get the crowd riled up.  “Yeah Home Team!  Boo Road Team!”  Crowd goes wild.

The Pens use a song by Shinedown called Diamond Eyes, which is incredibly catchy and includes lyrics like:

Boom-Lay Boom-Lay BOOM! [x4]

I'm on the front line
Don't worry I'll be fine
the story is just beginning
I say goodbye to my weakness
so long to the regret

Yeah, home team!

So, obviously, after seeing the team live, I bought the song.

Now, I’ve mentioned before that, though there are exceptions, in general when it comes to rock/pop songs, I’m not really looking to the lyrics for anything inspirational or meaningful.  If I’m looking for meaningful words, I’ll read the Bible or Shakespeare.

But, I think it’s a special kind of idiotic to have lyrics like this in a song:

[INTRO-Speaking]
I am the shadow, and the smoke in your eyes,
I am the ghost, that hides in the night

…..

[BRIDGE]
Every night of my life
I watch angels fall from the sky
Every time that the sun still sets
I pray they don't take mine

Um, right.  Let me get a pencil and write that down.

But, hey, what the hell, it’s my team’s song, and it’s really catchy, so:

Go Pens, BOOM-LAY BOOM-LAY BOOM!

Enjoy.

posted @ Friday, April 22, 2011 11:17 PM | Feedback (0)
Tip for handling an unplanned outage: Protect your family jewels

In a previous post, I took a little dig at Tekpub.  Someone thought I was ‘cheering’ this misfortune.  This is silly.

Site outages are something that should be part of any revenue-generating site’s operational plan/business model.  I’ve probably dealt with this at least a dozen times (I personally have caused three of them myself).  Now that Rob Conery has transitioned from a (very successful, from all accounts) career in which he did a whole bunch of ‘architecture astronaut’ work to running a (very successful, from all accounts) actual business, it is nice to see that he has some experience dealing with the real world, and hopefully he will learn from it, and perhaps learn that he is not beyond all criticism.

It was perfectly clear that, other than the panic of dealing with the outage, Tekpub and its customers would suffer very little harm from this entirely foreseeable event.

Having said that, if you are the owner or operator of a revenue generating site, there is something that can help you plan for the entirely foreseeable event of a site outage.

Identify your family jewels

In many, but not all, cases, it will be your database (it could be your source code).  What is your strategy for protecting your database from catastrophe and/or making it available at another site?  Back in the day, when I was managing an at-the-time-important dot com, we had various strategies for dealing with the possibility of our data center becoming unavailable, even permanently.

Since we used SQL Server, we implemented, among other things, log shipping to an external site.  This meant that we had a ‘within 5 minutes’ copy of our database, including customer information, order information, and all that important stuff.

The trend towards cloud computing doesn’t mean you don’t have to plan for this.  All of the major providers of cloud computing have had site outages.  More importantly, companies that you would never think would experience something like a bankruptcy (think Enron or Arthur Andersen or Lehman Brothers) could render your non-locally hosted data unavailable, perhaps permanently.  You really have to plan for this.  It’s hard to imagine an Amazon or a GitHub suddenly disappearing, but it could happen (though because of the inherently distributed nature of git, it wouldn’t be that big of a deal), and it is incumbent on a business owner to plan for it when it is relatively easy to do.

Protect your family jewels

One way to handle this is to back up locally (so you now have a cloud-hosted version of your data and a slightly lagging local version of it).  Since cloud computing is so cheap, you can then back up the slightly lagging local version of your data to a second cloud.  You don’t have to be an Amazon or Google to afford this.
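That cloud-to-local-to-second-cloud chain can be sketched as a simple mirroring step with checksum verification at each hop, so a corrupt copy never propagates downstream.  This is an illustrative sketch in Python (the function names and directory-as-storage model are my own; real cloud providers would sit behind their own SDKs):

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum used to verify each hop of the backup chain."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def mirror_chain(source: Path, *mirrors: Path) -> bool:
    """Copy files down a chain (e.g. primary cloud -> local -> second cloud),
    verifying every copy by checksum before it becomes the next hop's source."""
    previous = source
    for mirror in mirrors:
        mirror.mkdir(parents=True, exist_ok=True)
        for f in previous.iterdir():
            if f.is_file():
                target = mirror / f.name
                shutil.copy2(f, target)
                if sha256_of(target) != sha256_of(f):
                    return False  # stop: don't ship a bad copy downstream
        previous = mirror
    return True
```

Each mirror lags the one before it slightly, which is the trade-off being described: cheap, simple redundancy rather than real-time replication.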

This won’t necessarily bring your site back online quickly.  Recreating a production infrastructure in multiple places is much more cost-prohibitive.  But you should still plan for it as part of your business model.  If your site is down for, say, three days, how do you handle it?  Think of it in terms of compensating actions, a la CQRS.
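The compensating-action idea, very roughly: rather than trying to rewind what happened during the outage, you append a new action that offsets the damage.  A toy Python sketch (the `Ledger` class and event names are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """Append-only event log: failures are offset by new, compensating
    events, not by editing history (the CQRS-style approach)."""
    events: list = field(default_factory=list)

    def record(self, event: str, amount: int) -> None:
        self.events.append((event, amount))

    def compensate_outage(self, days_down: int, daily_credit: int) -> None:
        # Don't rewrite past billing events; append a credit that offsets them.
        self.record("OutageCredit", -days_down * daily_credit)

    def balance(self) -> int:
        return sum(amount for _, amount in self.events)
```

So a three-day outage doesn’t mean unwinding three days of billing; it means issuing a three-day credit as a new, auditable event.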

This is something that should be part of what you do when running an online business.  Actually, if you are the sort of person who cares about, e.g., your family photo history, you should probably plan for that as well.

posted @ Friday, April 22, 2011 9:11 PM | Feedback (0)
YAGNI applies to testing as well

Suppose you have a class related to inventory with a method that takes in a quantity, and as such, that quantity cannot be negative.  Should you create a test that proves the method throws an exception (or however you think the method should behave) when a negative quantity is passed to it?
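To make the question concrete, the ‘20 second’ test in question might look something like this (in Python rather than C#, and the `Inventory` class and method names are invented for illustration):

```python
class Inventory:
    def __init__(self):
        self._on_hand = 0

    def receive(self, quantity: int) -> None:
        """Add stock; a negative quantity is a caller bug, so fail loudly."""
        if quantity < 0:
            raise ValueError("quantity cannot be negative")
        self._on_hand += quantity

def test_receive_rejects_negative_quantity():
    inv = Inventory()
    try:
        inv.receive(-5)
    except ValueError:
        pass  # the behavior we expect
    else:
        raise AssertionError("expected ValueError for negative quantity")
```

The YAGNI question isn’t whether this test is hard to write (it isn’t); it’s whether any caller in your application can actually reach `receive` with a negative quantity.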

<digression>Note that it is very easy to slip into a mistake of asking what ‘the right thing’ to do is, as if there has to be a single correct answer.  ‘The right thing’ often depends on the context.  Universal truths (related to morality anyway) are those where ‘the right thing’ is in fact the same thing in all contexts.  But I digress.</digression>

Given the simplicity of the example, it is easy to answer yes, since it will take all of 20 seconds to write it.  But from a general perspective, you should really apply YAGNI.

TDD tends to lead you to think in terms of 100% code coverage, testing all possible edge cases, and whatnot, and this is why it generally sucks.  You shouldn’t be thinking in those terms.

Think instead in terms of how your application behaves and the scenarios users will encounter.  Is it possible for the application to pass in a negative quantity?  If it isn’t, then you don’t need to test for it.

Now, of course, the specific simplistic example is beside the point.  You should always be thinking in terms of cost-effectiveness and risk.  What does your application do?  What is the cost of trying to achieve 100% code coverage?  What is the risk involved in not testing edge cases?  For some applications, I think it is perfectly acceptable to create specifications that test your application as you are building it and then ‘throw them away’, taking away the safety blanket, so to speak (what I really mean here is not actively maintaining them).  When dealing with applications in a financial institution where potentially millions of dollars could be at risk, maybe not so much.

Don’t practice test-driven development, even if you test and test aggressively.  Your development should be driven by business needs instead.

caveat: if you are building a framework, you really should be thinking in terms of 100% code coverage, because your ‘business needs’ will drive you there.  If you are offering a public API, then you should test all the possible ways it could be used.  This is different from when you control the application, and thus control how your APIs are used.

posted @ Friday, April 22, 2011 12:11 PM | Feedback (0)
Tekpub not up to snuff?

As described here and here, it looks like BFF Rob Conery is having problems with Tekpub because of the Amazon outage.

For the people who have paid money to use Tekpub, I hope that he’s able to get his house in order.

When you place your bets on someone who, by his own admission, hasn’t done real-world work for years, you get what you pay for.  Perhaps this outage, which any experienced person would have anticipated, will prompt Rob to reconsider his self-image as someone above criticism and spend a little less time blogging about whatever shiny bauble has his attention and more time making sure his business can support obvious contingencies.

Having experienced these sorts of outages before, I know how much it sucks.  I hope he clears that up.

posted @ Thursday, April 21, 2011 8:12 PM | Feedback (8)
Tip to paying taxes electronically

If you switch from paying taxes ‘manually’ to electronic payment through your accountant, take the extra minute to verify that they are going to try to pay it out of the right account.

Merde.

posted @ Monday, April 18, 2011 11:06 PM | Feedback (0)
Email Provider Changed

Because I’m old and slow and lazy and stupid, I didn’t bother to change the blog config to reflect the fact that I had changed email providers, so I’ve been dropping messages.  Well done, sir, well done.

Fixed.

posted @ Sunday, April 17, 2011 12:18 AM | Feedback (0)
Road Trip – Briefest of summaries

Length: 16 days.

Miles travelled: 6074.

Cities visited (well, where I had advanced hotel reservations anyway): Las Vegas, Los Angeles, San Jose, Seattle, Vancouver, Edmonton, Calgary.

NHL arenas visited: LA, Anaheim, San Jose, Vancouver, Edmonton, Calgary (Also took in a Lakers game and a Dodgers game).

Number of months it is going to take me to pay it off: too many (I luckily didn’t budget it carefully ahead of time, otherwise I probably would have realized it was too expensive).

I may put up a separate post with some of the ‘nature pics’ (and no, that doesn’t include Vegas, get your mind out of the gutter), as there is a lot of freakin beauty out there.

Minor tip: if you are going to do any travel that requires you to use a passport, and you just got a new/replacement one, remember to sign it; otherwise, it’s a minor hassle trying to get back into the States.

Back to reality, and trying to clear out this massive backlog of things I have on my list(s) of things to do.

posted @ Tuesday, April 12, 2011 5:51 PM | Feedback (0)