Software Development Troubleshooting Rule #1: Read the damn error message

I did it again.  I know better.

Keeping in mind that my ego can be seen from space, one thing I am proud of is that I am as good at troubleshooting software errors, whether in production or during development, as anyone I know.  I have met a few people who are my equal, but I’ve never met or read of anyone I would say was better (whereas I can easily name people who are better than me at coding, SQL, networking, and so on and so on).

Yet I did it again.  I just spent a couple of hours struggling to figure out why I kept getting an exception in my code, and I did that because I violated rule #1: read the damn error message.

Don’t just think you’ve read the error message.  Don’t skim it and assume it says something other than what it says.  Absolutely don’t assume that because the error came from an area of code that produced an earlier error, it must be the same error with the same resolution.  Read the damn message.

Sometimes error messages are very obscure (COM error messages, I mean you), and sometimes they are misleading, but more often than not the error message will tell you exactly what you did wrong.  You only need to read it and pay attention to what it is telling you.
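To make that concrete, here is a minimal Python sketch.  It is not the code that bit me; the helper names (load_setting, describe_error, try_load) and the settings dictionaries are made up purely for illustration.  The same line of code fails twice, for two different reasons, and only the message tells you which fix applies.

```python
def load_setting(settings, key):
    """Hypothetical helper: look up a key and convert its value to an int."""
    return int(settings[key])


def describe_error(exc):
    """Return the exception type and its full message, unmodified."""
    return f"{type(exc).__name__}: {exc}"


def try_load(settings, key):
    """Return the setting, or the complete error text if loading fails."""
    try:
        return load_setting(settings, key)
    except Exception as exc:
        # Surface the whole message instead of a generic "load failed".
        return describe_error(exc)


# Same function, same line, two different problems.
print(try_load({"port": "80a"}, "port"))   # the value is not a number
print(try_load({"prot": "8080"}, "port"))  # the key is misspelled in the data
```

The first call reports a ValueError that quotes the bad value, and the second reports a KeyError that names the missing key.  If you assumed the second failure was the first one all over again, you would go hunting for a bad number that isn’t there.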

Even after you read the error message, you sometimes need skill to interpret it.  I can’t readily explain what this skill is or how to develop it, not exactly.  I think my philosophy background helps a lot, as a common mistake people make when reading an error message is to leap to conclusions about all sorts of possible causes, instead of focusing on the very little they actually know (the content of the error message) and working methodically from there.  Even then, there is a bit of an art to it that’s hard to articulate.

Back in the day, a co-worker who had taken over my previous position as head of operations asked me for help troubleshooting an error (I don’t remember exactly what it involved; Microsoft Exchange Server, I think).  He was diligent about working through problems without asking for assistance, in part because he was replacing the guy (me) who had always been the ‘hero’ in solving operational issues, and so he (rightly) wanted to prove himself.  But after 12 to 24 hours, he was stymied.

We were aided by the fact that a direct (but obscure) error message was being logged in Event Viewer.  As I have mentioned many times before, Google is often our guide, so I googled the message.  I quickly determined that the 4th listing in the search results looked promising and, within 10 minutes or so, the problem was solved.

My co-worker came to me later and asked how I had determined that the 4th listing was the right one (and had so quickly dismissed the first three), as he’d also used Google in a similar fashion.  I couldn’t give him an answer, because I couldn’t explain it myself.  It just seemed like the right one to investigate.

Nevertheless, no matter your skill level or experience level, unless you are lucky, you are going to waste a lot of time if you don’t follow rule #1: read the damn error message.

posted on Saturday, May 28, 2011 11:36 PM