If you can make a coffee you can fix a server


In any IT department the only piece of equipment more valuable than the server infrastructure is the office coffee machine. When the servers fail, engineers rush around in a panic, trying to find the guy who "knows what to do" and providing a general "we are looking into it" response to staff desperate to send that important email or complete that presentation for today's big meeting. This continues until someone can finally resolve the problem. Yet when the coffee machine fails, the department does not descend into the caffeine-deprived anarchy we would expect. Instead, it either gets promptly fixed or the staff resort to that archaic alternative, the kettle.

Considering that in this post Beyond-2000 society the coffee machine probably offers more choice than the barista across the street, more computing power than a server from the early 1990s and more technology than your mobile phone from the year 2000, why is there such a difference in the approach and reaction to a problem?

 

I believe the difference is not in the problem itself but in the approach to the solution. We expect a server to be more complicated, we expect it to do lots of things and we expect it to be a serious problem. The coffee machine we believe we know how to operate and we believe it can't be that hard to fix and we are already considering alternatives. So what would happen if we took our philosophy for fixing the coffee machine and applied it to our servers?

I have just pushed the button to start the process on my Skim Espresso Macchiato (not really, personally I am a full cream latte) and seconds later, with the flash of a few lights and some unsavory noises, I realize that this is going to be a long morning. Before I put my dark glasses back on and return to my cubicle I check the basics: Has anyone filled the water tank this week? Did I remember to put in the capsule? How do I connect the milk thingy? Check, check, check... drat.

Concluding that I haven't been the primary cause of this unexpected inconvenience, I look for the lights on the machine (flashing as they are) and squint to read the LCD display, which indicates that there is a problem with the coffee flow. Maybe I was too quick to jump to that conclusion. Before anyone sees me I attempt to fish out my capsule (with the aid of a stirrer), and when that doesn't have any impact I switch the machine off and on again.

As the machine is warming up I consider my options. Maybe something else could be blocking the flow. I wonder if the filter has been cleaned since Christmas. I can't find the manual, but the receptionist, who arrives in 20 minutes, has the details for the serviceman. I have already found the kettle and a few sachets of instant coffee hidden in the back of a drawer. My backup strategy is in place. I am already feeling more perked up.

Fortunately, a few short minutes later I am bathed in the glow of green lights and every printer repairer's favorite five-letter word: Ready. Fundamentally, this process highlights everything that a good IT troubleshooter would do. So let's break it down into its pieces.

First, I realized there was a problem and checked what input the machine was expecting, to make sure a) it was there and b) it was right. I am pretty sure if I put the milk in the water reservoir and the water in the milk feeder I would end up with a frothy curdled substance that wouldn't much resemble coffee. I know that I insert coffee, water and milk and the machine outputs boiling water with coffee and then adds the milk. Inputs and outputs, that is all I needed to know.

Secondly, having identified no problem with the inputs, I began checking for errors or alerts indicating a problem. Blinding me at this early hour were a number of flashing lights and a small message display. I am sure with a Google search or a copy of the manual I could have understood these better, but I didn't need to. The message clearly indicated my coffee was not flowing, so the first step was to check whether any of the inputs could be at fault. In my early morning daze, inserting the capsule incorrectly was a highly likely cause. Removing the capsule was my first plan, so I hunted for the required tools and made it happen.

Thirdly, I considered my alternatives and identified that if this didn't work I would read up on that error message, wait for the serviceman or use the kettle.

Finally, having made a change, I performed a test by rebooting the machine to see if the problem was resolved. Fortunately it was. If it wasn't? Well, I could simply have gone back to my second step and looked for the new error message, taken any of the other actions to identify the problem or, if I felt like I could stomach the instant coffee, switched to the alternative.
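For those who think better in code than in coffee, the four steps above can be sketched roughly as a loop. This is only an illustration under my own assumptions: the function `troubleshoot` and its parameters (`check_inputs`, `read_error`, `fixes`, `is_working`, `fallback`) are hypothetical names invented for this sketch, and the "machine" is a plain dictionary standing in for the real thing.

```python
from typing import Callable, Dict, List, Optional


def troubleshoot(
    check_inputs: Callable[[], List[str]],
    read_error: Callable[[], Optional[str]],
    fixes: Dict[str, Callable[[], None]],
    is_working: Callable[[], bool],
    fallback: str,
    max_attempts: int = 3,
) -> str:
    """Hypothetical sketch of the coffee machine process; returns how it ended."""
    # Step 1: check the basics (is every expected input there and right?).
    missing = check_inputs()
    if missing:
        return "check the basics first: " + ", ".join(missing)

    for _ in range(max_attempts):
        # Step 2: look for errors or alerts.
        error = read_error()
        # Step 3: pick a plan for that error, if we know one.
        fix = fixes.get(error) if error is not None else None
        if fix is None:
            break  # nothing left to try; time for the manual or the kettle
        fix()
        # Step 4: test the change.
        if is_working():
            return "resolved after fixing: " + error

    # The alternative we lined up in advance.
    return "fallback: " + fallback


# Simulated machine: the capsule was inserted incorrectly.
machine = {"capsule_seated": False}


def reseat_capsule() -> None:
    machine["capsule_seated"] = True


result = troubleshoot(
    check_inputs=lambda: [],  # water, capsule and milk are all present
    read_error=lambda: None if machine["capsule_seated"] else "coffee flow",
    fixes={"coffee flow": reseat_capsule},
    is_working=lambda: machine["capsule_seated"],
    fallback="kettle and instant coffee",
)
print(result)
```

The fallback is passed in up front on purpose: just like finding the kettle while the machine warms up, the alternative is decided before the fix is tested, not after it fails.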

All of these actions are valid and ultimately they all gave me what I expected, a cup of coffee, and I performed them all in my half-asleep daze. So why can't we apply the same approach to a server? We should, and that is the fundamental basis of good troubleshooting.

Let's review that process: check the basics, look for errors, consider the alternatives, perform some research to come up with a plan, put it into action and test it out. If it doesn't work? Repeat the process until a suitable solution is found. So next time you go looking for the fix-it guy in your IT department, consider this process and maybe you can resolve it yourself. Besides, he's gone for a coffee.