October 28, 2013

Why Does DevOps Think They Are the TSA?

by Chris Waters

So you are the DevOps Manager, gatekeeper of everything good and the go-to guy for smart code running on big iron. You help keep engineering from releasing the miserable code that QA was not able to sniff out. You see the big data center picture and can techno-babble with the best architect around.

Your brain is evenly split between “risk management” and “time-to-market” and you know when to be where. You are the great protector of uptime. Your glance scares junior engineers back to their cubes, you are immune to false positives, and you have more dead pagers than BlackBerry has unwanted phones.

You are the DevOps Manager. But why are you acting like the TSA?

The intersection between engineering and operations is where the code hits the wild. It is where strategy and market leadership become a reality for software companies. And great operations management is usually the difference between a mediocre customer experience and delight. While nobody would miss DevOps in the short term if it disappeared, ensuring that sophisticated software performs well (forever) is what superstar DevOps folks do in market-leading companies.

Unfortunately, while listening to product and engineering groups discuss their challenges at my new company, Aha!, I have recently noticed a disturbing trend. DevOps groups are behaving badly, and it has led me to reflect on the mindset of some of the teams that I have worked with. Because DevOps teams are often blamed for poor releases, they have developed a debilitating first-order assumption that drives everything they do. The new religion that has likely turned your DevOps group and many others like it into the TSA of business is:

DevOps believes that every engineer is a bad player destined to release poor code that will hurt people.

This assumption leads DevOps to go terribly wrong. If you believe that everyone is a threat with malicious intent or is simply unaccountable for their actions, you erect gates and more gates and “predictive” tools to dissuade engineers from releasing code. And you set up contingency plans and defensive fallback procedures in case your gates happen to let the bad guy through.

But these gates have the opposite effect — in every way, these safeguards lead to higher risk. They do not ensure higher quality releases and are guaranteed to lead to customer dissatisfaction for the following reasons.

More code. The longer the team goes between releases, the more code is going to be released at once. It is that simple. Even if your team does not work all that quickly, they get paid to write code and some likely gets written every day. And the more code there is, the higher the likelihood that something is going to go wrong when it does get released.

Stale engineers. Writing code takes real focus and attention to detail. This is why it is difficult to context-switch when writing code. The more dynamic the feature or difficult the bug, the harder it is to multi-task. If there are long delays between writing code and releasing it, there are start-up costs in terms of how long it will take a developer to get back into the mindset that she was in when she originally wrote the code. There are just too many dependencies and trade-offs that are carefully considered when writing code to keep them all in mind once you have moved on to the next challenge. So, when bad code does happen, it takes longer to figure out why and fix it.

Frustrated customers. Do customers believe that your software will always be perfect? No. But do they expect you to pay attention to them and fix a problem quickly? You bet. So, if longer release cycles do not increase the quality of software (and actually increase the likelihood of problems) and customers understand that gremlins do happen, the goal should be to release more often and fix problems faster. When a problem does occur, the customer will be intent on helping you fix it, but the longer you go between releases to fix it, the less the customer will remember or be able to verify that it is resolved.
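
To put a rough number on the "more code" point, here is a minimal back-of-the-envelope sketch in Python. It is an illustration, not a measurement: the function name, the 2% per-change defect rate, and the assumption that each change fails independently of the others are all made up for the example.

```python
def release_failure_probability(changes: int, defect_rate: float) -> float:
    """Chance that at least one change in a release is defective.

    Assumes (hypothetically) that every change carries the same defect
    rate and that changes fail independently of one another.
    """
    if changes < 0:
        raise ValueError("changes must be non-negative")
    if not 0.0 <= defect_rate <= 1.0:
        raise ValueError("defect_rate must be between 0 and 1")
    # P(at least one defect) = 1 - P(no change is defective)
    return 1.0 - (1.0 - defect_rate) ** changes


if __name__ == "__main__":
    # Illustrative batch sizes: a small frequent release vs. a big gated one.
    for changes in (5, 20, 100):
        risk = release_failure_probability(changes, defect_rate=0.02)
        print(f"{changes:>3} changes per release: {risk:.0%} chance of a problem")
```

Under these assumed numbers, a release with 5 changes has about a one-in-ten chance of shipping a problem, while a release with 100 changes is above 85%. The smaller release also leaves far fewer suspects to check when something does break.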

So, how should DevOps proceed in a world of software uncertainty?

Operations teams should stop trying to be the corporate TSA by “gating” engineers and product teams to a standstill. The goal of eliminating problems should be replaced with taking reasonable risks.

Rather than instilling fear and forcing costly efforts to plan for every contingency and every way to “back out,” operations teams should look ahead just like the rest of the organization. Engineers want to deliver great code and delight customers, and they will go out of their way to take responsibility for issues if they have been empowered to fix them. At the end of the day, a business needs to keep moving forward, and too many DevOps groups today are looking towards the past and preventing groups from building what matters.

If you are looking to create brilliant product strategy and roadmaps to release often with confidence, Aha! is for you. Sign up for a free trial to see why the top software and web companies are now using Aha! for product management and roadmapping.

I am the co-founder and CTO of Aha! and have worked with some terrific DevOps folks over the years. Special shout-outs to Ted and Bill for getting it right. Follow me and the company at @aha_io.