I’d like to tell a story (a mostly real one) that can help you think through how to make your DevOps transition a little smoother, temper some over-exuberance, and ensure everyone feels they are getting a fair shake in a collaborative way.
I had a customer whose teams talked endlessly about how they wanted to get to DevOps, continuous integration, and high velocity of deployments.
The challenge was that they talked about DevOps only as a way to make deployment go faster. They wanted rapid deployment, daily changes, and to push code to production every day. As a result, everyone latched onto their own idea of what that meant. They talked about faster creation and deployment of new features. They talked about end outcomes and the excitement of reaching that end goal of daily pushes. Developers thought they had reached nirvana and could get all the backlogged code into production whenever they wanted. Operations teams thought it meant that development would write cookbooks and test everything, so that they could focus on undoing technical debt, getting rid of crappy code, and making things work right in production.
Now, these are all valid goals of DevOps. They are all things we want to strive for. But they were being framed in the legacy biases of Dev vs. Ops. As an example, someone who is typically focused on production and operations could easily admonish the developers for being “unrealistic” in expecting to jump straight to daily releases and rapid increases in speed and velocity. You don’t jump straight from typing in code to putting it in production. At least not in reality, and certainly not with quality. While there is truth in these statements, any admonishment is going to be perceived by developers as a blocker to the speed and velocity they want. And they’ll push back and say, “It’s all Operations being the blocker and slowing us down!” And they’d be partially right.
So instead of admonishing the developers, we changed our language and our effort to focus on one of the sources of our issues: environmental stability. The development and QA environments were unstable, the systems were too small to run any meaningful tests or even to run the programs that ran on production systems, and the teams did not have representative data to work with.
We started saying “we are going to give you stable Dev and test environments”, “we’re going to increase the speed and accuracy of testing”, “we’re going to get you good test data that is as close to current and complete production data as possible”, “we’re going to give you any data you need to identify, debug, analyze and respond to test and prod failures”. This shifted the conversation from being adversarial (Devs pointing at Ops and saying they’re obstructionists) to being collaborative (ooooh, they’re going to give us shiny new toys!).
Ops focused on building a proper development and QA environment that could accurately mirror production. We first sized the resources (hardware, networks) needed to support the effort. This might seem “wasteful”: development doesn’t generate money, so why not make do with leftover systems? But the point I raised was that development was where the real work was taking place. Undersizing it would be a mistake, because it would push all the mistakes into production, where mistakes cost money. Let’s instead make mistakes in an environment where they don’t cost the company money. This doesn’t mean that we spend exorbitantly, but that we shouldn’t be foolishly cheap.
Development/QA was built in the way that the teams wanted to build production. It used the tools they wanted to use in production. And we ignored any further work on production. Yes, you heard that right: we didn’t go after technical debt in production (unless it caused an outage). Why did we do that? Because there was no sense in fixing things when we didn’t yet know whether those fixes were appropriate. We needed to test the entire infrastructure, not just the code, as a development effort. We needed to arrive, through prototyping, at code that was tested, optimized, and architected in the best way. We needed to test building systems, deploying the operating system configured the way the development teams needed it, installing databases, and doing anything else required to give the developers the environment they would expect to deploy on top of. We needed to do this in an environment where we could make mistakes, learn from them, and correct them, all without impacting the generation of revenue.
What we accomplished was a double win. We gave Developers the resources they needed to be productive. We gave them tools, stability, data, and capacity to experiment. We gave them testing tools, and the Operations teams got to test right alongside them. Operations got to build the tools, build the stability, learn how to handle the data, and build the capacity. DevOps was no longer pie in the sky. It became what each Dev team wanted and needed to go faster, along with lessons on how Operations could clean up the technical debt in a way that mirrored the Developers’ intentions. It was about how we could positively influence the lives of both our Dev teams and our Ops teams.