Who Really Owns That Config File?

Posted September 12, 2022

I described a semi-hypothetical situation in which an ops team solves a problem for a dev team and the end result is over-engineered.

My example wasn’t entirely off-the-cuff. I’ve seen teams over-engineer a solution because each team didn’t understand the capabilities or constraints of the other side.

The pathology I’m interested in looks like this:

Something about an application’s runtime breaks.
Devs lack knowledge of the runtime.
Ops lack knowledge of the app.
Ops offer solution using what they know.
Devs accept solution, hack the code.
Devs later complain about solution.

Sound familiar? You would think everyone was happy with the result, but in effect no one was.

This scenario doesn’t imply teams don’t care about one another. And that makes it harder for managers to see the real problem.

An effective manager needs to understand the limits and capabilities of a team and manage to those limits. This means understanding how a team approaches (or avoids) a collaboration.¹

To take the example above:

How well does the dev team know their app’s configuration?
What source of information did the devs use to define the problem?²
Did the dev team present enough context to the ops team?
Did the ops team seek clarification from the dev team?

It helps to understand how each team arrived at its part. How did they center on the problem? What was the process they followed to get there?

I write this from a manager’s perspective, but these are also questions for technical leads to ask.

Improving the Dev / Ops Relationship

Practically speaking, when problems do occur you’ll hear from the dev team. There may be no silver bullet, but there are some ideas to try.

Define the handoff

What are the (concrete, non-verbal) artifacts one team produces for the other? If configuration needs to be updated, does the ops team submit a PR to the dev team? Does the answer come with a verification step? Who is on the hook for the changes?
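As a minimal sketch of what a “verification step” could look like, assume the dev team publishes a short list of settings the app needs and the check runs against whatever configuration ops hands back. Every name here (`Requirement`, `verify_handoff`, the keys, the owners) is hypothetical and only illustrates the idea of a concrete artifact with a named owner.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One setting the dev team says the app needs (hypothetical schema)."""
    key: str
    expected_type: type
    owner: str  # who is on the hook if this check fails


# Hypothetical requirements; a real app would declare its own list.
REQUIREMENTS = [
    Requirement("db_pool_size", int, owner="ops"),
    Requirement("feature_new_checkout", bool, owner="dev"),
]


def verify_handoff(config: dict, requirements=REQUIREMENTS) -> list[str]:
    """Return a list of problems; an empty list means the handoff passes."""
    problems = []
    for req in requirements:
        if req.key not in config:
            problems.append(f"{req.key}: missing (owner: {req.owner})")
        # Exact type match, so a bool is not silently accepted as an int.
        elif type(config[req.key]) is not req.expected_type:
            problems.append(
                f"{req.key}: expected {req.expected_type.__name__}, "
                f"got {type(config[req.key]).__name__} (owner: {req.owner})"
            )
    return problems
```

A check like this could run in the PR itself, so the answer to “who is on the hook?” shows up next to each failure instead of in a chat thread.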

This works well for teams that haven’t collaborated before or teams with very different technical backgrounds.

Give the less capable team a more detailed plan

One team may need more precise direction in what they should provide to the other team.

For example, if the dev team just inherited an unfamiliar codebase, ask them to research and present extra context to ops beforehand.

If the less capable team is struggling mainly because they are strapped for time, ask them how the other team could make their problems easier to solve (aka “Make New Problems Into Old Problems”).

The common thread is to enrich the context and provide opportunities for clarification.

Nobody wants to feel incapable³. But if a team is struggling to define or refine the problem, give that team a sense of what it means to look capable in front of the other. They will feel more productive, learn more in the process, and everyone gets a better result.

Draw ownership boundaries

I would not start here, but I also wouldn’t write this option off entirely.

The idea is to determine who owns what parts of the problem space. Should the ops team own production configuration altogether? What access does the dev team have to inspect and triage environment issues?

These are questions that both teams should define between themselves. As a manager, your job is to help them make clear boundaries.

Collective Ownership of Configuration

Just because something is YAML doesn’t mean it’s owned by devs. Just because something is an NGINX rule doesn’t mean ops shouldn’t treat it like code.

The concept of ‘collective ownership’ makes sense for an application’s codebase but usually not for its configuration. Here are a few reasons why:

You can’t make static assertions about the app’s runtime⁴.
Not everyone has equal access to modify things at runtime.
Specialized knowledge drives certain decisions (e.g., network topology, DNS).
Developers have configuration needs that go beyond operations.

An application’s configuration serves two orthogonal purposes: feature enablement, and mapping its services to a specific runtime.
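One way to make those two purposes visible is to keep them in separate structures, so it is obvious which half a change touches. The sketch below is only an illustration: the class names, field names, and defaults are assumptions, not a description of any particular app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlags:
    """Feature enablement: what the app does (hypothetical fields)."""
    new_checkout: bool = False


@dataclass(frozen=True)
class RuntimeBindings:
    """Mapping services to a runtime: where things live (hypothetical fields)."""
    database_host: str = "localhost"
    db_pool_size: int = 5


@dataclass(frozen=True)
class AppConfig:
    features: FeatureFlags
    runtime: RuntimeBindings


def load_config(raw: dict) -> AppConfig:
    """Build a config from a nested dict, e.g. one parsed from a YAML file.

    Unknown keys raise TypeError, so a setting cannot quietly land
    in the wrong half.
    """
    return AppConfig(
        features=FeatureFlags(**raw.get("features", {})),
        runtime=RuntimeBindings(**raw.get("runtime", {})),
    )
```

With a split like this, a conversation about ownership can start from the structure: devs likely care most about `features`, ops about `runtime`, and the interesting questions are about the fields that both sides need to understand.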

Unless dev & ops are highly integrated, there should be some shared ownership, but “collective ownership” is normally not it.

Take Everyone Out to Lunch

If cross-pollination is not possible, find other ways for developers and ops to better understand the pressures and constraints of each other’s jobs. Taking everyone out to lunch is one option; if that isn’t possible either, look for other settings (informal or formal) that serve the same purpose.

¹ Being clear about a team’s capabilities does not imply lowering the expectations of the team, but it may mean limiting the type of problems the team should tackle.
² Does the team’s understanding revolve around a single individual? This could be a problem if the individual (as senior as they may be) isn’t part of the collaboration.
³ Human nature being what it is, emotions matter. Nobody likes feeling helpless and entering a collaboration with nothing to contribute.
⁴ Examples are plentiful: connection pool sizes, memory constraints, network latency, key rotation, backups, etc.

Tags:

management