How Will This Not Become a Total Disaster?

Posted September 30, 2022

This 2007 HBR article differentiates pre-mortems from other forms of contingency planning like so:

Unlike a typical critiquing session, in which project team members are asked what might go wrong, the premortem operates on the assumption that the “patient” has died, and so asks what did go wrong. The team members’ task is to generate plausible reasons for the project’s failure.

The “pre-mortem” is a relatively recent invention on software teams.

My first experience with pre-mortems was several years ago. A team ended a grueling series of sprints and the product manager concluded a meeting by proposing a pre-mortem the week before launch.

For the first time, the team raised questions such as:

“What if the APIs can’t handle the load?”
“What if we see browser compatibility issues?”
“What if our customers hate [the feature]?”
“What if the A/B results are inconclusive?”

Asking a software engineer how a feature might fail is like asking a car mechanic what could go wrong with a car. You’ll learn something new, but be less sure what to do with the information.

Some things get left out

While the worst scenarios in the pre-mortem did not come to pass, the rollout was not an overwhelming success either.

In hindsight, I believe the product manager used the pre-mortem as a way to get buy-in from the team. To “clear the air”. To the extent it worked, its effect was short-lived. I checked in with the team a few months following the launch and I don’t recall them conducting any other pre-mortems after that one. The meeting did not become part of the team’s routine.

I still believe there is value in the pre-mortem, but many teams employ it as a perfunctory step, as something to point to as evidence of a “fail fast” culture but lacking real conviction.

Since that first experience, I have seen (and conducted) other pre-mortems where the assumption was that the engineering team could only point out failure modes related to technical issues. And when those technical issues failed to manifest, nobody seemed very sure how to handle the situation. The unspoken belief was that even a failure mode such as customers not using the feature could be reduced to a set of fixes or enhancements for the team to divide and conquer, if only those fixes could be distilled and identified.

A hypothetical example

A hypothetical scenario: during a pre-mortem, a developer says, “I think this project may fail if legal has issues with how we are using customer data.” Suppose the customer data referred to is restricted PII, yet is at the same time foundational to the project’s success. Suppose no one ever asked the legal team at the outset of the six-month project, let alone two weeks before launch. A diligent product owner may follow up with the lawyers and convince them that the initial rollout is so tiny (1% of users) that the risk is small, and that exposure would become an issue only if the rollout expands beyond some agreed-upon percentage of users.

“This launch may have bigger issues, in which case the privacy concern is moot, but if we’re successful, I promise we will re-evaluate and handle customer data better,” says the product owner.

This response, while rational-sounding, rings hollow to the engineering team. It sounds like Product has taken a valid concern from the pre-mortem and projected it to one of two possible outcomes: not a big deal, or something that can be solved. The “So what?” becomes moot. Effectively, the answer becomes “I hear you, but let’s wait for the post-mortem.”

Can pre-mortems be ‘managed’?

In the hypothetical example above, there is another, more hidden, issue: the team may not have seen failure modes at the outset but has grown uneasy about how the project unfolded using customer data. This is only partially about PII. The “patient,” in this case, may not be the product or feature in question as much as the team’s morale. The same team will have to support the result of these decisions.

Going back to the HBR article:

Indeed, the premortem doesn’t just help teams to identify potential problems early on. It also reduces the kind of damn-the-torpedoes attitude often assumed by people who are overinvested in a project.

The article above doesn’t get into specifics here other than to tell you to incorporate what you’ve learned into your next steps. How, exactly, are you expected to do that? And when? It doesn’t say.

And that is my biggest problem with how pre-mortems are run.

Pre-mortems are about tackling risks, not failure modes

At the start of most projects, so little is known about the journey forward that it is tough to project all possible failure modes. Too late in the project (as our hypothetical example illustrated), it becomes hard to change course.

Asking someone how a piece of software will fail is an empty exercise unless their answers describe risks in a way that lets you weigh them and turn them into opportunities.

In a previous post, I proposed that every risk is an opportunity. And so, identifying how the “patient” may die in a pre-mortem is meaningless without qualifying the risk.

Ask a few simple questions about each potential failure mode.

What variables are under our control to minimize this outcome?
How much time do we have to do something about it?

Looking back at my real-life example, the team held its pre-mortem the week before launch, which left very little time to act on anything it surfaced. There may have been time to update the copy on the marketing pages, but not enough to rewrite or update parts of the backend.

Structuring a pre-mortem

A well-run pre-mortem happens once you know enough about what you set out to build and how you will build it, but while there is still time to change course.

Risks with highly controlled variables get mitigation plans (e.g., “let’s carve out the week post-launch for support so that any bugs get fixed right away”).

Different tactics may be needed for risks with fewer variables under one’s control (e.g., “our vendor’s APIs may not support the load” becomes “let’s engage with our outside vendor to stress-test the integration off-hours”).
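One way to picture this is as a small triage routine built on the two questions above: what is under our control, and how much time do we have? The sketch below is only an illustration. The Risk fields, the day estimates, and the three outcome labels are my own assumptions, not a formal method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A failure mode from the pre-mortem, qualified by the two questions."""
    description: str
    controllable: bool   # are the key variables under the team's control?
    days_available: int  # time left before launch (hypothetical estimate)
    days_needed: int     # rough effort to do something about it (hypothetical)


def triage(risk: Risk) -> str:
    """Pick a tactic for a risk. The labels are illustrative assumptions."""
    if risk.days_needed > risk.days_available:
        # Not enough time to act: acknowledge it openly rather than swarm on it.
        return "accept or defer"
    if risk.controllable:
        return "mitigation plan"
    return "engage outside party"


risks = [
    Risk("marketing copy is unclear", True, 7, 2),
    Risk("parts of the backend need a rewrite", True, 7, 30),
    Risk("vendor's APIs may not support the load", False, 30, 5),
]

for risk in risks:
    print(f"{risk.description}: {triage(risk)}")
```

Even a list this crude makes priorities visible: the team can see which risks it can act on now, which need someone outside the team, and which it is knowingly carrying into launch.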

Pre-mortems need structure to be practical, and brainstorming possible failure modes alone is not an effective means to identify and catalog risks.

It can also backfire by having the team swarm on every failure mode regardless of priority.
