Management Strangely, you do not run

When the Sky Falls

A few years ago I wrote a piece that romanticized the state of the sky falling. The article is not about fixing disasters; it’s about preventing them. But no matter how much you prepare, disasters happen.

The romance surrounding disasters is history speaking. When the disaster shows up and you see it, no one but you knows that you want to throw up. That’s your brain releasing a complex chemical cocktail that is physically and emotionally preparing you for the most sensible course of action – making a fucking run for it.

But, strangely, you do not run.

Having watched, participated in, and created a bevy of sky-falling situations in my career, I take the process I use for managing these situations for granted. It feels like I’m working on pure and spontaneous instinct, but these are honed instincts that I’ve built and refined over a great many DEFCON 1 disasters that I’ve had the unfortunate pleasure of attending.

This is my documentation of the process, and I sincerely hope you never have to deploy it, but I’m pretty sure you will. Before we start, a few assumptions and notes about this process:

  • This is not a solo disaster. It’s not just you; it involves a large group of people wrestling with a complex disaster, featuring different politics and motivations, with an unknown root cause.
  • No matter how much I sit here and EXPLAIN IN ALL CAPS THAT THIS PROCESS WORKS, you’re going to skip at least one of the following four steps. That’s cool. Each section includes a handy “This is what happens when you skip this section” addendum so you clearly understand the magnitude of your omission.
  • I’m going to describe this process simply and serially, but in reality these steps overlap and often run in parallel.
  • Oh yeah: there’s a good chance this disaster is your fault.

Let’s begin.

STEP NUMBER ONE: The Situation in the War Room

Your first job is to understand absolutely everything you need to know about the current state of the disaster – you are developing a mental model. Ideally, I want to be able to draw a complete picture for everyone about whatever the hell is currently happening. I need white boards – lots of them. I need a War Room.

The War Room is a base of operations. The requirements are simple: enough room to hold a quorum of people, a table, chairs, and lots and lots of white boards and markers. In this room, you are going to begin the immediate task of research, information gathering, and assessment. Every single person – everyone – who has any relevant knowledge about the situation is now going to parade through the War Room and you’re going to capture and triangulate all of their knowledge.

The intent of the War Room is to break everyone from their flow. The War Room includes a menagerie of people coming in and out, empty pizza boxes and Red Bull cans, and white boards full of indecipherable scribbles. It sends a clear message: the status quo is not presently working.

As the people start streaming in for the inquisition, remember that this first step is data collection, not problem solving and not judgement. The hardest part of this step is not to jump when you think you see a place you can start moving. That! That! That! We need to fix that! It very well might be the right move to fix that, but you don’t even know how much “that” there is, yet.

The initial goal during this step is information acquisition, not action. Each time you take action with incomplete data you risk stoking rather than extinguishing the disaster fire. And, by the way, this type of lack-of-foresight hyper-reactive mode is likely what got you here in the first place. The War Room is a place to focus on gathering a breadth of information first, then depth, so you can answer the question: “Do we understand the situation?”

For me understanding comes in three forms:

  1. The picture I begin to draw repeatedly on the white board starts to represent a realistic picture of what has actually occurred.
  2. A list of additional research, work, and potential next actions is developed, revised, and iteratively prioritized (but not yet acted on).
  3. After much #1 and #2, I’m looking for a very precise moment. It always happens at a different time, but it is distinct. It is the moment I have the glimpse of a theory. This… is what we need to do.

What happens if you ignore this. A surprising number of smart people skip this step. They believe that they can both assess and solve the problem at the same time. This is akin to saying, “I will solve a math equation that I can only half see.” It’s absurd, but it’s precisely what someone does when they start making critical decisions with incomplete data.

Action feels like progress, but undirected action is not progress, nor is it a plan. You’re going to barge into the office and start barking orders because that is what everyone expects, but if your orders are not shaped by what you’re really attempting to do, you are just scurrying people around aimlessly. Yes, you get lucky. Yes, everyone breathes a sigh of relief when you show up with your impressive sense of purpose, but in my experience when my direction doesn’t map to intent, I’m usually getting no closer to propping up the sky.

STEP NUMBER TWO: The Bet Your Car Perspective

With your newfound confidence that you’ve fully described the problem in front of you and have a semblance of a fix, my timely buzz kill is this: I am 100% absolutely certain that you’ve missed something essential in your first pass. There are two ways to discover this: you can jump straight into the next step and discover this absence at the most inconvenient and credibility-destroying moment, or you can check your math. To do this there are two phases:

#1 Vet your model with at least three qualified others. These are people who were not directly involved in Step #1 and who don’t need to understand the particulars of the disaster, but who can appreciate the broad strokes because they’ve been there and have no issue with telling you how screwed you are.

The joy that occurs at the end of Step Number One is the discovery of a fix. This moment of illumination is gratifying because it’s the first time you believe there is a chance the sky can be propped up. Bad news: confidence is not a plan either. This situation is very common with software developers who are fixing bugs. They look at the bug description, write a few lines of code, rebuild, and, voilà, it’s fixed, because when they reproduce the exact steps of the bug, the bug no longer occurs. They have no idea that this small change in the import/export code will also have unintended side effects on other file operations.

Having a solution where all the implications of the solution are not understood is not a fix. You must take the time to explore all the implications, and in my experience this takes longer than it took to come up with your plan.

#2 Once your qualified others have discovered those gaping holes and unintended consequences that are guaranteed to exist in your fix — your model — you need to throw the current version on the nearest whiteboard in the War Room with the folks who are responsible for the work that’s going to go down over the next few days and ask the same question: Does this picture, this list, make sense?

This is yet another error correction pass for gaping holes. It’s also a pass on prioritization, but most importantly, it’s an assignment of ownership.

One of the leading causes of sky falling situations is distributed ownership, and as a strategy distributed ownership seems very humane. We’re going to put the right people on the right problems. We’re going to empower the most qualified people to make their own decisions regarding their local problems because they have local knowledge and can make the best decisions using this knowledge. As a human being and a nerd who has an intense allergy to being told what to do, this model of distributed ownership appeals to me. I imagine small teams of bright people empowered because they feel they control their destiny.

A sky falling situation exists not because of a single failure on one team. It’s a collection of multiple large and small mistakes on many teams that snowballs into an unexpected worst-case scenario. Teams of people succeed and fail at scale. A likely major contribution to your current disaster is the fact that multiple well-meaning and fully informed people looked at an emergent disaster and thought, “Well, someone who is not me is going to handle this, right?”

Since you’re the person who is racing to work while panicking about the sky falling, I’m going to call you what we called these folks during my tenure at Apple: the Directly Responsible Individual or DRI. This name clearly describes the person who is directly responsible for whatever the situation might be and it’s a person. It’s not the Directly Responsible Group of People With Good Intentions Who Are Attempting to Feel Good by Building Consensus But Who Are Mostly Wasting Everyone’s Time. It’s an individual who is owning the entire situation.

However, as the DRI, the person who is most likely to be yelled at, your job is to be accountable. Your job is not to own all of the work, which is why the last part of this step is to put a proper name next to each and every task, and, as much as possible, this name should not be yours. When you’re done with this assignment and someone in the War Room asks, “Hey, why isn’t your name on the list?” Your answer is, “Because I’m the one making sure this whole thing is moving forward and I’m the one who gets fired if it doesn’t.”

What happens if you ignore this. There are a variety of skippable parts in Step Number Two, but I’m not worried that you don’t have an initial plan or that you’re incapable of pulling trusted others in to distribute the load. The part that has screwed me the most is failing to understand all of the implications of my theory.

A former boss used to put this into clear perspective, “Do you understand all of the implications of your plan?” Yes, I do. “Give me your car keys.” Wait, what…? “Would you bet your car on the viability of your plan?” <sfx: shaking keys> Right, yeah, let me do one more pass.

STEP NUMBER THREE: Constant and Consistent Sky Propping Pressure

When the sky is falling, everybody is watching. Everybody wants status an hour ago. Everybody is talking to everybody else about the state of your sky-falling situation, which means the Grapevine is actively working against you. The amount of fear, uncertainty, doubt, and outright lies generated about what’s actually going on is impressive.

I’m assuming that you’ve got a credible plan that you’ve carefully vetted with others. I’m assuming you’ve assigned the work to competent folks who have a sense of ownership of their respective parts of the plan. I’m assuming the War Room is abuzz with the action defined by the plan. While all of this is going on, your job is internal public relations.

As soon as I have something to report, I send the report to everyone who wants to know. If you walk by the War Room, poke your head in and ask, “What’s up guys?” I add you to the distribution list. You’re going to get every update until you beg to be removed. Anyone who mails me any random question – they’re on the list. The game here isn’t just over-communication and Grapevine eradication; I’m still worried I missed something in the plan, and the status spamming is another means of vetting both the plan and the progress.

How often do I send status? It’s a judgement call based on progress relative to the beginning of the disaster. The better the legitimate progress, the fewer the updates. It moves slowly from hourly updates to daily updates and ultimately to weekly ones, which is when I start thinking about tearing down the War Room.

What happens if you ignore this. It’s hard to imagine someone not regularly broadcasting clear, demonstrable, measurable, and consistent progress. Maybe because you’re still deep in research and don’t yet have a theory and you don’t want to call attention to that fact? Maybe there hasn’t been significant progress on anything since your last update? You still send status. The message you send by consistently keeping the folks who care up to date is not, “We’ve made unique progress or we have a theory.” The message is, “We are applying constant and consistent pressure on propping up the sky.”

The Elusive Step Number Zero

I’m reading an early draft of this piece and it still feels like there’s romanticism about this process. Look at me, Captain of the War Room, I’M GOING TO SAVE THE WORLD. There’s nothing romantic about this situation. There’s no glory in propping up the sky because, chances are, you and your team are partially responsible for this situation and depending on the severity of the disaster, there’s a good chance you could get fired. Even if you fix it.

There’s a fourth step to this process that I’ve confusingly labeled as Step Number Zero. I’ve put the first step last because I believe it’s the most important part of this process. I’ve put the first step last because if you’re able to confidently answer it, you’ll greatly increase the chances that you won’t repeat this disaster in the future. The question is: What, precisely, are you trying to do?

It seems like a dumb and obvious question at this very moment, but right now you’re chilling with your iPad in the coffee shop. You’ve just taken your third sip of that half-caf quad-shot latte and you don’t have a care in the world. If the sky was actually falling, you’d be racing to work, breaking speeding laws, and frantically thinking, How am I going to unfuck this situation?

Unfucking this situation is a sensible and obvious outcome, and while you’re driving 105 miles an hour down Highway 280 to the scene of the crime, I will repeat myself: What, precisely, are you trying to do?

It’s a hard question to effectively answer when people are yelling, but phenomenal answers sound like:

  • We need to demonstrate to this customer that we are capable of exceeding their expectations.
  • We need the people who depend on us to trust that their faith in us is not misplaced.
  • We need the Planet Earth to understand that we aren’t evil.

You will notice that none of these answers read “unfuck the situation”. When the sky is falling, like I’ve said before, immediate action feels like precisely the right course of action because HELLO THE SKY IS FALLING. But there is a well-defined reason for this situation, and it’s likely you won’t know the reason for a while. It’s agonizing, but my advice is to not make any decisions on a course of action until you have at least a credible answer to this question.

In the face of disaster, it’s the wise person who does not act until they know. Unfucking the situation is a bandaid, understanding what you’re truly trying to fix is a cure.

10 Responses

  1. Michael Toy 13 years ago

    In a former workplace, the reaction to sky-falling was really interesting. In this workplace, if the problem was important enough, every course of action must be taken. Why would you not try everything since this is an important problem? If you solve the problem by doing 15 things and don’t know really which one fixed it, that was considered a victory.

    I find this troubling, because I don’t agree with it, and because I suspect that my disagreement is wrong, and that the health of the company was actually better served by fixing the sky first.

    What would Rands do?

  2. Phillip Brooks 13 years ago

    My first thought reading this was closure in the “antennagate” lawsuit.

  3. Related, Allspaw’s “Outages, PostMortems, and Human Error” is the best Ops-oriented take on these themes I’ve seen.

    In particular starting page 50 the “Crisis Pattern” graphs.

    http://www.slideshare.net/jallspaw/etsy-codeascraft-allspaw1

  4. Robert 13 years ago

    I learn something every time I read your posts. I’m on the verge of a ‘sky is falling’ situation right now. I know something is not quite right. We’re not at the “we’re fucked” stage yet (or maybe, we’ve just been in trouble so long we see it as the status quo.) The new e-mail group is starting now, and we’re going to tackle this issue. Thank you.

  5. @Michael Toy: I’m not exactly sure what your “find this troubling” refers to – if it’s referring to this article, and the idea that “try everything to fix the sky” is the wrong thing to do, the issue with “try everything” is pretty simple – lots and lots of things you can try will simply make things worse.

    If you try everything, and something worked, and you have no idea what, then you succeeded by pure luck and you have even less idea how to prevent the problem from occurring in future.

    Worse yet, you don’t even know whether anything you tried actually worked, or if the problem Just Got Better on its own (as unlikely as that is, it certainly can happen).

  6. Marshall 13 years ago

    I’ve always hated the “would you bet your XXX” method of ensuring confidence.

    Wagers are between two parties; each one puts up something of value.

    I had a boss once ask me “Would you bet your job on this plan?”

    My response was: “Against what? A year’s salary as a bonus when I deliver? Certainly! Do we have a bet?”

    That was the last time he used that on me. (and no – he wouldn’t take the bet)

  7. First thing I try to do in such a situation is to get a clear perspective, even before asking myself what I want to do.

    Sky-falling, disaster – it is dramatic, but just how dramatic is it? Are there any lives at stake? Could one or more companies go out of business? Or is it just a few jobs? Or might this just mean a busy few weeks?

    Based on this evaluation, how will I communicate? Is disaster and sky-falling the right set of words? I do not want to play it down and lose credibility, yet most situations are by no means as critical as they seem – and believe me, I have been there.

    Getting the right perspective helps me keep cool, even if lives are at stake. The only way to have a chance in the war room.

  8. There’s another article to be written that I think has been touched on here in bits and pieces about how this can be used for evil. That is, executives, managers, and workers understand this process, but believe that a crisis is the only way to encourage any systemic planning, clear assignment of priorities to individual people who are responsible, and communication about the status of a project.

    So you get a place where people do one of three things: allow foreseen disasters to come up, intentionally create disasters, or (and this is the worst) attempt to make crisis management the SOP of the organization. Which not only burns people out but also leaves the organization vulnerable to an actual disaster.

  9. I really appreciate this post. A recurring thought while reading this steals from Horace Dediu’s “what job was X hired to do” idea. The high level, obvious word to insert for X would be company name. Along with the “What, precisely, are you trying to do?” question mentioned, this additional question helps to reveal core themes or purpose, especially when inserting lower level examples for X, such as division, team, strategy, product or service, contract, etc.

    This question is simple enough that the response often seems obvious or safely assumed. But, it serves as a filter, exposing critical information or reminders.
