Organizations that fail to heed their vulnerabilities are more likely to encounter catastrophes.
In October 2015, an American AC-130 gunship rained devastating fire on a Doctors Without Borders (Médecins Sans Frontières, or “MSF”) hospital, killing 42. Last month, the Pentagon released a report detailing the chain of events that led to the catastrophic mistake. Immediately, debate began as to whether the punishments meted out to 16 service members, including potentially career-ending suspensions, reprimands, and removals from command, were insufficient compared with a criminal inquiry, or whether such tragedy has to be accepted as an inevitable consequence of the fog of war.
Debates about assigning responsibility and meting out punishment are expected consequences of any such event and its follow-up, but they rarely address the bigger issue: how to prevent such catastrophes from happening in the first place. Indeed, when we focus primarily on punishment, without addressing the precipitating factors (both technological and organizational), we inadvertently expose ourselves to the risk of recurrence.
In the moment, actors in chaos or crisis are overwhelmed, facing conditions that are, at worst, out of control and that, at best, present them with awful dilemmas, for which no choice or course of action is a satisfying one. That said, many catastrophes might have been avoided — but it would have required action far removed from the suffering in time and place, and that action would have had to focus on engagement and energetic problem solving, and not on discipline per se, either in the form of blind compliance with procedure or penalties for misbehavior. Prevention requires a considerably different set of organizational structures and dynamics than punishment, ones that may not come nearly as naturally to most leaders.
We’ll get to some specifics, but first some context. I’ve had the chance to work with many members of the uniformed military over several years. From what I’ve learned from the service members whom I’ve gotten to know, the delegation of authority to them (by the American people through the President and the chain of command) to use violence in pursuit of national interests is something taken seriously. Planning, training, drilling, and attention in the moment of action are aimed at ensuring that delegated authority is exercised legally and effectively. The possibility of unleashing the destructive power of an AC-130 or any other weapon is treated with a weight and conscientiousness that few of us see or express in our more ordinary undertakings. In that, we should be proud, impressed, and reassured by the women and men who accept such responsibility with the risks attendant to it.
How, then, can such an event occur, and how can it be prevented? In my MIT classes, we examine a series of catastrophic failures: large-scale, public ones like the losses of the space shuttles Challenger and Columbia, the BP oil spill in the Gulf of Mexico, even the series of events leading up to 9/11, and smaller-scale ones like “medical misadventures” in which harm is inadvertently done to patients by well-trained and well-intentioned medical staff. The circumstances differ, but the pattern is always the same: Small, seemingly inconsequential factors showed themselves to be out of control well in advance of and far removed from the crisis moment. These could have been addressed, and often with minimal cost or disruption. But because they seemed to have no immediate consequence at the time and place of their happening, and because it would have required great imaginative leaps to construct a chain of causality toward the actual ill outcome, the so-called deviances got normalized (for example, see Diane Vaughan’s book, The Challenger Launch Decision) and were treated as routine experiences that required no special effort at containment or correction.
In class, we watch a video dramatization of a hospital failure, first looking at the moments of chaos during a rushed, emergency surgical procedure. No one is listening to anyone, teamwork has broken down, and everyone seems malfeasant or incompetent in some fashion. The students’ reaction is typically emotional and retributive. “That guy’s a tyrant” or “She’s incompetent!” But, as we dig in, the students realize that in the moment, these people are actually doing the best they can; it’s just that they’ve found themselves in situations where it is impossible to succeed no matter what they do.
So we crawl back in time to the precipitating events: interruptions, losses of information, ambiguities, and confusions that started popping up hours and days in advance of the denouement. There’s the appointment that cannot be scheduled, the medical record that cannot be located, the blind spot in the waiting room where the patient’s deteriorating condition is not recognized. Like the regularly cracked heat tiles in the space shuttle program, each individual element shared the common characteristic of being annoying while also being perceived as part of normal experience. Because each could be handled through some combination of firefighting, coping mechanisms, and localized heroism, none was resolved at its root cause. It’s only with the benefit of hindsight that we can connect those individual precursors into a catastrophe-inducing combination.
What’s the solution? It begins with engaging frontline staff and empowering them to call out micro failures before they have an appreciable impact. That must be paired with procedures for escalation so the underlying causes can be resolved. For instance, the first time a patient’s record is missing, the first time one medication is confused with another, the first time a transmitter doesn’t broadcast properly, the first time an instruction is confusing … find out why, and make adjustments so the glitch doesn’t recur.
In what feels counterintuitive, one thing you realize is that you often have to discourage in-the-moment heroism. It may look and feel productive, but it actually obscures the realization that the system has vulnerabilities that must be fixed.
How, then, does this perspective affect our interpretation of what happened in October 2015 over Kunduz? According to a summary of the Pentagon report in the New York Times, a collection of failures led to firing on the wrong target. Some factors included the aircrew taking off in a rush without the “normal” briefing and list of protected sites; electronic communications equipment failures that prevented an update on the fly; the aircraft being forced off course and having trouble regaining its orientation once back on station; communication of precise coordinates being fuzzy, prompting the crew to acquire visual confirmation; and, finally, when final approval was requested, no one realizing that what the flight crew was describing didn’t fit the target requested by troops on the ground.
It is highly unlikely that any one of these circumstances occurred for the first time on October 3, 2015. Rather, one would expect that other crews had taken flight quickly, other equipment had faulted out, other coordinates had been ambiguously communicated, and so forth. But when those occurrences happened in isolation, the immediate consequence was minor. Each isolated event would have provoked scrambling in the moment by aviators and ground support, which enabled them to regain control and, in doing so, cultivated a “here-we-go-again” attitude. Ironically, this kind of weary acceptance prompts intelligent, committed people to focus on containing problems in the moment, rather than triggering the actions necessary to prevent recurrence.
In effect, people were “walking by” or “dancing past” the micro breaks in the system, not recognizing that while the individual glitch’s potential to cause disruption was minimal, in combination it was extraordinary. So, the organization’s vulnerabilities weren’t fundamentally corrected. Eventually, and perhaps inevitably, they presented themselves in just the right combination to wreak havoc.
What’s not inevitable, however, is that situations ever need to develop toward such tragic ends. But making the transition can feel countercultural. Otherwise heroic people, adept at making do in trying circumstances, will have to learn to call out micro problems. Senior leaders, used to being reported to and accustomed to focusing on big issues, will have to learn to swarm the scene of little things, to model the inquisitive behavior needed to discover what about the ‘system’ (technical, human, or socio-technical) caused it to misbehave.
There are bona fide barriers to flipping culture and behavioral norms from compliance to prevention. At many organizations, leadership transitions happen so fast and frequently that it is difficult to build the kind of trust through the hierarchy required for people to feel comfortable calling out problems. Other organizations operate with blame-casting mechanisms — morbidity and mortality reporting sessions in health care, for example — that may overwhelm countervailing efforts to highlight error in a blame-free way.
There is really only one way to transition an organization toward embracing the kind of openness demanded by a culture of prevention and improvement: from the top down. It begins with a clear commitment from the executive ranks to investigate problems early, in a therapeutic fashion, rather than wait to investigate fewer, but bigger, problems later in a judgmental way. That commitment must be continually reinforced by leaders who model the right behaviors and provide coaching throughout the organization.