There are many small ways that data analytics can lead decision making astray.

It’s easy to enjoy and celebrate stories of analytics success. But when analytics leads us astray, the results can be ugly and the stories much less fun to relate. We’re left scratching our heads and wondering how things could have gone so, so wrong.

Unfortunately, we have some notable examples of when data either led people to believe in something that was not true or, conversely, led people to doubt something that was true. In the mid-1980s, The Coca-Cola Co. did extensive market research using focus groups and surveys before deciding to reformulate Coke, its main product for more than a century. But the survey data showed a strong market preference for “New Coke” that did not, in reality, exist. The product launch was a bust, and “Classic Coke,” the original formula, returned within a few months (while “New Coke” was discontinued in 2002). More recently, Volkswagen was found to have intentionally skewed emissions data generated during testing of its cars with diesel engines, obscuring the presence of pollution that did, in fact, exist in real-world driving situations.

While these big stories get attention, I suspect that many organizations have smaller stories of misleading data. To use a sports-management analogy, the movements toward “small ball” are based on the idea that a series of small changes can add up to wins, with data analysis supporting each small decision. But this cuts both ways. If you believe that doing lots of small things correctly can add up to success, then doing lots of small things incorrectly can add up to failure. A number of small deceits by data can add up.

What’s worse is that managers’ work often doesn’t have clear endpoints, like sports games or elections do. As a result, based on data, managers may mistakenly continue to invest resources in activities that they shouldn’t. For example:

  • One machine is taken out of service for maintenance unnecessarily while a seemingly functional machine breaks down. While a breakdown is observable, unnecessary downtime is hard to assess.
  • Marketing spend is allocated to target uninterested people, leaving potentially interested buyers unaware of a choice. Precise attribution of marketing results is notoriously difficult.
  • Misallocated costs kill profitable projects while lamprey projects that prey on company resources survive to claim another victim. Opportunity cost and the timing of project investments can make it difficult to recover in the market.
  • A job isn’t offered to a qualified candidate while a better-scoring candidate leaves a trail of poor decisions and subpar results. It may take considerable time to recover when only one option can be selected at a time and managers are not able to switch quickly.

Data may be deceitful, but managers don’t have to be deceived. Managerial decisions about the best investment of resources are an optimization problem and, as such, robust management decisions benefit from the same kind of explicit sensitivity analysis that other optimization problems receive.
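One way to make that kind of sensitivity analysis concrete is sketched below in Python. It is only an illustration: the project names and payoff figures are hypothetical, the function names are invented for this example, and it assumes positive payoff estimates that may each be off by the same relative error. The question it asks is how large an error the estimates can absorb before the preferred option changes.

```python
def best_option(estimates):
    """Return the name of the option with the highest estimated payoff."""
    return max(estimates, key=estimates.get)


def decision_flips(estimates, relative_error):
    """Check whether errors of the given relative size could change the choice.

    Worst case for the current favorite: its estimate is too high by
    `relative_error` while every rival's estimate is too low by the same
    amount. Assumes all estimates are positive.
    """
    favorite = best_option(estimates)
    stressed = {
        name: value * (1 - relative_error) if name == favorite
        else value * (1 + relative_error)
        for name, value in estimates.items()
    }
    return best_option(stressed) != favorite


def break_even_error(estimates, max_percent=100):
    """Smallest whole-percent error that flips the decision, or None."""
    for percent in range(1, max_percent + 1):
        if decision_flips(estimates, percent / 100):
            return percent
    return None


# Hypothetical payoff estimates for two competing projects.
estimates = {"project_a": 120.0, "project_b": 100.0}
print(best_option(estimates))
print(break_even_error(estimates))
```

A small break-even error suggests the decision leans heavily on the accuracy of the data, and that is where extra scrutiny of the inputs is likely to pay off.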

Unfortunately, there are many ways to get analytics wrong. How sensitive are your managerial decisions to errors in the following?

Data: As the costs of generating data have dropped, organizations are accumulating more of it. But gathering additional data from multiple sources may just increase volume and offer the illusion of more evidence while actually reinforcing systemic error rather than enriching the story. Data may become less reliable as incorrect data becomes increasingly abundant and convenient. Ask yourself: What could be wrong with the raw data?

Models: Analytics models abstract the real world. This abstraction intentionally moves analytics results away from reality in order to support higher-level decision making. But unintentional omissions or weak models can unnecessarily widen the divide from reality. This is where managerial insight and knowledge of the world can blend with the analysis to support better decisions than either alone would produce. Ask yourself: Where could the models be strengthened?

Interpretation: Even if it were possible to obtain perfectly produced data and models, they would still need to be consumed by managers, and consumption of analytical results is difficult. Everyone has biases, such as preconceived ideas or assumptions about the way things work. More data can mistakenly give us more confidence in what we already believe because we are inclined to accept evidence that agrees with us but scrutinize opposing ideas. We interpret results through our hopes, seeing what we want to see. Ask yourself: What would someone with the opposite bias see in the results?

Even with diligence, managers may still be duped. A useful concept managers might consider borrowing from the domain of security is the idea of defense-in-depth: that is, thinking through the consequences of the failure of one line of defense. For managerial decisions based on data, an additional layer of security may be to recognize that data may still deceive. What are the consequences of mistakes (both in costs and time) if the data is weak or unreliable? What hedging could reduce extreme, unacceptable, or negative consequences? It’s worth the effort to think these questions through; enduring the effects of, and recovering from, data deceit may be a long, painful process.
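The two questions above can be roughed out numerically. The Python sketch below is an illustration under stated assumptions, not a prescribed method: the option names, the cost figures, and the probabilities that the data is wrong are all hypothetical. It compares committing fully to what the data says against a hedged alternative, once by average cost and once by worst-case cost.

```python
def expected_cost(cost_if_data_right, cost_if_data_wrong, p_wrong):
    """Average cost of an option when the data is wrong with probability p_wrong."""
    return (1 - p_wrong) * cost_if_data_right + p_wrong * cost_if_data_wrong


def compare_options(options, p_wrong):
    """Return (cheapest option on average, cheapest option in the worst case).

    `options` maps a name to (cost_if_data_right, cost_if_data_wrong).
    """
    by_expected = min(
        options, key=lambda name: expected_cost(*options[name], p_wrong)
    )
    by_worst_case = min(options, key=lambda name: max(options[name]))
    return by_expected, by_worst_case


# Hypothetical costs: (if the data is right, if the data is wrong).
options = {
    "commit_fully": (10.0, 200.0),
    "hedge_with_pilot": (25.0, 60.0),
}

for p_wrong in (0.01, 0.10):
    print(p_wrong, compare_options(options, p_wrong))
```

With these made-up numbers, the hedge costs more when the data is right but limits the damage when it is not, so the preferred option depends on how much doubt the data deserves.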

1 Comment On: Duped by Data

  • Praveen Kambhampati | November 30, 2016

    Just as outliers can skew an entire data set, a little tweaking of the raw data can bias the dataset toward a wrong prediction and a misrepresentation. The boundaries and limits of the data representation have to be very explicit. For example, we are all familiar with the market research of Nielsen and the like, which still relies largely on manual surveys and limited population samples that are extrapolated to larger populations. The larger picture, however, is devoid of the intricate and necessary local details that authenticate the prediction. We know that social media data has the capability to reach further and with more efficacy. However, the geotechnical and time-to-build data could be a constraint in building the datasets from the unstructured dump of raw data. When done, this can instantly show the impact of local variations in real time, increasing the reliability enormously. Arguably, only a small percentage of market research firms have succeeded in gaining access to these larger, data-enabled facts. There is also a business acumen that, in a way, doesn’t care for data dependency, and rightly so to an extent. This acumen, when enabled with reliable algorithms for prediction, can give amazing business results for the organisation deploying the prediction modelling. Algorithms alone can take the organisation southwards through a blind dependency on abstracted facts.
