Big Idea: Competing With Data & Analytics

Analytical Value From Data That Cries Wolf

Imperfect data can still be put to good uses.

Sam Ransbotham September 28, 2014 Reading Time: 4 min

How does data inform business processes, offerings, and engagement with customers? This research looks at trends in the use of analytics, the evolution of analytics strategy, optimal team composition, and new opportunities for data-driven innovation.

Understand the tradeoffs between false positives and false negatives.

Ideally, data contain neither false positives nor false negatives. In reality, data sources balance the two; decreasing one usually means increasing the other. For example, consider news. A newspaper can expend considerable effort to validate information from multiple independent sources before publication. To avoid unnecessary publication cost and reputation damage, the design of the process emphasizes publishing articles that are accurate. However, this accuracy comes at the expense of false negatives. By working so hard to avoid publishing mistakes, a newspaper may miss or delay real stories. As a data source, newspapers avoid false positives, but in doing so they sometimes are left with false negatives.

In contrast, Twitter users rarely miss anything: the raid on Bin Laden, for example, was live-tweeted long before mainstream news covered the story. Yet inaccurate (and often completely unfounded) rumors run rampant as well; many celebrities must have feline ancestry to bounce back from their many Twitter-based death reports. With low production cost and minimal reputational concerns, this data source offers insights and speed that the newspaper's approach cannot match. But the tradeoff is an increase in false positives.

Identify the sensitivity and specificity of each data source.

Organizations need to assess both the true positive rate (sensitivity) and the true negative rate (specificity) of each data source to understand its inherent tradeoffs. Does the source lean toward false negatives or false positives? For example, data from hotel reservations guaranteed by advance, nonrefundable payment will likely have few false positives: few of these reservations go unused, but the data may underestimate actual demand. Alternatively, systems that offer no-cost reservations are more likely to record hotel stays that never happen. Models that estimate demand would need to differ depending on which data source they use. Demand estimates based only on website searches for availability would differ even more. Potential tradeoffs cannot be incorporated into analysis until they are identified.
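
As a rough illustration of the two rates, the sketch below computes sensitivity and specificity from counts of hits and misses. The counts for the two reservation sources are hypothetical, chosen only to mirror the hotel example; here a "positive" means the source signals a stay, and the truth is whether a guest actually stayed.

```python
# Illustrative sketch: all counts below are hypothetical, not real hotel data.
# "Positive" means the source signals a stay; the truth is whether a guest stayed.

def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of real stays the source caught
    specificity = tn / (tn + fp)  # share of non-stays the source correctly left out
    return sensitivity, specificity

# Hypothetical prepaid source: few false positives, more missed demand.
prepaid = sensitivity_specificity(tp=80, fp=2, tn=98, fn=20)
# Hypothetical no-cost source: catches more demand, but more no-shows.
no_cost = sensitivity_specificity(tp=95, fp=30, tn=70, fn=5)

print("prepaid: sensitivity=%.2f specificity=%.2f" % prepaid)
print("no-cost: sensitivity=%.2f specificity=%.2f" % no_cost)
```

With these made-up counts, the prepaid source is the more specific of the two and the no-cost source the more sensitive, which is the pattern the hotel example describes.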

Propagate underlying uncertainty throughout the analysis.

Once source tradeoffs are identified, analytics results should reflect this underlying uncertainty. Disciplines such as chemistry and physics have rich histories of error analysis, with guidelines for aggregating error and propagating significant digits. However, in organizations, estimates may be presented as simple numbers that, in aggregate, obscure inherent bias that results from the data sources. Uncertainty in a result is neither indecisive nor indicative of a problem; the only real problem is thinking that there is certainty when there isn’t. As analytical thinking increasingly spreads throughout organizations, so too should comfort with analyses that incorporate uncertainty.
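
One standard rule from error analysis can serve as a minimal sketch of what propagating uncertainty looks like: when independent estimates are added, their standard errors combine in quadrature (the square root of the sum of squares). The segment figures and the function name `propagate_sum` are assumptions made for illustration, and the rule applies only if the errors are independent.

```python
import math

def propagate_sum(estimates):
    """Sum (value, standard_error) pairs, assuming independent errors."""
    total = sum(value for value, _ in estimates)
    # Independent errors add in quadrature, not linearly.
    error = math.sqrt(sum(se ** 2 for _, se in estimates))
    return total, error

# Hypothetical room-night demand estimates for two customer segments.
segments = [(120.0, 5.0), (80.0, 12.0)]
total, error = propagate_sum(segments)
print(f"total demand: {total:.0f} +/- {error:.0f}")
```

Reporting the result as 200 +/- 13, rather than as a bare 200, keeps the weaker of the two inputs visible in the aggregate.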

Seek variety for a portfolio approach.

If all the data in your analysis come from sources that traditionally focus on reducing false positives, a data source that tolerates false positives adds insight. Choosing one or the other is a false dichotomy. In the hotel reservation example, estimates of demand that combine prepaid reservations, no-commitment bookings, and online availability searches will outperform estimates based on only one source, because each of the three sources offers a unique perspective that, when combined with the others, yields better insight into true demand. Restaurants are embracing this approach by combining manual reservations from loyal customers, data from OpenTable, and managerial insight.
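
One simple way to combine sources, offered here as a sketch and not as the method any of these companies uses, is an inverse-variance weighted average: noisier estimates get less weight. It assumes each estimate has already been corrected for its known lean (for example, prepaid data running low) and that the errors are independent. The numbers and the name `combine_estimates` are hypothetical.

```python
def combine_estimates(estimates):
    """Inverse-variance weighted mean of (value, standard_error) pairs.

    Assumes the estimates are unbiased and their errors independent.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    weight_sum = sum(weights)
    combined = sum(w * value for w, (value, _) in zip(weights, estimates)) / weight_sum
    combined_se = (1.0 / weight_sum) ** 0.5
    return combined, combined_se

# Hypothetical demand estimates: prepaid, no-commitment, availability searches.
sources = [(100.0, 10.0), (110.0, 20.0), (90.0, 20.0)]
combined, combined_se = combine_estimates(sources)
print(f"combined demand: {combined:.1f} +/- {combined_se:.1f}")
```

Under these assumptions the combined standard error is smaller than that of the best single source, which is the sense in which a portfolio of imperfect sources can outperform any one of them.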

In many contexts, predictive analytics is embracing data sources with a variety of perspectives. Uber blends multiple uncertain inputs to predict rider destinations and improve service. Looking at desserts on the menu doesn’t always lead to an ice cream order, but analytics at McDonald’s incorporates the probability that it will. Airplane arrivals may not always translate to rental car demand, but Enterprise uses this signal to adjust staffing and minimize wait times. Even data rich in false positives can inform.

Know where the uncertainty lies when making decisions.

Previously, when information was more difficult and expensive to produce, organizations had experience to guide them in how to incorporate it into managerial decisions. Now, organizations must adapt to an environment of cheap and fast dissemination of dubious information. Yes, ideally organizations should work to minimize both false positives and false negatives. However, rather than ignoring data sources with false positives, analytics can use them to create value. The key is understanding the data; simple examples such as the Deming Funnel experiment illustrate that managing without understanding data is worse than doing nothing.
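
The funnel experiment can be sketched as a short simulation. The version below is an assumption-laden simplification: a one-dimensional target at position 0, standard normal noise, and a single tampering rule in which the funnel is moved after every drop to offset the last miss. It compares that rule with leaving the funnel alone.

```python
import random
import statistics

def funnel_drops(n, adjust, seed=0):
    """Simulate n marble drops aimed at a target at position 0."""
    rng = random.Random(seed)
    funnel = 0.0
    drops = []
    for _ in range(n):
        drop = funnel + rng.gauss(0.0, 1.0)  # landing spot = aim + random noise
        drops.append(drop)
        if adjust:
            funnel -= drop  # move the funnel to offset the last miss
    return drops

hands_off = statistics.pvariance(funnel_drops(10_000, adjust=False))
tampered = statistics.pvariance(funnel_drops(10_000, adjust=True))
print(f"variance, funnel left alone: {hands_off:.2f}")
print(f"variance, funnel adjusted:   {tampered:.2f}")
```

Because each adjustment reacts to pure noise, the adjusted funnel's spread comes out at roughly twice that of the untouched one, which is the point of the example: reacting to data without understanding its variation makes results worse.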

About the Author

Sam Ransbotham is an associate professor of information systems at the Carroll School of Management at Boston College and the MIT Sloan Management Review Guest Editor for the Data and Analytics Big Idea Initiative. He can be reached at sam.ransbotham@bc.edu and on Twitter at @ransbotham.
