Analytical Value From Data That Cries Wolf
Imperfect data can still be put to good uses.
Data can lie. Sometimes it tells you things that just aren’t true. A positive cancer screening, for example, induces panic until later investigation reveals the result was a false alarm — a false positive.
Like medical tests, data used as input for corporate analytics can contain false positives. And, like medical tests, these false positives can lead to incorrect analysis and unnecessary action. For organizations relying on analytics, accepting fiction as fact may be dangerous.
Yet not all false positives are poisonous; despite the risk, many companies find ways to get value from data filled with them. Here are some guidelines for creating value from such data.
Understand the tradeoffs between false positives and false negatives.
Ideally, data contain neither false positives nor false negatives. In reality, data sources balance the two; decreasing one usually means increasing the other. For example, consider news. A newspaper can expend considerable effort to validate information from multiple independent sources before publication. To avoid unnecessary publication cost and reputation damage, the design of the process emphasizes publishing articles that are accurate. However, this accuracy comes at the expense of false negatives. By working so hard to avoid publishing mistakes, a newspaper may miss or delay real stories. As a data source, newspapers avoid false positives, but in doing so they sometimes are left with false negatives.
In contrast, Twitter users rarely miss anything: the raid on Bin Laden, for example, was live-tweeted long before mainstream news covered the story. Yet inaccurate (and often completely unfounded) rumors run rampant as well; many celebrities must have feline ancestry to bounce back from their many Twitter-based death reports. With low production cost and minimal reputational concerns, this data source offers insights and speed that the newspaper's cautious approach cannot match. But the tradeoff is an increase in false positives.
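The tradeoff can be made concrete with a small sketch. The stories below are invented for illustration, and the "minimum confirmations" rule is only an assumed stand-in for how cautious a source is: a newspaper-like source demands many confirmations, a Twitter-like source demands none.

```python
# Hypothetical stories: (independent confirmations, whether the story is actually true).
# These values are made up purely to illustrate the tradeoff.
stories = [
    (0, False), (1, False), (1, True), (2, False),
    (2, True), (3, True), (4, True), (5, True),
]


def publication_errors(stories, min_confirmations):
    """Count errors when only stories with enough confirmations are published."""
    # False positive: a story that was published but is not true.
    false_positives = sum(
        1 for confirmations, is_true in stories
        if confirmations >= min_confirmations and not is_true
    )
    # False negative: a true story that was held back.
    false_negatives = sum(
        1 for confirmations, is_true in stories
        if confirmations < min_confirmations and is_true
    )
    return false_positives, false_negatives


for threshold in (0, 2, 4):
    fp, fn = publication_errors(stories, threshold)
    print(f"require {threshold} confirmations: "
          f"{fp} false positives, {fn} false negatives")
```

With this toy data, raising the bar from zero to four confirmations removes every false positive but leaves three true stories unpublished, which is the balance described above.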
Identify the sensitivity and specificity of each data source.
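Organizations need to assess both the true positive rate (sensitivity) and the true negative rate (specificity) of each data source to understand its inherent tradeoffs. Sensitivity is the share of real events the source reports; specificity is the share of non-events on which it correctly stays silent.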
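As a rough sketch of how the two rates might be computed, suppose a source's reports have been checked against what really happened and tallied into four counts. The function name and the example counts here are hypothetical, chosen only to show the arithmetic.

```python
def sensitivity_specificity(true_pos, false_pos, true_neg, false_neg):
    """Return (sensitivity, specificity) from the four outcome counts."""
    actual_positives = true_pos + false_neg   # events that really happened
    actual_negatives = true_neg + false_pos   # events that did not happen
    if actual_positives == 0 or actual_negatives == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    sensitivity = true_pos / actual_positives   # share of real events reported
    specificity = true_neg / actual_negatives   # share of non-events not reported
    return sensitivity, specificity


# Hypothetical audit of a fast but noisy source:
# it caught 90 of 100 real events, but also raised 40 false alarms
# across 100 situations where nothing happened.
sens, spec = sensitivity_specificity(true_pos=90, false_pos=40,
                                     true_neg=60, false_neg=10)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A source with these numbers would miss little but cry wolf often; a more cautious source would typically show the reverse pattern.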