The Big Impact of Small Data Errors

Even when the data itself is solid, mistaken connections are sometimes made during analysis. But with vigilance, managers can avoid a data mishap.


Topics

Competing With Data & Analytics

How does data inform business processes, offerings, and engagement with customers? This research looks at trends in the use of analytics, the evolution of analytics strategy, optimal team composition, and new opportunities for data-driven innovation.

A 2008 news story led with reports of Russian tanks and troops surging into Georgia, accompanied by a map mistakenly depicting the invaded territory as Savannah, Georgia, rather than the Eastern European country of the same name. While the story was correct, Google News’ map selection was not.

Born and bred in Georgia as I am, I had some initial fears for my kinfolk, but they were quickly dispelled when a cursory investigation revealed that the story called for a map of the country of Georgia, not the U.S. state of the same name. This was a simple case of misidentifying the value “Georgia” when associating the news data with the map data.
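One way to guard against this kind of mismatch is to refuse to link a value that has more than one candidate, rather than quietly picking one. The sketch below is only an illustration: the `GAZETTEER` table and the `resolve_place` function are hypothetical names invented for this example, not a description of how Google News selects maps.

```python
# Hypothetical lookup table: one place name can refer to several places.
GAZETTEER = {
    "Georgia": ["Georgia (country)", "Georgia (U.S. state)"],
    "Savannah": ["Savannah, Georgia (U.S. city)"],
}


def resolve_place(name):
    """Return the single place matching name; raise if unknown or ambiguous."""
    candidates = GAZETTEER.get(name, [])
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise LookupError(f"no place named {name!r}")
    # More than one candidate: surface the ambiguity instead of guessing.
    raise LookupError(f"{name!r} is ambiguous: {candidates}")
```

Here `resolve_place("Savannah")` returns its only candidate, while `resolve_place("Georgia")` raises an error that a person (or additional context from the story) would have to resolve.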

But other cases of data misidentification are not so simple and carry greater consequences. To the chagrin of many people who share a name with someone of more nefarious tendencies, a false match to the no-fly list can be quite an inconvenience. And names are not the only data that can be mislinked: Images, too, can be linked to people. Advances in image enhancement and processing are yielding growing prowess in facial recognition, along with growing concerns about misidentification.

In a recent, painfully public episode, online vigilantes used the copious amounts of image data from the Charlottesville protests to identify participants, and in at least one case, it seems that an unrelated person was swept up in the fervor. The consequences for this person were significant, causing him to hide until the emotional upheaval subsided. Yet we’ve been here before: In the aftermath of the Boston Marathon bombing, several people were similarly misidentified from image data. The consequences of incorrect linking (such as linking image data to the wrong person) can be far more serious than including the wrong map in a news report, which is easily corrected.

In these individual cases, there was someone to notice and complain about the misidentified data. The error could be highlighted for further scrutiny, and correction, however difficult, could begin. But with large volumes of data to link, the chances are higher that such misidentifications go undetected.

For example, data processing steps in handling genetic data can cause gene names like “Septin 2” (abbreviated as “SEPT2”) to be interpreted as the date “Sept. 2.” The affected records then silently fail to match reference data about the gene and get dropped from subsequent analysis.
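A simple safeguard against silent drops of this kind is to count and report the records that fail to link instead of discarding them. The following is a minimal sketch under stated assumptions: the `REFERENCE` table, the `link_to_reference` function, and the sample records are all hypothetical and exist only to illustrate the idea.

```python
# Hypothetical reference table keyed by gene symbol.
REFERENCE = {"SEPT2": "septin 2", "TP53": "tumor protein p53"}


def link_to_reference(symbols, reference):
    """Split symbols into matched and unmatched rather than dropping misses."""
    matched = {s: reference[s] for s in symbols if s in reference}
    unmatched = [s for s in symbols if s not in reference]
    return matched, unmatched


# "Sept. 2" stands in for a gene symbol mangled by an upstream processing step.
records = ["TP53", "Sept. 2"]
matched, unmatched = link_to_reference(records, REFERENCE)
print(f"{len(unmatched)} of {len(records)} records failed to match: {unmatched}")
```

Reporting the unmatched count alongside the results gives someone a chance to notice the problem, even when the volume of data is far too large to inspect record by record.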
