The Big Impact of Small Data Errors
Even when the data itself is solid, mistaken connections are sometimes made during analysis. But with vigilance, managers can avoid a data mishap.
A 2008 news story on Russian tanks and troops surging into Georgia was accompanied by a map mistakenly depicting the invaded territory as Savannah, Georgia, rather than the homonymous Eastern European country. While the story was correct, Google News’ map selection was not.
Born and bred in Georgia as I am, I quickly put aside any initial fears for my kinfolk when a cursory investigation revealed that the report should have shown a map of the country of Georgia, not of the U.S. state of the same name. This was a simple case of misidentifying the value “Georgia” when associating the news data with the map data.
But other cases of data misidentification are not so simple and carry greater consequences. To the chagrin of many people who share a name with someone of more nefarious tendencies, a false match to the no-fly list can be quite an inconvenience. Images, too, can be linked to people. Advances in image enhancement and processing are yielding growing prowess in facial recognition, along with growing concerns about misidentification.
In a recent, painfully public episode, online vigilantes used the copious amounts of image data from the Charlottesville protests to identify participants, and in at least one case, it seems that an unrelated person was swept up in the fervor. The consequences for this person were significant, causing him to hide until the emotional upheaval subsided. Yet we’ve been here before: In the aftermath of the Boston Marathon bombing, several people were similarly misidentified from image data. The consequences of incorrect linking (such as linking image data to the wrong person) can be far more serious than including the wrong map in a news report, something easily corrected.
In these individual cases, there was someone to notice and complain about the misidentified data. The error could be highlighted for further scrutiny, and correction, however difficult, could begin. But with large volumes of data to link, the chances are higher that such misidentifications go undetected.
For example, data processing steps in handling genetic data can cause genes like “Septin 2” (abbreviated as “SEPT2”) to be interpreted as “Sept. 2” — and get dropped in subsequent analysis as the records then silently fail to match reference data about the gene.
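One way to keep such failures from staying silent is to report the records that do not match the reference data instead of discarding them. The Python sketch below illustrates the idea under stated assumptions: the reference table, the sample records, the function names, and the date-like pattern are all hypothetical, and "2-Sep" is included only as another plausible date-style rendering of the same symbol.

```python
import re

# Hypothetical reference data: gene symbol -> gene name.
REFERENCE = {
    "SEPT2": "Septin 2",
    "TP53": "Tumor protein p53",
}

# Assumed pattern for symbols that look like dates, e.g. "Sept. 2" or "2-Sep".
DATE_LIKE = re.compile(r"^\d{1,2}-[A-Za-z]{3}$|^[A-Za-z]{3,4}\.? \d{1,2}$")


def looks_like_date(symbol):
    """Return True if a symbol resembles a date rather than a gene name."""
    return bool(DATE_LIKE.match(symbol))


def link_records(records, reference):
    """Link (symbol, value) records to reference data.

    Unmatched records are returned separately rather than dropped,
    so that someone can review them.
    """
    matched, unmatched = [], []
    for symbol, value in records:
        if symbol in reference:
            matched.append((symbol, reference[symbol], value))
        else:
            unmatched.append((symbol, value))
    return matched, unmatched


# Illustrative records; two symbols were mangled into date-like text upstream.
records = [("TP53", 1.8), ("Sept. 2", 0.4), ("2-Sep", 0.7)]

matched, unmatched = link_records(records, REFERENCE)
suspicious = [symbol for symbol, _ in unmatched if looks_like_date(symbol)]

print(f"{len(matched)} linked, {len(unmatched)} unlinked")
print("Unlinked symbols that look like dates:", suspicious)
```

The point of the design is less the pattern itself than the habit: counting and inspecting what failed to link gives someone the chance to notice the error, which a silent drop never does.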