Magazine Fall 2017 Issue Frontiers

The Subtle Sources of Sampling Bias Hiding in Your Data

Plummeting data acquisition costs have contributed to a surge in business analytics. But more data doesn’t inherently remove sampling bias — and in some cases, it could make it worse.

Sam Ransbotham May 30, 2017 Reading Time: 7 min

Topics

Frontiers

An MIT SMR initiative exploring how technology is reshaping the practice of management.

When a group of Boston College students started an analytics project using data about UFO sightings, they thought they’d learn something about visits from spaceships and alien creatures — such as how weather and movie releases influence sightings. The Economist had done something similar, finding that most UFO reports are made during what it called “drinking hours” (5 to 11 p.m.), when people could be “nursing their fourth beer” — a possible connection that the publication dubbed “close encounters of the slurred kind.”

Instead, the students learned about sampling bias.

UFO sighting reports in the United States have increased substantially since the National UFO Reporting Center, a private organization based in Davenport, Washington, started tracking them in 1974. But this might not mean that we are getting more visitors from outer space.

When the reporting center first opened, communicating a sighting required making a telephone call to file a report. Once the internet became publicly available and people could make reports using an online form, the number of sightings began to rise. This easier and cheaper collection system provided more data about sightings. But the increase in the availability of data fundamentally changed the sample set — and any change in data affects the conclusions we can draw from that data.
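The effect of a cheaper reporting channel can be sketched with a few lines of arithmetic. The example below is only an illustration under stated assumptions: every number in it (the constant count of underlying events, the two reporting rates, and the year the online form appears) is hypothetical and is not taken from the National UFO Reporting Center's data.

```python
# Hypothetical numbers throughout: none of these values come from the UFO data.
TRUE_EVENTS_PER_YEAR = 1000   # assumed constant number of underlying events
REPORT_RATE_PHONE = 0.05      # assumed share reported when a phone call is required
REPORT_RATE_ONLINE = 0.25     # assumed share reported once an online form exists
ONLINE_FORM_YEAR = 1995       # assumed year the cheaper channel becomes available


def expected_reports(year: int) -> float:
    """Expected number of reports filed in a given year."""
    rate = REPORT_RATE_ONLINE if year >= ONLINE_FORM_YEAR else REPORT_RATE_PHONE
    return TRUE_EVENTS_PER_YEAR * rate


def naive_growth(before_year: int, after_year: int) -> float:
    """Ratio of reports between two years, ignoring the change in channel."""
    return expected_reports(after_year) / expected_reports(before_year)


if __name__ == "__main__":
    for year in (1990, 1994, 1995, 2000):
        print(year, expected_reports(year))
    print("apparent growth:", naive_growth(1990, 2000))
```

In this sketch the number of underlying events never changes, yet the count of reports rises roughly fivefold once the online form appears. The entire increase comes from how the data is collected, which is the pattern the students ran into.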

Looking beyond the world of UFOs, lower costs of data collection provide value in many ways: We have much more data to work with and learn from than ever before. But managers must be careful to understand how the data was generated and how that might influence its value. The sources of bias in data sets can be far subtler than the ones that could be at play in the UFO data. What’s more, the task of interpreting data is falling on the shoulders of more people in organizations. What biases should managers be on the lookout for as they work to gain insight from increasing amounts of available data? And how can managers help their employees become better at spotting such biases?

Here are four practices that can help:

Understand the history behind your data. New data can be fundamentally different from older data in ways that managers must understand. In the infamous Chicago Daily Tribune “Dewey Defeats Truman” example, when the newspaper prematurely printed an incorrect headline about the winner of the U.S.

About the Author

Sam Ransbotham is an associate professor of information systems at the Carroll School of Management at Boston College and the MIT Sloan Management Review guest editor for the Data and Analytics Big Idea Initiative. He can be reached at sam.ransbotham@bc.edu and on Twitter @ransbotham.

Acknowledgments

The author thanks Boston College students Matthew Frederick, Puneet Nayyar, Amanda Valdes, Alexa Villalobos, and Valeria Yanes for insights from their analytics project about UFO sightings.

Reprint #: 59126
