The Subtle Sources of Sampling Bias Hiding in Your Data

Plummeting data acquisition costs have contributed to a surge in business analytics. But more data doesn’t inherently remove sampling bias — and in some cases, it could make it worse.




An MIT SMR initiative exploring how technology is reshaping the practice of management.

When a group of Boston College students started an analytics project using data about UFO sightings, they thought they’d learn something about visits from spaceships and alien creatures — such as how weather and movie releases influence sightings. The Economist had done something similar, finding that most UFO reports are made during what it called “drinking hours” (5 to 11 p.m.), when people could be “nursing their fourth beer” — a possible connection that the publication dubbed “close encounters of the slurred kind.”

Instead, the students learned about sampling bias.

UFO sighting reports in the United States have increased substantially since the National UFO Reporting Center, a private organization based in Davenport, Washington, started tracking them in 1974. But this might not mean that we are getting more visitors from outer space.

When the reporting center first opened, communicating a sighting required making a telephone call to file a report. Once the internet became publicly available and people could file reports through an online form, the number of reported sightings began to rise. This easier and cheaper collection system produced more data about sightings, but it also fundamentally changed the sample set, and any change in how data is collected can change the conclusions we can draw from it.
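To see why more reports need not mean more sightings, here is a minimal Python sketch built on made-up assumptions: the number of actual sightings stays fixed every year, and only the share of witnesses who bother to file a report changes when a cheaper channel appears. The constants, the switchover year, and the function names are all hypothetical illustrations, not figures from the National UFO Reporting Center.

```python
# Hypothetical numbers chosen only for illustration; NOT real NUFORC figures.
TRUE_SIGHTINGS_PER_YEAR = 1_000   # assumed constant: the real-world events never change
PHONE_REPORT_RATE = 0.05          # assumed share of witnesses who phone in a report
ONLINE_REPORT_RATE = 0.30         # assumed share who report once an online form exists
ONLINE_FORM_YEAR = 1995           # hypothetical year the online form launches


def expected_reports(year: int) -> float:
    """Expected reports in a year: true events times the chance each one gets reported."""
    rate = ONLINE_REPORT_RATE if year >= ONLINE_FORM_YEAR else PHONE_REPORT_RATE
    return TRUE_SIGHTINGS_PER_YEAR * rate


def naive_growth(start: int, end: int) -> float:
    """Ratio of reports between two years, (mis)read as growth in actual sightings."""
    return expected_reports(end) / expected_reports(start)


if __name__ == "__main__":
    for year in (1990, 1994, 1995, 2000):
        print(year, expected_reports(year))
    print("apparent growth 1990 -> 2000:", naive_growth(1990, 2000))
```

Under these assumptions, reports jump sixfold between 1990 and 2000 even though the assumed number of true sightings never moves. The entire "trend" comes from the change in the collection method, which is exactly the kind of shift a manager needs to know about before reading meaning into the numbers.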

Looking beyond the world of UFOs, lower costs of data collection provide value in many ways: We have much more data to work with and learn from than ever before. But managers must be careful to understand how the data was generated and how that might influence its value. The sources of bias in data sets can be far subtler than the ones that could be at play in the UFO data. What’s more, the task of interpreting data is falling on the shoulders of more people in organizations. What biases should managers be on the lookout for as they work to gain insight from increasing amounts of available data? And how can managers help their employees become better at spotting such biases?

Here are four practices that can help:

Understand the history behind your data. New data can be fundamentally different from older data in ways that managers must understand. In the infamous Chicago Daily Tribune “Dewey Defeats Truman” example, when the newspaper prematurely printed an incorrect headline about the winner of the U.S.





The author thanks Boston College students Matthew Frederick, Puneet Nayyar, Amanda Valdes, Alexa Villalobos, and Valeria Yanes for insights from their analytics project about UFO sightings.


