Detecting Bias in Data Analysis

How you handle your data — from cleanup through presentation — affects the results you’ll get.


Topics

Competing With Data & Analytics

How does data inform business processes, offerings, and engagement with customers? This research looks at trends in the use of analytics, the evolution of analytics strategy, optimal team composition, and new opportunities for data-driven innovation.

Data analysis can be determined as much by external agendas as by math and science. These agendas can come from many sources — personal, political, or technical.

At a personal level, analysts or managers may have vested interests in one outcome over another or may seek justification for prior claims based on intuition; they already know the results the analysis should find. Politically, they may be conditioned by past decisions or be wary of the implications of an outcome. Technically, they may know of valid limitations in the source data that lead them to discredit the results of whatever analysis is brought to bear.

These outside agendas can be overtly or subtly embedded in the analysis. The result is analysis that favors one outcome over another — perhaps in ways that run counter to organizational objectives.

How can managers better detect and address agendas embedded in analysis?

Through practice.

In a course I teach, my students improve their ability to recognize embedded agendas through “data debates.” As in traditional debates, two groups take opposing positions on a question. Each group develops arguments to support its position and then presents them to the class. Clarification questions and rebuttals follow. (I stop short of voting on a winner to keep competitive spirits from dominating educational spirits and to encourage experimentation.)

Unlike in traditional debates, student arguments must rely exclusively on analysis of a provided dataset. To keep prior opinions from shaping the arguments, the questions are fictitious: Given production logs, are a company’s hoverboards defective? Given accounting information, is the acquisition price right for a startup? Was marketing for The Dillionaire effective? Each question comes with a generated dataset that is large enough to require analysis beyond desktop spreadsheets but not so large as to be unwieldy.

The classroom debate that follows offers several lessons for organizations that must deal with far murkier data and more complex agendas.

Janitorial Work Can Sway

Although the datasets provided for class are simpler than those found in real organizations, they are messy in similar ways. They contain outliers, missing values, and other ambiguities. They may require significant janitorial work before analysis begins in earnest.
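To make the point that cleanup choices can sway an analysis concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the hoverboard battery readings, the 60 °C "defective" threshold, and the function names are not from the course datasets, and the 1.5 × IQR fence is just one common outlier convention, not necessarily the one any student used. The same raw log yields opposite verdicts depending only on whether one suspicious spike survives cleanup.

```python
from statistics import mean, quantiles

# Hypothetical, made-up production-log readings: battery temperature (deg C)
# recorded during a stress test, one value per hoverboard.
# None = reading never logged; 250.0 = a suspicious spike (glitch or real fault?).
READINGS = [52.0, 55.5, None, 58.0, 61.0, 250.0, 54.5, None, 57.0, 59.5]

# Hypothetical spec: a batch whose mean temperature exceeds this is "defective".
THRESHOLD_C = 60.0


def drop_missing(values):
    """Cleanup choice A: discard unlogged runs, keep every recorded value."""
    return [v for v in values if v is not None]


def drop_missing_and_outliers(values):
    """Cleanup choice B: also discard values outside the 1.5 * IQR fences."""
    observed = drop_missing(values)
    q1, _, q3 = quantiles(observed, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in observed if low <= v <= high]


def verdict(values):
    """Return the mean reading and the pass/fail call against the threshold."""
    avg = mean(values)
    return avg, ("defective" if avg > THRESHOLD_C else "within spec")


if __name__ == "__main__":
    choices = [
        ("A: drop missing only", drop_missing),
        ("B: drop missing + IQR outliers", drop_missing_and_outliers),
    ]
    for label, cleaner in choices:
        avg, call = verdict(cleaner(READINGS))
        print(f"{label}: mean = {avg:.1f} C -> {call}")
    # A: drop missing only: mean = 80.9 C -> defective
    # B: drop missing + IQR outliers: mean = 56.8 C -> within spec
```

Neither choice is wrong in itself: the 250 °C spike might be a sensor glitch or a genuine thermal fault. The point is that an analyst who already favors one verdict can reach it through a cleanup step that looks perfectly defensible, which is why cleanup decisions deserve as much scrutiny as the final calculations.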


Comments (2)
Isabella Ghement
Interesting article, thank you for writing it, Sam! It made me wonder: Once we are aware of the external agendas embedded in a particular analysis, how can we best minimize their influence on the data analysis and its findings?
Branden Williams
This is interesting, but it really didn't deliver for me. How about some techniques to do this mathematically? For example, you may pass all the tests listed here but there still could be bias in the data. Perhaps you could suggest some methods that could help with this in R?