Detecting Bias in Data Analysis

How you handle your data — from cleanup through presentation — affects the results you’ll get.

Data analysis can be shaped as much by external agendas as by math and science. These agendas can come from many sources: personal, political, or technical.

At a personal level, analysts or managers may have vested interests in one outcome over another or may seek justification for prior claims based on intuition; they know the results the analysis should find. Politically, they may be conditioned by past decisions or be wary of the implications of an outcome. Technically, they may know valid limitations of the source data that lead them to discredit the results of whatever analysis is brought to bear.

These outside agendas can be overtly or subtly embedded in the analysis. The result is analysis that favors one outcome over another — perhaps in ways that run counter to organizational objectives.

How can managers better detect and address agendas embedded in analysis?

Through practice.

In a course I teach, my students improve their ability to recognize embedded agendas through “data debates.” As in traditional debates, two groups take opposing positions on a question. Each group prepares by developing arguments to support its position and then presents its argument to the class. Clarification questions and rebuttals follow. (I stop short of voting on a winner to keep competitive spirits from dominating educational spirits and to encourage experimentation.)

Unlike in traditional debates, student arguments must rely exclusively on analysis of a provided dataset. To minimize the influence of prior opinions, the questions are fictitious: Given production logs, are a company’s hoverboards defective? Given accounting information, is the acquisition price right for a startup? Was marketing for The Dillionaire effective? Each question comes with a generated dataset that is large enough to require analysis beyond desktop spreadsheets, but not unwieldy.

The classroom debate that follows offers several lessons for organizations that must deal with far murkier data and more complex agendas.

Janitorial Work Can Sway

Although the datasets provided for class are simpler than those found in real organizations, they are messy in similar ways. They contain outliers, missing values, and other ambiguities. They may require significant janitorial work before analysis begins in earnest. Identifying potential inaccuracies is a necessary first step. But, once identified, analysts have considerable discretion in the cleaning steps that follow.

These discretionary decisions made to address the messiness can affect the results of the analysis. And outside agendas influence the diligence of janitorial steps and the choices analysts make as they clean data.
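To make this concrete, here is a minimal Python sketch of how three defensible cleaning choices can move a headline number. The batch figures, the cutoff of 10, and the helper names are all invented for illustration; they do not come from the class datasets.

```python
from statistics import mean

# Hypothetical defect counts per production batch (made-up values).
# The reading of 40 might be a data-entry error or a genuinely bad batch.
defects = [2, 3, 1, 4, 2, 3, 40, 2]

def drop_above(values, cutoff):
    """Discard any reading greater than the cutoff."""
    return [v for v in values if v <= cutoff]

def cap_at(values, cutoff):
    """Keep every reading, but limit each one to the cutoff."""
    return [min(v, cutoff) for v in values]

# Three cleaning choices, each easy to justify on its own.
results = {
    "keep everything": mean(defects),
    "drop readings above 10": mean(drop_above(defects, 10)),
    "cap readings at 10": mean(cap_at(defects, 10)),
}

for choice, average in results.items():
    print(f"{choice}: {average:.2f} defects per batch")
```

The same data supports an average anywhere from roughly 2.4 to roughly 7.1 defects per batch, depending on a step that rarely appears in the final presentation. Asking which cleaning choice was made, and what the result would be under the alternatives, is one way to surface an embedded agenda.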

Interpret Questions Creatively

Agendas are often embedded in the initial interpretation of the focal question. For example, what does “effective” mean? Fast? Thorough? Inexpensive? Robust?

Initially, ambiguity in interpretation of the question is a source of discomfort and unease for students. But within the ambiguity is opportunity. The choice of metric can lead to significantly different conclusions.
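As a rough illustration, the Python sketch below scores two hypothetical marketing campaigns under two readings of “effective.” The campaign names, spend, and customer counts are invented, and the two metrics are only examples of the many a team could choose.

```python
# Hypothetical campaign summaries (all figures invented for illustration).
campaigns = {
    "TV": {"spend": 100_000, "new_customers": 2_000},
    "Email": {"spend": 10_000, "new_customers": 500},
}

def reach(campaign):
    """'Effective' read as: brought in the most new customers."""
    return campaign["new_customers"]

def cost_per_customer(campaign):
    """'Effective' read as: cheapest way to win a customer."""
    return campaign["spend"] / campaign["new_customers"]

# Higher reach is better; lower cost per customer is better.
best_by_reach = max(campaigns, key=lambda name: reach(campaigns[name]))
best_by_cost = min(campaigns, key=lambda name: cost_per_customer(campaigns[name]))

print("Most effective by reach:", best_by_reach)
print("Most effective by cost per customer:", best_by_cost)
```

Each metric is reasonable, yet they name different winners. A group arguing that the marketing worked and a group arguing that it did not can both be correct on their own terms, which is why the definition deserves scrutiny before the numbers do.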

Anticipate Counterarguments

Put yourself in your opponent’s shoes. How do you expect they would best counter your point? And what would be your best response to that? For example, analysis often requires making assumptions. But someone is bound to disagree with whatever assumption is made. The debate format encourages people to clarify any assumptions made and, more importantly, to consider the effects of disagreement with those assumptions.
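One way to prepare for such disagreement is to rerun the analysis under the assumptions an opponent might prefer. The Python sketch below does this for an invented acquisition question; the revenue figure, asking price, revenue multiple, three-year horizon, and growth rates are all hypothetical, and the valuation rule is deliberately simplistic.

```python
# Hypothetical inputs (invented for illustration).
CURRENT_REVENUE = 1_000_000   # startup's annual revenue today
ASKING_PRICE = 4_000_000      # price under debate
REVENUE_MULTIPLE = 3          # assumed value per dollar of future revenue
YEARS = 3                     # assumed projection horizon

def projected_value(growth_rate):
    """Value the startup from compounded revenue at the assumed growth rate."""
    future_revenue = CURRENT_REVENUE * (1 + growth_rate) ** YEARS
    return REVENUE_MULTIPLE * future_revenue

def price_is_supported(growth_rate):
    """True when the projected value meets or exceeds the asking price."""
    return projected_value(growth_rate) >= ASKING_PRICE

# Repeat the same analysis across a range of growth assumptions.
verdicts = {rate: price_is_supported(rate) for rate in (0.05, 0.15, 0.25)}

for rate, supported in verdicts.items():
    label = "supported" if supported else "not supported"
    print(f"growth {rate:.0%}: asking price {label}")
```

If the conclusion flips somewhere inside the range of plausible assumptions, as it does here between the lowest and middle growth rates, the assumption is carrying the argument and should be stated and defended openly.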

Presentation Matters

Analysis may have weak spots: important aspects may be omitted, speculation may be presented as fact, conclusions may overreach the analysis. Can you spot where presentation skills are being used to compensate for, or distract from, tenuous arguments?

Outside agendas shape the way people present analysis, dictating what they focus on and what they omit. The important part is that the presenter gets to choose the presentation. That choice influences where attention will focus.

2 Comments On: Detecting Bias in Data Analysis

  • Branden Williams | January 28, 2015

    This is interesting, but it really didn’t deliver for me. How about some techniques to do this mathematically? For example, you may pass all the tests listed here but there still could be bias in the data. Perhaps you could suggest some methods that could help with this in R?

  • Isabella Ghement | January 30, 2015

    Interesting article – thank you for writing it, Sam! It made me wonder: Once we are aware of the external agendas embedded in a particular analysis, how can we best minimize their influence on the data analysis and its findings?
