Framing Data Science Problems the Right Way From the Start

Data science project failure can often be attributed to poor problem definition, but early intervention can prevent it.

Roger Hoerl, Diego Kuonen, and Thomas C. Redman. April 14, 2022. Reading time: 5 min.

The failure rate of data science initiatives, often estimated at over 80%, is far too high. We have spent years researching the reasons behind companies’ low success rates and have identified one underappreciated issue: Too often, teams skip right to analyzing the data before agreeing on the problem to be solved. This lack of initial understanding dooms many projects from the very beginning.

Of course, this issue is not a new one. Albert Einstein is often quoted as having said, “If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.”

Consider how often data scientists need to “clean up the data” on data science projects, often as quickly and cheaply as possible. This may seem reasonable, but it ignores the critical “why” question: Why is there bad data in the first place? Where did it come from? Does it represent blunders, or are there legitimate data points that are just surprising? Will they occur in the future? How does the bad data impact this particular project and the business? In many cases, we find that a better problem statement is to find and eliminate the root causes of bad data.
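As a minimal sketch of that shift in framing, the Python snippet below tallies suspect records by where they came from rather than silently dropping them. Everything in it is an assumption made for illustration: the records, the `source` and `amount` field names, and the rule that a missing or negative amount counts as suspect are all hypothetical and would need to be replaced with definitions agreed on with the business.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
records = [
    {"source": "web_form", "amount": 120.0},
    {"source": "web_form", "amount": None},
    {"source": "call_center", "amount": -5.0},
    {"source": "call_center", "amount": None},
    {"source": "call_center", "amount": 80.0},
    {"source": "partner_feed", "amount": 42.5},
]


def is_suspect(record):
    """Flag a record whose amount is missing or negative (an assumed rule)."""
    amount = record["amount"]
    return amount is None or amount < 0


def suspect_rate_by_source(rows):
    """Return {source: share of suspect records} for every source seen."""
    totals = Counter(row["source"] for row in rows)
    suspects = Counter(row["source"] for row in rows if is_suspect(row))
    # Counter returns 0 for sources with no suspect records.
    return {source: suspects[source] / totals[source] for source in totals}


rates = suspect_rate_by_source(records)

# List the sources with the highest share of suspect records first.
for source, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{source}: {rate:.0%} suspect")
```

A breakdown like this does not fix anything by itself, but it points the conversation toward where the bad data originates and whether it will keep arriving, which is the question the cleanup step tends to skip.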

Too often, we see people either assume that they understand the problem and rush to define it or fail to build the consensus needed to actually solve it. We argue that a key to successful data science projects is to recognize the importance of clearly defining the problem and to adhere to proven principles when doing so. This problem is not confined to technology teams; we find that many business, political, management, and media projects, at all levels, also suffer from poor problem definition.

Toward Better Problem Definition

Data science uses the scientific method to solve often complex (or multifaceted) and unstructured problems using data and analytics. In analytics, the term fishing expedition refers to a project that was never framed correctly to begin with and involves trolling the data for unexpected correlations. This type of data fishing does not meet the spirit of effective data science but is prevalent nonetheless. Consequently, defining the problem correctly needs to be step one. We previously proposed an

About the Authors

Roger W. Hoerl (@rogerhoerl) teaches statistics at Union College in Schenectady, New York. Previously, he led the applied statistics lab at GE Global Research. Diego Kuonen (@diegokuonen) is head of Bern, Switzerland-based Statoo Consulting and a professor of data science at the Geneva School of Economics and Management at the University of Geneva. Thomas C. Redman (@thedatadoc1) is president of New Jersey-based consultancy Data Quality Solutions and coauthor of The Real Work of Data Science: Turning Data Into Information, Better Decisions, and Stronger Organizations (Wiley, 2019).
