Framing Data Science Problems the Right Way From the Start

Data science project failure can often be attributed to poor problem definition, but early intervention can prevent it.

The failure rate of data science initiatives, often estimated at over 80%, is far too high. We have spent years researching the reasons behind companies’ low success rates and have identified one underappreciated issue: Too often, teams skip right to analyzing the data before agreeing on the problem to be solved. Without that shared understanding, many projects are doomed to fail from the very beginning.

Of course, this issue is not a new one. Albert Einstein is often quoted as having said, “If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.”

Consider how often data scientists need to “clean up the data” on data science projects, often as quickly and cheaply as possible. This may seem reasonable, but it ignores the critical “why” question: Why is there bad data in the first place? Where did it come from? Does it represent blunders, or are there legitimate data points that are just surprising? Will they occur in the future? How does the bad data impact this particular project and the business? In many cases, we find that a better problem statement is to find and eliminate the root causes of bad data.
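As a rough illustration of asking "why" before cleaning, the sketch below separates values that break a hard domain rule (likely blunders) from values that are merely far from typical (possibly legitimate surprises), and then counts the blunders by source so the root cause can be traced. The records, source names, age limits, and the spread threshold are all hypothetical assumptions made for this example, not part of any method described in this article.

```python
from collections import Counter
from statistics import median

# Hypothetical records: (source system, customer age in years).
records = [
    ("web_form", 34), ("web_form", 29), ("web_form", 41),
    ("call_center", -1), ("call_center", 0), ("call_center", 38),
    ("web_form", 97), ("partner_feed", 45), ("partner_feed", 230),
]

def classify(age, typical, min_valid=1, max_valid=120, spread=40):
    """Label a value as 'blunder', 'surprising', or 'ok'.

    The limits and spread are assumed thresholds for illustration only.
    """
    if not (min_valid <= age <= max_valid):
        return "blunder"      # violates a hard domain rule
    if abs(age - typical) > spread:
        return "surprising"   # plausible, but far from the typical value
    return "ok"

def audit(rows):
    """Return blunder counts per source and the list of surprising rows."""
    typical = median(age for _, age in rows)
    blunders_by_source = Counter(
        src for src, age in rows if classify(age, typical) == "blunder"
    )
    surprising = [
        (src, age) for src, age in rows
        if classify(age, typical) == "surprising"
    ]
    return blunders_by_source, surprising

blunders_by_source, surprising = audit(records)
print(blunders_by_source.most_common())  # which sources produce bad data
print(surprising)                        # rows to review, not delete
```

In this made-up data, most blunders come from a single source, which suggests fixing that intake process rather than repeatedly scrubbing its output. The surprising row is kept for review instead of being silently dropped.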

Too often, we see people either assume that they understand the problem and rush to define it, or fail to build the consensus needed to actually solve it. We argue that a key to successful data science projects is to recognize the importance of clearly defining the problem and to adhere to proven principles in doing so. This problem is not limited to technology teams; we find that many business, political, management, and media projects, at all levels, also suffer from poor problem definition.

Toward Better Problem Definition

Data science uses the scientific method to solve often complex (or multifaceted) and unstructured problems using data and analytics. In analytics, the term fishing expedition refers to a project that was never framed correctly to begin with and involves trolling the data for unexpected correlations. This type of data fishing does not meet the spirit of effective data science but is prevalent nonetheless. Consequently, defining the problem correctly needs to be step one. We previously proposed an


Comments (2)
Tathagat Varma
While problem definition looks like a very obvious thing to do, in my view the key reason most teams are so bad at it is that there is no single view of the problem itself. Marketing, sales, R&D, operations, data science: everyone sees it through their own narrow lens and believes they have defined the problem correctly. Perhaps so, but they are like the five blind men interpreting the elephant, each in his own limited way. Unless they can agree on a single view, the disconnect will likely continue.
yi yao
Totally agree. We should make a SMART problem statement at the very beginning.