There is a growing belief that sophisticated algorithms can explore huge databases and find relationships independent of any preconceived hypotheses. But in businesses that involve scientific research and technological innovation, the authors argue, this approach is misguided and potentially risky.
As “big data” becomes increasingly integrated into many aspects of our lives, we are hearing more calls for revolutionary changes in how researchers work. To save time in understanding the behavior of complex systems or in predicting outcomes, some analysts say it should now be possible to let the data “tell the story,” rather than having to develop a hypothesis and go through painstaking steps to prove it. The success of companies such as Google Inc. and Facebook Inc., which have transformed the advertising and social media worlds by applying data mining and mathematics, has led many to believe that traditional methodologies based on models and theories may no longer be necessary. Among young professionals (and many of the MBA students we see), there is an almost blind faith that sophisticated algorithms can be used to explore huge databases and find interesting relationships independent of any theories or prior beliefs. The assumption is that the bigger the data, the more powerful the findings.
As appealing as this viewpoint may be, we think it’s misguided — and could be risky for companies. For example, what if the data appears to support a new drug design or a new scientific approach when there actually isn’t a causal relationship? Although we acknowledge that data mining has enabled tremendous advances in business intelligence and in the understanding of consumer behavior — think of how Amazon.com Inc. figures out what you might want to buy or how content recommendation engines such as those used by Netflix Inc. work — it’s important for executives who oversee technical disciplines to be thoughtful about how they apply this approach to their areas.
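To make the risk concrete, consider a minimal simulation sketch of how a hypothesis-free search can turn up a strong-looking relationship where none exists. Everything in it is an illustrative assumption rather than part of our research: the function names, the sample size of 50, and the use of purely random numbers as stand-ins for measured variables. The outcome and every candidate variable are generated independently, so any correlation the search finds is due to chance alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_samples, n_variables, seed=0):
    """Search unrelated random variables for the best match to an outcome.

    Hypothetical helper: the outcome and all candidates are independent
    Gaussian noise, so the returned |r| reflects chance, not causation.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(candidate, outcome)))
    return best


if __name__ == "__main__":
    # The more variables we search, the stronger the best spurious match.
    for n_variables in (10, 100, 1000):
        r = strongest_chance_correlation(n_samples=50, n_variables=n_variables)
        print(f"{n_variables:>5} unrelated variables -> strongest |r| = {r:.2f}")
```

In this sketch, widening the search can only raise the strongest correlation found, even though no variable has any real connection to the outcome. A theory or prior hypothesis limits how many relationships are tested, which is one reason it guards against this kind of false discovery.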
Recently, we looked at several fields where massive amounts of data are collected and available: drug discovery and pharmaceutical research; genomics and species improvement; weather forecasting; the design of complex products like gas turbines; and speech recognition. (See “About the Research.”) In each setting, we asked a series of broad questions, including the following: How do data-driven research approaches fit with traditional research methods? In what ways could data-driven research extend the current understanding of scientific and engineering problems? And what caution do managers need to exercise regarding the limitations and proper use of statistical inference?