The Data Science Management Process

Data science initiatives should be integrated with the overall business strategy, and then overseen by an intermediary group that works between the company and its data scientists.

It is increasingly clear that companies and government agencies do not know how to manage data science at the enterprise level. Many are still stuck doing pilots. Some take on projects that are beyond their capabilities. And too often, excellent work dies on the vine during implementation. Companies must take action to address the structural and process issues that hold them back.

In an earlier article, we pointed out the major structural flaw hindering many data science programs — the inherent conflict between data science groups (which we termed the lab) and business operations (termed the factory). To resolve that conflict, we proposed a data science bridge: an intermediary group headed by a person with the title innovation marshal tasked with ensuring better communication and integration between the two groups and surfacing the best ways to make inventions by the lab fit into the needs of the factory.

This article builds on that structural solution by addressing the issues associated with managing the process at an enterprise level. Proper management includes driving collaboration, developing human capital, ensuring data quality, managing the project portfolio, and ensuring the business impact of all data science efforts. We propose that this overall data science management process be owned by the person leading the data science bridge.

Five Core Tasks for Managing Data Science

Managing the process requires, at a minimum, proper organizational structure — that is, the bridge — as well as the right people in place within this structure and the right set of core tasks. (See “Visualizing the Data Science Management Process.”) This model draws on well-established management structures for finance, human resources, manufacturing, and marketing.

Our proposed data science management process is presented as a cycle, or continuous loop. Data science resides within the context of the organization and its overall business strategy. That strategy determines what needs to be accomplished and provides high-level direction to the data science bridge. Elements of this direction can be quite broad, including desired competitive position, financial goals, opportunities to innovate within the lab, and specific factory improvement targets. Data science projects are then implemented and managed. The results — there will be successes, of course, but failures, too — feed back into the business’s overall context and add to the guidance around business strategy. This completes the cycle.

Importantly, we concur with MIT Sloan Management Review’s David Kiron and MIT Sloan School of Management’s Michael Schrage (among others) that companies should eschew a separate AI strategy. It only confuses matters.

In executing the strategic direction, the bridge utilizes five core tasks — subprocesses that each call for depth and nuance. In concert, they form the overall data science management process. They are:

1. Drive collaboration across the organization as it relates to data science. Most companies should start with this task. Data science is a team sport, and without teamwork, mediocre results are all but assured.

Good collaboration starts with common goals and good communication, but different groups speak different languages. Ongoing miscommunication is at the root of many data science failures. A critical first step is to standardize terminology. An obvious example is the term data science, which is too often used interchangeably with machine learning, AI, statistics, computer science, and business analytics, even though each term has a distinct meaning.

We find that discussing data science as a team-based process for solving business problems, and not just as the use of high-powered tools, helps the factory and the lab find common ground.

2. Develop the human capital required to achieve the organization’s data science objectives. Importantly, this does not simply require hiring a sufficient number of data scientists but extends to building data science literacy across the organization. Virtually all employees can conduct small data projects and contribute in a meaningful way to data science efforts. While the human resources department needs to be integrally involved, data science skills are too specialized to fully delegate this responsibility to HR. Developing human capital is a long-term, ongoing process, not a one-shot effort. It needs to include an appropriate curriculum, hands-on examples that engage attendees, and learning plans to upskill people of diverse backgrounds.

3. Ensure data quality. Most data science teams are aware of the importance of data quality and devote a substantial fraction of their effort to dealing with mundane quality issues. Yet they often lack the needed skills and tools to deal with those issues’ root causes. Indeed, the people analyzing data are organizationally separate from the people creating it, so they are not in a good position to evaluate its quality, much less improve it. Complicating matters further, as models are turned over to the factory, quality issues expand from the historical data used to train the model to the newly created data used to operate it.

The bridge must help sort all this out. It must help build the data supply chains and ensure that managerial accountability for data quality is clear both in model development and in production; that quality standards, policies, procedures, and tools are in place; and that data scientists, external data providers, and employees use them.

4. Manage the project portfolio. Overseeing all the projects is much more involved than managing individual projects, especially since there should be many small data projects along with larger, more complex ones. Portfolio management includes the difficult tasks of determining which potential projects to fund and which not to fund, assigning data scientists and factory people to project teams, and canceling projects that are clearly not achieving the desired goals, to name a few.

Resource allocation is particularly tough, as everyone wants the top data scientists on their projects. In an organization that thinks strategically, however, the best data scientists work on the most business-critical projects, not simply those involving the most data or requiring the most sophisticated techniques. There is some truth to the business version of the golden rule, which states: “Whoever has the gold makes the rules.” In the case of data science, whoever controls the funding — either the lab or factory — will tend to move forward on its own agenda, leaving the other party feeling powerless and potentially open to sabotaging the effort. By managing funding from an impartial point of view, the bridge is in a much better position to use funding to drive collaboration (task 1) and ensure business impact (task 5).

5. Ensure business impact from data science. Studies show that the tangible business impact of data science and AI has lagged the hype. One common reason is that companies put too much focus on model building and technology, and not enough on delivering results. To drive business impact, the bridge must integrate the data science lab, which develops the technology, with the factory, which deploys the technology. Specifically, the bridge can help the lab start with a “fit for use” mentality from the beginning, involve factory people in the development of models, ensure the technology is properly transferred to the factory, and help the factory deploy and sustain it to achieve maximum impact. Further, the bridge should establish how to provide independent, objective quantification of business results, given that data science organizations are naturally enthusiastic about their own work and prone to overstating its business results. A clear-eyed evaluation serves as an input to the company’s strategic direction, the next round of project selection, and improvements to the overall data science management process.

Data science often looks successful when the question is whether the models that teams develop actually work. But when the question is whether those models provide business value, the results tend to be far less impressive. The data science management process described here aims to help companies close this enormous gap by adopting a spirit of problem-solving and continuous improvement. It helps clarify responsibilities, gets the right people involved, sorts out data quality, sees more models into production, and ensures business impact.

