Data Scientist In a Can?

Companies are trying to automate the data scientist function to deal with the skills gap.

Competing With Data & Analytics

How does data inform business processes, offerings, and engagement with customers? This research looks at trends in the use of analytics, the evolution of analytics strategy, optimal team composition, and new opportunities for data-driven innovation.

It’s gospel that companies everywhere want to hire data scientists but can’t find them. In 2011, McKinsey projected that by 2018 there would be a gap of more than 140,000 unfilled big data jobs and 1.5 million related jobs in management and analysis. Not to be outdone, Gartner said that big data would create 1.9 million jobs in IT alone by 2015, of which two-thirds, or more than 1.2 million, would go unfilled.

At one time, people expected that a shortage of telephone operators and chauffeurs would slow the spread of the telephone and the car. Instead, people learned to operate their own phones and cars, helped in part by easier-to-use technology. Some companies are trying to do the same with analytics skills: to automate them so people and companies can “do analytics” without a PhD in a mathematically inclined field. Companies like Apigee and Nutonian are offering “data scientists in a can,” that is, analytics provided as a service, so companies can fill their need for data scientists without actually hiring any.

“For every organization, our pitch is, you don’t need them [data scientists]. We’re faster, more accurate, scale better — and we’re a lot cheaper,” says Scott Howser, senior vice president of products and marketing at Nutonian. Nutonian makes Eureqa, which it markets with the tagline “No PhD? No Problem.” Eureqa began as a project of Michael Schmidt, Nutonian’s founder, when he was a graduate student at Cornell. At the time, his program was called a “robotic scientist,” because it could derive scientific laws by analyzing experimental data. [The specifics are available on YouTube in an hour-long presentation by Cornell professor Hod Lipson; for a shorter version, see Michael Schmidt’s 15-minute TEDx talk on his “robotic scientist.”]

Many businesses and large organizations use Eureqa. A case study on Nutonian’s site says Kansas City Power & Light adopted Eureqa when its analysts needed to do heavy-duty modeling to predict energy demand under different kinds of weather. Building the models in Excel was possible but complex and slow, and the other option the utility explored would have required its analysts to learn R, a statistics-oriented programming language. In another example, a process engineer at Rio Tinto’s Fer et Titane unit used Eureqa to improve production quality for one of its metal powders. It took him less than a day; previously, such work would have been done by an outside consultant and taken months to complete.

Although organizations tend to use Nutonian’s tool to complement their data scientists, helping them scale and meet demand, Howser says it could actually replace data scientists. He insists he has not found a situation where a human data scientist can do something the tool can’t. The obstacles, Howser believes, aren’t technical but cultural: people need time to change their habits. Over time, he believes, companies will adapt to software that replaces the data scientists they think they need now.

That may be true, but only in certain cases, says Ram Akella, a professor of information systems and technology at the University of California, Berkeley and UC Santa Cruz. He says that while some kinds of problems carry over across industries and lend themselves to automation, either through software or through a consulting model in which data scientists develop tools to solve a specific set of problems, other kinds of analytics are too complicated to be automated. Sometimes, Akella says, a problem is simple to model, and sometimes it’s complicated. Other parts of the analytics process, such as data gathering, can also be difficult to automate.

The harder any step in the problem is to tackle, the less likely it is that companies will benefit from data science as a service. In that sense, analytics is like open source software: any company can adopt free software, but many can’t actually make it work effectively. Akella does think that, over time, data science will become more commoditized. Analytics problems will be solved, then become routine, and it will indeed become easier to pull your data scientist out of a can. However, companies will still need to adapt their culture. “There’s a broader question of how can humans and algorithms work together,” Akella says.

But even then, he expects, there will still be humans doing data science work. If all the data scientists are gone, there will be no one to help the machines learn. Even if human participation is reduced to a sliver, it needs to be there, or the algorithm won’t learn.

Somebody will always have to open the can.

Comments (3)
bad gi
This is quite disturbing. It's scary to think machines and platforms can replace data scientists. Ideally, I'd want to stay one for the next 40 years.
Sridhar Ramaswamy
In my opinion, data analytics involves several steps. It requires understanding the business problem, gathering relevant data, cleansing the data, solving the problem, and deriving business inferences from the solution. I am skeptical that any data analytics software could perform these steps as a "Data Scientist in a Can" without human intervention.

It requires an SME to put forth a business problem and a data scientist to translate it into a relevant analytical problem, identify relevant data and processes, and intelligently come up with relevant algorithms.
I think Crowd Analytix is at the forefront of this; it has grown very quickly as more enterprises see the huge value in on-demand analytics.

Check them out