The Machine Learning Race Is Really a Data Race

Organizations that hope to make AI a differentiator need to draw from alternative data sets — ones they may have to create themselves.

Reading Time: 6 min 


Like what you’re reading?
Join our community

5 free articles per month, $6.95/article thereafter, free newsletter.

$89 $45/Year

Unlimited digital content, quarterly magazine, free newsletter, entire archive.

Sign me up

Machine learning — or artificial intelligence, if you prefer — is already becoming a commodity. Companies racing to simultaneously define and implement machine learning are finding, to their surprise, that implementing the algorithms used to make machines intelligent about a data set or problem is the easy part. There is a robust cohort of plug-and-play solutions to painlessly accomplish the heavy programmatic lifting, from the open-source machine learning framework of Google’s TensorFlow to Microsoft’s Azure Machine Learning and Amazon’s SageMaker.

What’s not becoming commoditized, though, is data. Instead, data is emerging as the key differentiator in the machine learning race. This is because good data is uncommon.

Useful Data: Both Valuable and Rare

Data is becoming a differentiator because many companies don’t have the data they need. Although companies have measured themselves in systematic ways using generally accepted accounting principles for decades, this measurement has long been focused on physical and financial assets — things and money. A Nobel Prize was even awarded on capital asset pricing in 2013, reinforcing these well-established priorities.

But today’s most valuable companies trade in software and networks, not just physical goods and capital assets. Over the past 40 years, the asset focus has completely flipped, from the market being dominated by 83% tangible assets in 1975 to 84% intangible assets in 2015. Instead of manufacturing coffeepots and selling washing machines, today’s corporate giants offer apps and connect people. This shift has created a drastic mismatch between what we measure and what actually drives value.

The result is that useful data is problematically rare. There is a growing gap between market and book values. Because of this gap, companies are racing to apply machine learning to important business decisions, even replacing some of their expensive consultants, only to realize that the data they need doesn’t even exist yet. In essence, the fancy new AI systems are being asked to apply new techniques to the same old material.

Just like people, a machine learning system is not going to be smart about any topic until it has been taught. Machines need a lot more data than humans do in order to get smart — although, granted, they do read that data a lot faster. So, while there is a visible arms race as companies bring on machine learning coders and kick off AI initiatives, there is also a behind-the-scenes, panicked race for new and different data.

In finance, for instance, alternative data reaches beyond the traditional Securities and Exchange Commission reports and investor presentations that influence investment decisions. Alternative data, such as social media sentiment or number of patents awarded, is essential for two important reasons. First, traditional data focuses on traditional assets, and that isn’t expansive enough in the age of intangible assets. Second, there’s no reason to bother using machine learning to study the same data sets that everyone else in the market is analyzing. Everyone who is interested has already tried to correlate industry trends, profit margins, growth rates, earnings before interest and taxes, asset turnover, and return on assets — along with the more than 1,000 other commonly reported variables with shareholder return.

Looking for connections among the same sets of material that everyone else has isn’t going to help companies win. Instead, organizations that want to use AI as a differentiator are going to have to find relationships between new data sets — data sets they may have to create themselves to measure intangible assets.

Curate Carefully: What Do You Want to Know?

Data creation is more complex than simply aggregating point-of-sale or customer information and dumping it into a database: Most organizations mistakenly believe that an expedient path involves gathering every scrap of possible data and painstakingly combing through it all hoping to find a glimmer of insight — the elusive feature that predicts or categorizes something they care about.

While machine learning can occasionally surprise us with a flash of rare brilliance that no one has yet to discover, the technology isn’t capable of providing these insights with consistency. This doesn’t mean the tool is broken. It means we have to apply it wisely. This is easier said than done: For instance, in our research of the alternative data market, we found that more than half of new data providers are still focused on measuring physical and financial assets.

The step that many organizations omit is creating a hypothesis about what matters. Where machine learning really excels is taking an insight that humans have — one based on rules of thumb, broad perceptions, or poorly understood relationships — and developing a faster, better understood, more scalable (and less error-prone) method for applying the insight.

In order to use machine learning in this way, you don’t feed the system every known data point in any related field. You feed it a carefully curated set of knowledge, hoping it can learn, and perhaps extend, at the margins, knowledge that people already have.

Insightful Machine Learning Comes From Different Data

All this has three specific implications for companies wanting to create impactful and valuable machine learning applications:

  • Differentiated data is key to a successful AI play. You won’t uncover anything new working with the same data your competitors have. Look internally and identify what your organization uniquely knows and understands, and create a distinctive data set using those insights. Machine learning applications do require a large number of data points, but this doesn’t mean the model has to consider a wide range of features. Focus your data efforts where your organization is already differentiated.
  • Meaningful data is better than comprehensive data. You may possess rich, detailed data on a topic that simply isn’t very useful. If your company wouldn’t use that information to help inform decision-making on an ad hoc basis, then that data likely won’t be valuable from a machine learning perspective. An expert machine learning architect will ask you tough questions about which fields really matter, and how those fields will likely matter to your application of the insights you get. If these questions are difficult to answer, then you haven’t put in the thought needed to produce practical value.
  • What you know should be the starting point. Companies that best use machine learning begin with a unique insight about what matters most to them for making important decisions. This guides them about what data to amass, as well as what technologies to use. An easy place to begin is to scale and grow a piece of knowledge that your team already has and that could create more value for the organization.

It’s clear that software has eaten the world (a phrase coined by software entrepreneur Marc Andreessen). But it is still hungry! Software needs a steady diet of new data combined with new technologies to continue adding value.

You don’t want to be left behind by this shift in insights, machines, and alternative data. Start looking internally to identify your unique perspective and the valuable, alternative data you could and should produce. It’s from those steps that you’ll discover the related insights to keep your organization competitive.


More Like This

Add a comment

You must to post a comment.

First time here? Sign up for a free account: Comment on articles and get access to many more articles.

Comments (4)
Barry Libert

Thanks for your comment and example from the Supply Chain - eg POV about Make to Order versus Make to Stock.   

Mani Iyer
I fully agree on the value proposition for AI in an organization. With many years of my personal experience in Supply Chain Consulting I have come across one thing that has a common thread when it comes to AI. It takes a lot of time building business and process knowledge and the person(s) that can contribute to AI are the ones that bring unique perspectives on several business scenarios that are not typically documented anywhere. For example, building a use case for a 'Make to Stock' when you have always done 'Make to Order'. We are barely scratching the surface of AI and the typical tools as rightly pointed out are 'Only Tools' without the unique insight from key resources in an organization.
Barry Libert

Thanks for your comment.  We appreciate your feedback. 

We agree with your comment about intelligentt systems.  Our own experience suggests that even with intelligent systems, you need your own proprietary ‘lens’ through which one needs to look to create truly unique IP.  If not, you are simply looking at the same data that everyone else is looking at using the same tools.  

In short. As Marcel Proust said (and i paraphrase) - ‘the journey is not seeing new lands, but in seeing through new eyes’.  Given that, we recommend that every firm start by instrumenting themselves - e.g. documenting their own IP/POV and then seeking data that helps either prove or disprove the thesis.  

What is your experience? 

Best regards, 

Tom Abeles
Beck and Libert clearly and repetitively state that what is needed is new or different "data" rather than the standard metrics. Yet they keep their hands over those critically new areas and how they are applied, differently and transcend what competitors are also doing.

Intelligent AI systems can now scan text and mine these for correlations. This is regularly reported in the public arena as seen with Google, Amazon, Apple, Microsoft and others.

Where is this "edge" that the authors push forward as critical and that they are not providing to all their current and future clients?

This is a marketing article that should be usefully expanded in the trade press and not be stamped with an MIT imprimatur.