Variety, Not Volume, Is Driving Big Data Initiatives

  • Randy Bean
  • March 28, 2016

For large corporations, data variety trumps volume when looking for insights.

When many executives think of Big Data, they think of large volumes of data. A common notion is that bigger is often better when it comes to data and analytics, but this is not always the case. In their 2012 article, “Big Data: The Management Revolution,” MIT Professor Erik Brynjolfsson and principal research scientist Andrew McAfee spoke of the “three V’s” of Big Data (volume, velocity, and variety), noting that “2.5 exabytes of data are created every day, and that number is doubling every 40 months or so. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text. An exabyte is 1,000 times that amount, or 1 billion gigabytes.” This focus on the rate of data proliferation has sometimes obscured an appreciation of data and analytics value. The result is a myth about Big Data: that Big Data is synonymous with large volumes of data.
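For readers who want to check the arithmetic behind the quoted figures, the short Python sketch below restates the unit sizes and extrapolates the "doubling every 40 months" claim. It is only an illustration: the constant and function names are invented here, the units are the decimal definitions used in the quotation, and the projection assumes the doubling rate holds steady, which the quoted authors state only approximately.

```python
# Unit sizes in bytes, using the decimal definitions from the quoted passage.
GIGABYTE = 10**9
PETABYTE = 10**15   # one quadrillion bytes
EXABYTE = 10**18    # 1,000 petabytes

# Figures taken from the 2012 quotation.
DAILY_EXABYTES_2012 = 2.5
DOUBLING_PERIOD_MONTHS = 40


def projected_daily_exabytes(months_elapsed: float) -> float:
    """Extrapolate daily data creation in exabytes.

    Assumption (for illustration only): volume keeps doubling
    every 40 months at a constant rate.
    """
    doublings = months_elapsed / DOUBLING_PERIOD_MONTHS
    return DAILY_EXABYTES_2012 * 2 ** doublings


if __name__ == "__main__":
    print("Petabytes per exabyte:", EXABYTE // PETABYTE)
    print("Gigabytes per exabyte:", EXABYTE // GIGABYTE)
    for months in (0, 40, 80):
        print(months, "months:", projected_daily_exabytes(months), "EB/day")
```

The numbers confirm the scale described in the quotation, but they say nothing about how many distinct sources the data comes from, which is the dimension the survey results below emphasize.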

In 2012, when Brynjolfsson and McAfee published their article, Big Data was a new phenomenon. While a handful of Silicon Valley innovators like Google, Facebook, and Amazon were employing Big Data with success, Big Data was largely uncharted territory for mainstream Fortune 1,000 firms. The past several years have been a period of exploration, experimentation, and trial and error in Big Data among Fortune 1,000 companies, and the result has been a different story. For these firms, it is not the ability to process and manage large data volumes that is driving successful Big Data outcomes. Rather, it is the ability to integrate more sources of data than ever before: new data, old data, big data, small data, structured data, unstructured data, social media data, behavioral data, and legacy data.

This is known as the “variety challenge,” and has emerged as the top data priority for mainstream companies, according to the fourth annual Big Data Executive Survey, conducted by NewVantage Partners and released last month. In the world of the Fortune 1,000, we are seeing that variety trumps volume and velocity when it comes to Big Data success.

Tapping Into the “Long Tail” of Big Data

When asked about drivers of Big Data success, 69% of corporate executives named greater data variety as the most important factor, followed by volume (25%), with velocity (6%) trailing. In the corporate world, the big opportunity is to be found in integrating more sources of data, not bigger amounts. Variety, not volume, is king. MIT professor and 2015 Turing Award recipient Michael Stonebraker calls this the “long tail” of Big Data, as companies focus on integrating sources of data that have traditionally been ignored, as well as identifying new data sources. Stonebraker cites the example of life sciences firms with thousands of research scientists, each with their own research databases that have not been tied together for analysis in the past. Tapping into more data sources has emerged as the new data frontier within the corporate world.

How are corporations focusing their data management efforts to develop more robust data and analytics? Firms are taking three primary paths:

Capture Legacy Data Sources

It may come as a surprise, but many firms see the big opportunity in Big Data as coming from the capture of traditional legacy data sources that have gone untapped in the past. These are data sets that have typically sat outside the purview of traditional data marts or warehouses: the “long tail” data. A majority of firms (57%) identified this as their top data priority. One of the beauties of Big Data is that organizations can now go deeper into their own data before they turn to new sources.

Integrate Unstructured Data

Businesses have been inhibited in their ability to mine and analyze the vast amounts of information residing in text and documents. Traditional data environments were designed to maintain and process structured data — numbers and variables — not words and pictures. A growing percentage of firms (29%) are now focusing on integrating this unstructured data, for purposes ranging from customer sentiment analysis to analysis of regulatory documents to insurance claim adjudication. The ability to integrate unstructured data is broadening traditional analytics to combine quantitative metrics with qualitative content.

Add Social Media and Behavioral Data Sources

While much of the early excitement around Big Data resulted from the capture of social media and behavioral activities by firms like eBay and Facebook, these applications have been relatively nascent among the Fortune 1,000, with just 14% citing this as a priority. As firms progress with their Big Data efforts, it is likely that they will turn attention to untapped opportunities presented by social data in areas such as patient adherence and mobile device recommendations based on consumer purchasing behavior and preferences. Timely recommendations can yield immediate results.

As mainstream companies progress on their Big Data journey, we should expect that expanding the variety of data sources for analysis will continue to dominate their interests.