Even the best companies can struggle to get good results from their data. That may make it easy for some executives to dismiss big data as hype. But in universities, researchers are beginning to use new tools and datasets to answer longstanding big questions in healthcare, public policy and finance, with significant implications for how companies will operate.
Big data is going nowhere fast. That’s both a double entendre and a millstone. Big data isn’t going anywhere — we’re generating 2.5 exabytes of data every two days. But most companies struggle to do much with the data they have.
CIOs are already dismissive of the big data concept. At the recent MIT Sloan CIO Symposium, one panel had a running joke in which panelists tried to top each other with new words for big, like “colossal” and “gargantuan.” Their point? “Big” data is really just data.
No matter what it’s called, the promised revolution in what companies know about their customers, and in how they can change their businesses, isn’t happening.
Even the best companies fail to do much with their data. In a Boston Globe column, “If the Internet is so smart …,” Alex Beam skewered Facebook and Google for the often nonsensical way they deliver ads. For instance, why does he repeatedly get ads for a product he has already purchased online? Facebook and Google are supposed to be the best of the best when it comes to data. If even the best companies don’t always do a good job with their data, can anybody?
At the CIO Symposium, Erik Brynjolfsson, an MIT Sloan professor and director of the Center for Digital Business, said the problem resembles the one created when Anton van Leeuwenhoek began building remarkably high-resolution microscopes. He could see things like “animalcules” swimming in a drop of water. The trouble was, nobody else had such a good microscope, which meant nobody else could measure things the way van Leeuwenhoek did.
Brynjolfsson said big data and analytics were early in their own revolution of measurement — one that will affect management, economics (and indeed all of the social sciences) and the information economy at large. “There will be a whole new set of tools that allow us to see what’s going on in organizations, between companies, even what’s going on inside people’s heads as they make decisions,” he said.
Some of those tools are emerging. Brynjolfsson moderated The Reality of Big Data, a panel featuring three MIT professors discussing how massive data sets were changing government, healthcare and finance.
One of these professors, Andrew Lo, put up a slide that looked something like a giant ball of yarn. It turned out the strands of yarn depicted relationships between the banks, insurance firms and government sovereign wealth funds between 2004 and 2006 — a step towards a map of the world financial system, he called it. Another yarn ball slide, denser and with darker strands, represented data for the last three years. The system “is connected in ways we never anticipated,” Lo said. “We’re only now at the very beginnings of understanding how to map the system, how to map the network.”
Such a map would be a novel thing, Lo said. Oddly, the financial system hasn’t been treated as a system until recently. Regulators deal with banks, insurance companies and hedge funds in isolation.
Cancer researchers face a different problem. Since Nixon declared war on cancer in 1971, “the reality is we haven’t made as a society a huge amount of progress on that,” said panel member Dimitris Bertsimas.
Bertsimas and some fellow researchers built a database of tens of thousands of oncology research papers to analyze them for predictions about toxicity of treatment approaches and potential survival rates (see An Analytics Approach to Designing Clinical Trials for Cancer).
Bertsimas thinks that applying analytics to such large databases will improve results of clinical trials and overall patient outcomes, in part by personalizing those results. It should also improve the design of future clinical trials.
The third panel member, Alex ‘Sandy’ Pentland, discussed research around mobile phone data, which he called “a huge reservoir of data about human behavior and social patterns,” far larger than Facebook. Pentland has worked with Orange, a large French mobile phone carrier, to create the first “data commons.”
As part of what it calls Data for Development, or D4D, Orange’s unit in the Ivory Coast released anonymized data about its records for the country’s 20 million citizens. Researchers used the data to show some potential uses for these records:
- bus commutes could be shortened by 10% if the bus routes were rearranged to reflect where commuters actually are.
- infectious disease rates could be knocked down 20%.
- a poverty index could be created for the country, based on usage patterns — as people get more money, they engage in what Pentland called exploratory behavior.
“These are public goods which people can really experience,” Pentland said.
Pentland acknowledged the obvious: cell phone data is controversial because it shows so much about individual behavior. It also raises group-related issues. In the Ivory Coast, for instance, the data show that different ethnic and language groups don’t mix much, so you can plot the lines of the same divisions that were the cause of civil war. Pentland argues that because those divisions are now visible in the data, the government could promote relations between different groups. The downside, of course, is that a government could also use the data to target groups of people, creating conflict rather than avoiding it.
Still, if such data can be managed appropriately, it offers a clear public good. That could lead to cell phone carriers being allowed to create premium data sets to sell to other companies.
There is no lack of promise for big data. But there remain plenty of obstacles to its successful use.