Big Data That’s Good for the Public

A program funded by the EU promises to semantically link open data like never before.

Facts: 900 million. Active sources: more than 100,000. Data sets: 30,000, with 200 million time-series and 1.5 billion fact values.

Link all these data sources together and what do you get? Timely, if not crucial, contextual information about markets, trends, competitors, products and consumer opinions.

This is the promise of DOPA, a project funded under the European Union’s Seventh Framework Programme (a made-for-HBO series title if I’ve ever heard one), which was set up to further European research and economic development.

DOPA’s goal is to semantically link massive amounts of open economic and financial data — quantitative, qualitative, structured, unstructured and polystructured (as in audio, video, images, free-form text, tables and XML files) — and make it available through a framework that standardizes data sets. Its hoped-for outcomes include a bevy of innovations based on new ways of looking at publicly available data.

The DOPA network will have four basic components, according to the official project site:

  1. Large-scale, high-quality information sourcing (automation of dataset detection and curation workflow)
  2. Automated information processing at scale by way of Data Supply Chains on a distributed platform
  3. Automated entity linkage to help bring together related data from disparate sources
  4. Visualization tools to help make sense of this wealth of data

DOPA’s real power and value will come from the way it connects disparate open data sets. The idea behind the semantic linkage is to connect sources of information that previously were not tied together, says DataMarket’s founder Hjalmar Gislason, whose company is one of the half-dozen commercial and research-oriented organizations participating in the DOPA project. DataMarket will provide the visualization technology and contribute a statistical data pool of over 125 million time-series values.

What’s the big deal about semantic linkage? “[Semantic linkage] allows, for example, geographical data to be drawn on maps because we can hook any references to countries, states or municipalities to the geographical entities that they really are,” says Gislason. “Even though they’re read from sources that have no concept of what those references really are.”
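
To make Gislason's point concrete, here is a minimal sketch of that kind of linkage, not DOPA's or DataMarket's actual implementation. It assumes a tiny hand-built gazetteer; the names `GeoEntity`, `ALIASES` and `link_place` are hypothetical, the coordinates are approximate capital locations, and the sample values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoEntity:
    """A canonical geographical entity that a map layer could draw."""
    iso_code: str
    name: str
    lat: float
    lon: float


# Hypothetical mini-gazetteer; coordinates are approximate capital locations.
ENTITIES = {
    "IS": GeoEntity("IS", "Iceland", 64.15, -21.94),
    "DE": GeoEntity("DE", "Germany", 52.52, 13.40),
}

# Surface forms as they might appear in sources that only store a text label.
ALIASES = {
    "iceland": "IS",
    "ísland": "IS",
    "germany": "DE",
    "deutschland": "DE",
    "federal republic of germany": "DE",
}


def link_place(reference: str) -> Optional[GeoEntity]:
    """Resolve a free-text place reference to its canonical entity, if known."""
    key = " ".join(reference.strip().lower().split())  # normalize case and spacing
    code = ALIASES.get(key)
    return ENTITIES.get(code) if code else None


# Rows from two imaginary sources that spell the same country differently.
rows = [("Deutschland", 10.0), ("GERMANY ", 20.0), ("Atlantis", 30.0)]
linked = [(link_place(label), value) for label, value in rows]
```

Here the first two rows resolve to the same `GeoEntity` and could be plotted at one spot on a map, while "Atlantis" comes back as `None` because nothing in the gazetteer matches it. A real system would need far fuzzier matching than an exact alias lookup.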

For instance, a user of the DOPA platform might look up how a given company’s financials have performed over time. Semantically linked data can turn up not only the company’s financial information, but also its CEO and any news articles or legal documents tied to that person, all without a keyword search. “It can bring up whatever is linked to that CEO entity through this networked web of data,” says Gislason.
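
The company-to-CEO-to-documents lookup can be sketched as a walk over a small graph of linked entities. This is only an illustration under stated assumptions: the triples, the entity identifiers (such as `company:ExampleCorp`) and the functions `build_index` and `related` are all hypothetical, and the article does not describe how DOPA stores or queries its links.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) links between entities.
TRIPLES = [
    ("company:ExampleCorp", "has_ceo", "person:JaneDoe"),
    ("company:ExampleCorp", "reported", "filing:annual-report"),
    ("article:1", "mentions", "person:JaneDoe"),
    ("court_doc:7", "names", "person:JaneDoe"),
    ("article:2", "mentions", "company:OtherCorp"),
]


def build_index(triples):
    """Index links in both directions so they can be followed either way."""
    neighbors = defaultdict(set)
    for subject, _relation, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    return neighbors


def related(start, neighbors, max_hops=2):
    """Breadth-first walk: everything reachable from start within max_hops links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    seen.discard(start)
    return seen


index = build_index(TRIPLES)
results = related("company:ExampleCorp", index)
```

Starting from the company, the walk reaches the CEO and the filing in one hop, then the article and court document that reference the CEO in a second hop. The unrelated `article:2` is never reached, and no text was matched against a keyword along the way.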

According to Gislason, DataMarket has started adding semantics to the dimensions that statistical data sets commonly share, such as geographical areas, measurement units, currencies, gender and age groups. DataMarket and DOPA’s other partners (VICO Research, OKKAM SRL, AMI Software, Internet Memory Research and TU Berlin) will commercialize products and services that run the gamut from connecting social data to visualizing statistical data.

The EU is not alone in its efforts to link open data. Tim Berners-Lee (@timberners_lee), director of the World Wide Web Consortium (W3C) — and the inventor of the World Wide Web — has long been a proponent of linking open data. In fact, W3C is working on something called the Semantic Web. According to its website:

W3C is helping to build a technology stack to support a “Web of data,” the sort of data you find in databases. The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data.

There is also a focus on open data this side of the pond. In May, President Obama signed an Executive Order making data created by governmental organizations more accessible to the public and to entrepreneurs. The goal: fuel innovation and economic growth.

Efforts to use public data to achieve such goals are not new. In 1983, President Ronald Reagan announced that GPS would be opened to civilian use. By 1989, Magellan, a U.S. company, had commercialized the first portable GPS system, paving the way for Google Maps, Apple Maps and much of the world’s use of turn-by-turn navigation.

For those interested in semantically linked financial and economic data, there is still a bit of a wait. A two-year project, DOPA is expected to be complete in June 2014, when the first commercial products are scheduled to be available.