Competing With Data & Analytics
Facts: 900 million. Active sources: more than 100,000. Data sets: 30,000, with 200 million time-series and 1.5 billion fact values.
Link all these data sources together and what do you get? Timely, if not crucial, contextual information about markets, trends, competitors, products and consumer opinions.
This is the promise of DOPA, a project funded under the umbrella of the European Union’s Seventh Framework (a made-for-HBO series title if I’ve ever heard one), a program implemented to further European research and economic development.
DOPA’s goal is to semantically link massive amounts of open economic and financial data, whether quantitative, qualitative, structured, unstructured or polystructured (as in audio, video, images, free-form text, tables and XML files), and make it available through a framework that standardizes data sets. Its hoped-for outcomes include a bevy of innovations based on new ways of looking at publicly available data:
- Large-scale, high-quality information sourcing (automation of dataset detection and curation workflow)
- Automated information processing at scale by way of Data Supply Chains on a distributed platform
- Automated entity linkage to help bring together related data from disparate sources
- Visualization tools to help make sense of this wealth of data
DOPA’s real power and value will come from the way it connects disparate open data sets. The idea behind the semantic linkage is to connect sources of information that previously were not tied together, says DataMarket’s founder Hjalmar Gislason, whose company is one of the half-dozen commercial and research-oriented organizations participating in the DOPA project. DataMarket will provide the visualization technology and contribute a statistical data pool of over 125 million time-series values.
What’s the big deal about semantic linkage? “[Semantic linkage] allows, for example, geographical data to be drawn on maps because we can hook any references to countries, states or municipalities to the geographical entities that they really are,” says Gislason. “Even though they’re read from sources that have no concept of what those references really are.”
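To make Gislason's point concrete, here is a minimal Python sketch of the idea, not DOPA's actual implementation. It assumes a small hand-built alias table (`ALIASES`) and hypothetical names (`GeoEntity`, `link_entity`, `link_rows`). The source names and numeric values are placeholders, not real data. Two sources refer to the same countries with different strings, and resolving each string to one canonical entity is what lets their values sit side by side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoEntity:
    """A canonical geographical entity (hypothetical structure for illustration)."""
    iso_code: str
    name: str


UNITED_STATES = GeoEntity("US", "United States")
GERMANY = GeoEntity("DE", "Germany")

# Hypothetical alias table: a real system would draw on a much larger
# gazetteer and fuzzier matching than exact lowercase lookups.
ALIASES = {
    "united states": UNITED_STATES,
    "usa": UNITED_STATES,
    "u.s.": UNITED_STATES,
    "germany": GERMANY,
    "deutschland": GERMANY,
}


def link_entity(reference: str) -> Optional[GeoEntity]:
    """Resolve a free-text place reference to its canonical entity, if known."""
    return ALIASES.get(reference.strip().lower())


def link_rows(rows):
    """Group values from several sources under the entity each reference resolves to.

    Each row is (source_name, place_reference, value). Rows whose reference
    cannot be resolved are skipped rather than guessed at.
    """
    linked = {}
    for source, reference, value in rows:
        entity = link_entity(reference)
        if entity is None:
            continue
        linked.setdefault(entity.iso_code, {})[source] = value
    return linked


# Placeholder rows: the sources spell the same countries differently.
rows = [
    ("source_a", "USA", 1.0),
    ("source_b", "United States", 2.0),
    ("source_a", "Deutschland", 3.0),
    ("source_b", "Germany", 4.0),
    ("source_b", "Atlantis", 5.0),  # unknown reference, left unlinked
]

linked = link_rows(rows)
```

Once every reference points at the same entity, "hooking" the data to a map is a lookup by ISO code. The sources themselves never needed to know that "USA" and "United States" are the same place.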
For instance, a user of the DOPA platform may search for how the financials of a given company have been performing over time.