The Smart Way to Deal With Messy Data

The processing required to prepare unstructured data for analysis can be cumbersome and prone to error. That’s why companies should do more to organize their data before it is ever collected.

Topics

Competing With Data & Analytics

How does data inform business processes, offerings, and engagement with customers? This research looks at trends in the use of analytics, the evolution of analytics strategy, optimal team composition, and new opportunities for data-driven innovation.
More in this series

Unstructured data — data that is not organized in a predefined way, such as text — is now widely available. But structure must be added to the data to make it useable for analysis, which means significant processing. That processing can be a problem.

In a form of modern alchemy, analytics processes now transmute “base” unstructured data into “noble” business value. Systems everywhere greedily salt away every imaginable kind of data. Technologies such as Hadoop and NoSQL databases store this hoard easily in its native unstructured form. Natural language processing, feature extraction (distilling nonredundant measures from larger data), and speech recognition now routinely alchemize vast quantities of unstructured text, images, audio, and video, preparing them for analysis. These processes are nothing short of amazing, working against entropy to create order from disorder.

Unfortunately, while these processing steps are impressive, they are far from free or free from error. I can’t help but think that a better alternative in many cases would be to avoid the need for processing altogether.
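
To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical and not drawn from any particular system: the `Order` record, the regular expression, and the sample notes are assumptions chosen only to show how a parsing step can silently fail on free text, while data captured in structured form needs no parsing at all.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    """Hypothetical structured record: one product and a quantity."""
    product: str
    quantity: int


# Hypothetical pattern: a run of digits followed by a single word.
_PATTERN = re.compile(r"(?P<quantity>\d+)\s+(?P<product>[A-Za-z]+)")


def parse_free_text(note: str) -> Optional[Order]:
    """Try to recover an Order from a free-text note; return None on failure."""
    match = _PATTERN.search(note)
    if match is None:
        return None
    return Order(
        product=match.group("product").lower(),
        quantity=int(match.group("quantity")),
    )


# Unstructured input: the parser works only when the text fits the pattern.
print(parse_free_text("Please send 3 widgets by Friday"))
print(parse_free_text("Please send three widgets by Friday"))  # not recognized

# Structured at the source: no parsing step, so nothing to get wrong here.
print(Order(product="widgets", quantity=3))
```

The second note says the same thing as the first, yet the parser returns nothing for it. A form with a product field and a numeric quantity field would have avoided the question entirely.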

We all know how each step in a process mangles information. In the telephone game, as each person whispers to the next player what they think was said to them, words can morph into an unexpected or misleading final message. In a supply chain, layers exacerbate distortion as small mistakes and uncertainty quickly compound.

By analogy, organizations are playing a giant game of telephone with data, and unstructured data makes the game far more difficult. In a context where data janitorial activities consume 50% to 80% of scarce data scientist resources, each round of data telephone costs organizations in accuracy, effort, and time — and few organizations have a surplus of any of these three.
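
A rough back-of-the-envelope calculation shows how quickly this compounds. The sketch below assumes, purely for illustration, that every processing step independently keeps 95% of records correct; neither the 95% figure nor the independence assumption comes from measured data.

```python
def end_to_end_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Share of records still correct after `steps` independent processing steps."""
    return per_step_accuracy ** steps


# Assumed per-step accuracy of 0.95 (illustrative only).
for steps in (1, 3, 5, 10):
    print(steps, round(end_to_end_accuracy(0.95, steps), 3))
```

Under these assumptions, three steps leave about 86% of records correct and ten steps leave about 60%. Each extra round of data telephone costs more than it appears to when viewed on its own.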

Within organizations, each processing step can be expensive to develop and maintain. But the growth in importance of data sharing between organizations magnifies these concerns. Our recently published report, “Analytics Drives Success with IoT,” associates business value with sharing data between organizations in the context of the internet of things. And, to foreshadow our report to be released in January, we observe similar results in the broader analytics context. But with every transfer of data, more processes need to be developed and maintained.

If this processing were unavoidable, then it would just be a cost of data sharing within or between organizations.

Comments (2)
Hi-Tech BPO
Nice one Sam.

Companies should not spend a lot of time and dollars collecting data and creating a prospect database. That alone is not enough for them to stay relevant in the race for data and its digitization, because they fail miserably to cleanse their data and keep it up to date. Data is one of the most important assets and the lifeblood of any and every business, so data cleansing should top the priority list to protect that investment.

•	Is the quality of the data in your systems good enough to support informed decisions?
•	Are you satisfied with the speed of your data updates?
•	Are your data quality processes consistent across the organization?

If the answer to any of these critical questions is “no” or “maybe”, hire data cleansing and management experts to help you with data retrieval, integration, enrichment, and validation, and to keep your organization on track for success. They are experts at cleansing unstructured and structured data from multiple sources, and they can also help your company with internal and external data management. They standardize and centralize your data processes so that you have all the insight you need to drive your business forward.
Davis Clark
You don't have any hands-on experience processing data, do you...