How to Get Proactive About Data Quality
The best way to improve data quality is to prevent data errors at the source. But that requires major shifts in mindsets and organization, as meal-kit company HelloFresh learned.
Matt Harrison Clough
Poor data quality undermines good decision-making and dooms many AI initiatives. However, many organizations still operate in unmanaged data mode or organized cleanup mode and live with low-quality data as a result. Companies that make the move to proactive prevention mode, in which data errors are prevented at the source, benefit from better business decisions and more trustworthy analytics. Learn how to help your organization make the leap, as executives at meal-kit company HelloFresh did.
When it comes to dealing with data quality, teams and companies fall into one of three modes: unmanaged, organized cleanup, or proactive prevention. Most organizations get stuck in one of the first two. The work of addressing data issues is demanding, messy, and time-consuming. Poor-quality data can cripple decision-making and doom generative AI projects, since bad data fed to AI models turns into untrustworthy results.   
The real data quality breakthrough happens when companies transition to the third mode, where errors are prevented at the source. But this shift requires a major change in mindset, in which every employee recognizes that they are both a data creator and a data customer and starts acting like it. 
How can companies reach this third mode of data quality? In our experience, the change often starts with a provocateur, such as a manager with a nagging business problem, and gains momentum when leaders at many levels start working together to improve data quality within their own spans of influence. Let’s explore lessons on how to get started and how this journey to proactive data quality improvement has worked at organizations like meal-kit company HelloFresh.
A Common Trap
It is easy to see how companies get caught in the unmanaged and organized cleanup modes. Look at the data flow in any company and you’ll observe a daisy chain: People in a department use data to do their jobs, in turn creating new data, which goes to the next group in line. People generally work within their silos, seeing themselves only as salespeople, vendor managers, market researchers, and so forth. When someone (say, a salesperson) sees an error, it’s only natural that they want to correct it. But correcting errors is difficult, time-consuming work, and plenty of quality issues go undetected, flowing downstream and causing further damage. This is Mode 1, unmanaged data.
Sooner or later, someone recognizes the business impact of a constant stream of data errors. The company then adopts a more formalized and centralized approach in which a data cleanup team implements a tool to address errors better, faster, and more cheaply. The gains are generally small. Finding errors is easy, but fixing them without understanding the business context is not. This is Mode 2, organized cleanup. While this beats pure chaos, it’s still an endless cycle of fixing errors.