Deodorizing Your Data
To deal with “smelly” data, try refactoring your analytics processes.
In programming, people describe their sense of underlying fragility in code by noting that it “smells bad.” A bad code smell is an amalgam of signals (such as size, complexity, duplication) that combine to indicate larger, deeper problems. Similarly, analytics in organizations can stink — and it takes more than a spritz of air freshener to solve the problem.
So how do you know if your analytics needs deodorizing? Symptoms abound. Poking around with data may uncover more quality problems than insightful answers. Results may be sketchy and change drastically under the least bit of scrutiny. Different parts of the organization may hold multiple versions of the truth. Analysis that should be routine is instead done ad hoc each time, duplicating effort with every iteration.
Certainly bad data smells are not intended — but they can be prevented by understanding how they develop. Some of the factors contributing to smelly data include:
- Complex realities. Analytics compiles data that are snapshots taken in a complex world — and these snapshots don’t always fit into well-structured or clean models. Furthermore, that world continues to change, even if systems don’t. For example, evolving businesses and requirements led to “14 separate health plans with inconsistent approaches to defining similar types of data” at the health care provider WellPoint, according to a recent MIT SMR case study. Each system likely made sense in isolation or at the time it was developed — but any later attempt to generate analytical results must synthesize each of these disparate sources of data.
- Acquisitions. Organizations often grow through acquisition of other, previously independent organizations, each with idiosyncratic systems. Tom Fontanella, senior IS director at Sanofi, reports that a master data management project that Genzyme undertook before being acquired by Sanofi found that “30-day payment terms [were] expressed as Net 30, 30, 30 Day, 30days, LC30, 030NL …” due to a series of acquisitions over a number of years, often in different areas of the world. This lack of consistency meant that the data on 30-day payment was squirreled away under a range of labels — a malodorous situation indeed.
- Urgency. Operational pressure can be intense. Urgency can require short-term solutions, and it may be difficult to find time to go back and create longer-term practices that are more robust. Hal Varian, chief economist at Google, notes the tremendous discipline necessary to standardize now versus later; Google invests considerable resources upfront to ease later growth.
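The Genzyme payment-terms example above lends itself to a small illustration. The following is a minimal sketch in Python, not a description of what Genzyme actually built: the lookup table contains only the variants quoted above, and the canonical label `NET_30` and the function name `normalize_payment_term` are hypothetical names chosen for the example.

```python
# Hypothetical lookup built only from the label variants quoted in the text.
# A real master data project would maintain this table with business owners.
CANONICAL_TERMS = {
    "net 30": "NET_30",
    "30": "NET_30",
    "30 day": "NET_30",
    "30days": "NET_30",
    "lc30": "NET_30",
    "030nl": "NET_30",
}


def normalize_payment_term(raw):
    """Return (label, matched) for a raw payment-term string."""
    # Trim, lowercase, and collapse repeated whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    if key in CANONICAL_TERMS:
        return CANONICAL_TERMS[key], True
    # Unknown labels are passed through and flagged for review, not guessed.
    return raw, False


if __name__ == "__main__":
    for label in ["Net 30", "30", "30 Day", "30days", "LC30", "030NL", "Net 45"]:
        print(label, "->", normalize_payment_term(label))
```

Returning a flag rather than guessing at unfamiliar labels is a deliberate choice in this sketch: it leaves the decision about a new variant with the people who own the definition.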
Removing stench is easier said than done. In programming, people undertake “refactoring” — a series of small changes that try to change the internal working of a system without changing its observable behavior. Refactoring involves risk, particularly in the short term — it can break working systems and absorb resources without apparent payback.
But the benefits of refactoring analytics within the organization can be substantial and widespread. One way to do this is to develop a common terminology. For example, Coca-Cola’s director of business intelligence Remco Brouwer points out that with a common set of “truths” in place, employees “can skip the first ten minutes of the meeting” that they previously needed to get up to speed on terms and definitions, which results in greater productivity. Similarly, investing in developing a common language for data meant that Intermountain Healthcare could make detailed comparisons of hospitals and departments.
Benefits go beyond clarity and transparency. A refactored analytical foundation provides a base to build from, even in unexpected areas. Because systems at Google use “the same conventions and … the same basic blocks for storing and accessing data,” says Varian, a dashboard for a new system is “a half-hour implementation.” A sweet-smelling analytical foundation enables organizations to pounce on emerging opportunities before competitors can react.
The core tension is now versus later. Paybacks from investments in analytical refactoring are not immediate, but the costs are. What should organizations do now to sweeten the smell of analytics in their organization?
Identify areas that may need attention. At WellPoint, the shift to a shared-savings model revealed inconsistencies among 14 systems. At Genzyme, a new consolidated ERP system necessitated common terms. In each case, the issues existed prior to the major projects. By decoupling refactoring activities from major projects, organizations can reduce the risks associated with the larger, more strategic projects.
Make incremental, iterative improvements. A key lesson from the software development world applies to refactoring analytics. Massive, sweeping refactoring efforts inevitably struggle. Instead, developers now undertake a series of incremental changes, each as small as possible. Analytical refactorings can start with the highest ROI problems — what one data inconsistency do the most people struggle with? By taking an incremental approach, risk can be reduced and people can see the benefits sooner.
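To make the idea of a small, behavior-preserving step concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the order fields, the function names, and the rule that cancelled orders do not count as revenue. It shows one report's inline revenue logic being moved into a single shared definition, with a check that the report's output does not change.

```python
def regional_report_before(orders):
    # Before: the revenue rule is written inline inside the report.
    total = 0.0
    for order in orders:
        if order["status"] != "cancelled":
            total += order["quantity"] * order["unit_price"]
    return round(total, 2)


def net_revenue(order):
    """Shared definition of revenue for a single order."""
    if order["status"] == "cancelled":
        return 0.0
    return order["quantity"] * order["unit_price"]


def regional_report_after(orders):
    # After: same result, but the rule now lives in one reusable place.
    return round(sum(net_revenue(order) for order in orders), 2)


# Illustrative sample data, not drawn from any real system.
sample = [
    {"status": "shipped", "quantity": 3, "unit_price": 9.5},
    {"status": "cancelled", "quantity": 1, "unit_price": 100.0},
    {"status": "shipped", "quantity": 2, "unit_price": 4.25},
]

# The observable behavior of the report must be unchanged by the refactoring.
assert regional_report_before(sample) == regional_report_after(sample)
```

The comparison at the end is the important part of the sketch: each incremental change is checked against the old output before the next one is attempted, which limits how much can break at once.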
Manage to reduce “stench creation.” My grandmother always said, “Don’t take time to mess up if you can’t take time to clean up,” and her wisdom resonates in the analytics context, too. With respect to data, it is likely to be far easier to avoid creating problems than it will be to clean them up later. Organizations pressed to implement quick fixes “for now” will likely struggle to find time to refactor later. Traditional data governance still applies in our current emphasis on analytics — perhaps even more so with the emphasis on data analysis and the explosion of big data.
Look beyond infrastructure. While infrastructure examples might spring to mind more readily, analytics involves more than technology. Other aspects, such as people and culture, also benefit from refactoring. Delay and uncertainty in payback are even more acute in these areas. Training to increase data skills takes time away from current operations. Trying to use data to augment intuition may be frustrating until the data foundation catches up. But these improvements can be part of an incremental process toward a data-driven organization.