Why Companies Have to Trade “Perfect Data” for “Fast Info”

Companies have been trained to think about data all wrong, say Attivio’s Ali Riaz and Sid Probstein. “Analytics don’t have to be based on super-precise data,” they say. “The report doesn’t have to be perfect. It needs to capture the behavior, not the totality of it.”


Does this sound familiar?

Your company collects data. You want to act on it. First, though, you really, really want to make sure the data are accurate. So you focus on getting them right. Better to wait on a decision until you have the absolutely correct information than to act on partial information.

That might make sense, but it’s the wrong way to go, say the top two executives at Attivio Inc., a privately held enterprise software company based in Newton, Massachusetts. The problem with focusing on getting the numbers too right is that most companies sacrifice speed for accuracy.

Ali Riaz, Attivio CEO, and Sid Probstein, CTO, are “practically relatives” at this point, according to Riaz. “I think I saw his first child being born, the second child and the third child,” he says. They met when Probstein interviewed for and then initially “refused to work with” Riaz at FAST Search & Transfer, a company of which Riaz was then president and COO (it’s now owned by Microsoft Corp.).

Probstein “understood something I didn’t understand right away, that FAST, at the time, didn’t have its strategy right — I didn’t understand that because I’m kind of like a hopeless romantic,” says Riaz. “When I realized that he actually got it, he got that this company was not ready, I thought, ‘That’s a smart guy.’ I called him personally and begged him, and we’ve been together since.”

Riaz and Probstein spoke with MIT Sloan Management Review editor-in-chief Michael S. Hopkins about the stifling downside of the quest for perfect data, why “eventually consistent” is a concept every company should take to heart, and how to deal with the need for speed.

Where do you think tech-driven information and data trends stand in terms of how companies understand them? How has the capture and use of information changed most in recent years?

Ali Riaz: Let me go back in history. I used to work at Novartis Pharmaceuticals, and one of the things that was really bothersome for me at the time was that we could never agree on the data. We got to the management team meetings, and one system would say we have 17,500 employees and another would say we have 17,300 employees. Or one system would say we have 400 patients enrolled on this trial, and another would say 800. These might not seem like big issues, but they ended up consuming a lot of our leadership time and causing frustration.

We never got to really be an intelligent company, in the sense that we were seeing the right things and being able to act and collaborate based on them. But this isn’t unique — I’m not throwing Novartis under the bus. I would say that this is a problem that most companies have had, and still have.

Sid Probstein: I think that’s exactly right. I worked in financial services, and 20 years ago the issues were all around the things Ali’s talking about. We couldn’t agree on how many units were sold, because there were 12 different products and 12 different systems storing the information on them. How could we get a unified view of our customers?

One of the first projects I worked on at a big financial services firm was to do the traditional Pareto breakdown, looking for the 20% of customers who were providing 80% of the revenue, to figure out if we could eliminate focus on some unprofitable customers. Classic modern business theory, right? It was a decade-long project just to unify the data.
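The Pareto breakdown Probstein describes can be sketched in a few lines once the data are in one place: sort customers by revenue and ask what share of the total the top 20% account for. A minimal sketch (the revenue figures below are hypothetical, not from the firm in the interview):

```python
def pareto_share(revenues, top_fraction=0.2):
    """Return the share of total revenue contributed by the
    top `top_fraction` of customers (the classic 80/20 check)."""
    ordered = sorted(revenues, reverse=True)
    top_n = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / sum(ordered)

# Ten hypothetical customers, revenue per customer:
revenues = [500, 400, 50, 30, 20, 15, 10, 10, 5, 5]
share = pareto_share(revenues)
print(f"Top 20% of customers contribute {share:.0%} of revenue")
```

The analysis itself is trivial; as Probstein notes, the decade-long part was unifying twelve systems so a list like `revenues` could be assembled at all.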

But then the company grew. And part of the challenge of what makes it so difficult to achieve an intelligent enterprise is change. That financial services firm bought another company and then another ERP [enterprise resource planning] and another CRM [customer relationship management]. Resolving all of that becomes a huge challenge.

So what I think we’ve seen developing over the last 10 years is the value of what I’ll call interim steps. The idea is, look, let’s not try to move all the data together, let’s not worry too much about putting it together in one coherent way. Instead, let’s figure out what lives where as an interim first step, so that when we perform an analysis we can know the provenance of the data.

Let me make sure I understand the concerns about where the data came from, what you call the provenance of data.

Sid Probstein: Well, that’s definitely another thing I’d say has changed. People today are very concerned with provenance. Before, you used to argue about which report is right. Now, you want to know where that piece of data comes from.

People are focused on understanding if data are trustworthy. What assumptions might this source have made? For instance, it’s very common for a company that has newly acquired another company to trust the acquired company’s reporting less than its own. That’s a very natural, human effect. You think, “Well, that seems interesting, but I don’t know how they calculated revenue.”

The thing is, if two companies, before they even get into discussions about how their pieces fit together, start asking how did they collect the data and what led to the data, they’re probably going to convince themselves very quickly that it’s going to be hard to put this all into one view.

That’s why master data management, an interim set of representations, has emerged as so appealing. It’s all about how the intelligent enterprise responds to the need to move faster. It’s important to integrate and understand the data, but managers are accepting that they can start to do all that without necessarily having to push all the data into the same technology stack.

I think companies thought 10 or 15 years ago that the systems they were putting in place would deliver uniform, universally accessible, trustworthy, analyzable data. And yet, here they are all these years later, after significant investment, often feeling no better off. What’s your sense of what people expected back then and what they’re most or least frustrated about now?

Ali Riaz: First, I think we can’t ignore human nature in corporations. If I get data that says, “Ali, you did a really good job this month,” then I trust it. If the data says, “Ali, you did a bad job this month,” I may not trust it. I may question it; I may want to know more about it. People only select the information that supports their beliefs, so using dispassionate analytics is the only way to dispel this problem. The early transaction systems didn’t contain the “why” of information, just the “what”. It’s the more recent ability to merge all the sources that makes for better information and better decisions. Triangulating on a fact or an event validates it and also lets you discover what you might never have known by looking at all your data sources separately. Where we are now is that if I get information that says, “Ali, you did a good job on A, B and C, but you could have done a better job on X, Y and Z,” and if that information is complete, analyzed, and presented in a timely fashion, there’s not a lot of places I can hide.

And none of us expected two things: the amount of scale that we need, and the speed that we need. Companies like Comcast and Verizon have millions of clients, and every day, hundreds of clients move from them to competitors. There’s no point in finding out tomorrow why my customers left me yesterday, but it would be great to know who is about to leave me a week or two from now.

Sid Probstein: That’s a really key point. People thought they were going to fix reporting: Before, maybe it would take a week to run a report, but we didn’t know if the data were correct or not, so our focus was on getting the data accurate. Today, managers don’t just want the report to be accurate, they want it accurate and they want it every 10 minutes or in a dashboard that updates continuously. Or they want it plus a report analyzing the hundreds of millions of emails inside the company. The systems that have to start to address that kind of performance are not changing fast enough to keep up.

Even if I’m an old brick-and-mortar company, if I start up a website where I’m making sales, all of a sudden the tempo of my business has changed dramatically. I have a store that’s open 24/7. I collect information about what these people are doing on my site, but if I don’t crunch it and analyze it and come up with the best offer for people each time they arrive at the site, they’ll go to another website that does a better job, and they’ll do it for free since there’s no switching cost.

These are things that we didn’t even know to ask about 10 years ago.

So let’s look at where we are now. It sounds like you’re saying that even if you solve the challenge of making your data perfect, you might be doing it too slowly to act on. Should we be asking different questions of our data, and therefore of the tools we use to parse and analyze it?

Sid Probstein: Yes, you’re exactly right. One of the most important questions is whether we should even worry about whether this report is exactly right or not.

There’s a term called “eventually consistent” that grew up around a whole fleet of open-source-type technologies for crunching the huge amounts of data generated by website click-throughs. If you’re an e-commerce site, you want to understand the convergence of what the user is looking at and why he is clicking on it. Amazon.com, of course, is really good at this, asking, “For this customer at this very moment, what’s the best thing to show them?” They have high, high rates of success on recommendations, on product bundles, on follow-on advertising.

Amazon is good at this because they don’t worry about everybody. They develop a model where they’re eventually going to get a consistent model of the world, but at the moment they need to do it, they don’t care that they can’t roll it out for everyone. They’ve got hundreds of millions of clicks a day, and they figure, why don’t we just look at 20% of them? The key thing is to do it quickly and to make sure that whatever we conclude, there are many observations for it.
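The sample-rather-than-scan idea Probstein describes can be sketched simply: instead of processing every event in a click stream, draw a random fraction and act on the estimate, trusting that enough observations make it stable. A minimal sketch under hypothetical data (the event stream and the 20% fraction are illustrative, not Amazon’s actual pipeline):

```python
import random

def estimate_click_rate(events, sample_fraction=0.2, seed=42):
    """Estimate the fraction of events that are clicks by
    sampling the stream instead of scanning all of it."""
    rng = random.Random(seed)
    sample = [e for e in events if rng.random() < sample_fraction]
    if not sample:
        return 0.0
    return sum(1 for e in sample if e == "click") / len(sample)

# Hypothetical stream: 100,000 events, roughly 3% of them clicks.
rng = random.Random(0)
events = ["click" if rng.random() < 0.03 else "view" for _ in range(100_000)]

full_rate = events.count("click") / len(events)
sampled_rate = estimate_click_rate(events)
print(f"full scan: {full_rate:.4f}, 20% sample: {sampled_rate:.4f}")
```

With roughly 20,000 sampled events, the sampled estimate sits within a fraction of a percentage point of the full scan, at a fifth of the processing cost, which is the trade the interview is arguing for.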

This is when the term “analytics” becomes interesting. Analytics don’t have to be based on super-precise data. Again, that doesn’t mean wrong data, but it might mean some outcome that wins for the customer. If you profile a jazz CD that people didn’t know they wanted, and some people buy it, great. The fact that some of the 100,000 people that you showed it to didn’t buy that CD is irrelevant.

I think of that as an incredible innovation, to be able to say that the report doesn’t have to be perfect. It needs to capture the behavior, not the totality of it.

Ali Riaz: For this to actually work, we need a whole new philosophy around leadership, decision making and performance management. People spend a lot of time worrying, “Hey, did I earn my bonus? Was I at 103% of the target, or 97%?” That worrying takes a lot of energy. Those conversations take a lot of time.

Now, really, as a CEO of the company, 97% or 103%? Don’t I just want the employee to be happy? Personally, I don’t want a disgruntled employee, I want them to get the benefit of the doubt and go out and be happy and meet clients and be productive. But we are trained, we have this in our DNA, that we fight about 103 versus 97. Our boards want to know if it’s 103 or 97. Our management wants to know. Our line managers want to know. That’s just the way this tail is curled.

But for us to live with the realities of information growing more and more and speed getting faster and faster, we need a new way of thinking not about having precision but about having a good understanding. And being able to live with that.

This is really interesting. I guess the obvious question at this point is how to bridge these gaps.

Sid Probstein: One thing I’m hopeful about is that I think managers get that they need to understand the frame a lot better. Say you’re a brand manager and one item is selling well and then slows down. You need to consider if that’s because you stopped promoting it, or because a competitor has a better product, or because users’ social media comments and blog entries that cover this stuff are negative.

Yes, the sales figures are relevant. Yes, a breakdown of features is relevant. But understanding the outside context is huge, too. What do the customers think? What’s the trend in the marketplace? What’s the buzz?

You’re saying there is an understanding among the executives you have contact with of those distinctions?

Sid Probstein: Absolutely. Ten years ago, it would be perfectly normal to participate in a meeting where nobody had done any — and I’ll use the term directly — “Googling” of the larger environment. They wouldn’t have looked up news stories, or looked up trends or tracked down what people were saying about it. Now I think it’s rare to have a get-together where people haven’t educated themselves on the larger frame. And that’s a significant change.

Ali, can you say more about the leadership challenge that you began to describe, which is at odds with the very metrically driven way that people evaluate performance and lead organizations? The 97% versus the 103% is driven by trying to parse distinctions, which, in the end, don’t really matter to a company’s overall thriving and success.

Ali Riaz: Generally speaking, most corporations are inefficient in a lot of ways, given the human factor, the data factor, the change factor. There are a lot of factors involved. But in order to abandon that, managers have to come to believe that having a range of information is better than having one piece of information.

Attivio’s chairman and main investor, Per-Olof Söderberg, is an instigator for this type of dialogue. There have been times when one person or one team didn’t reach their goals, and we talked to him and we said, “They didn’t reach that quantitative goal, but qualitatively speaking, they’ve done a tremendous job.” They got their full bonus. So we are living it and breathing it today. But it has to come from the top.

I would love to see MBA programs that talk about what to do when two people come with two different sets of data for the same issue. I don’t think we have baked the reality into our education programs that this is going to happen, that you may never get the exact number right, and that as a leader, as a manager, as somebody who actually has to deal with this process, you have to figure out how to still move forward.

I want to play devil’s advocate for a minute. If I’m an executive, maybe I know how to capture customer feedback and external information about the competitive landscape. I get all this stuff. But it was hard enough for me to take my apples-to-apples report and make meaningful choices based on it. How the heck am I going to take all this other stuff and actually put it to meaningful use?

Sid Probstein: Right. That is exactly the role of leadership, which is to deal with uncertainty in this information age. Maybe the question is whether you should be so concerned with comparing apples to apples. Maybe the better question should be, What is the result? If I can produce an analytic that ignores 10% of my customer data but increases my conversion rate 2%, should I focus on fixing the problem so that I include that extra 10% of customer data, or should I just try to get that extra 2%?

At the end of the day, and you cite this in “The New Intelligent Enterprise Survey,” innovation is a key driver, dealing with uncertainty in innovative ways. You don’t throw out the analytic that’s producing a 2% improvement just because it’s not 100% thorough.

And what happens — if I can just follow up on that — when you talk to a company about the kind of approach you’re describing, and they have previously been focused on trying to get their data right? What’s their response?

Ali Riaz: We haven’t started yet to go to them and say that they should not compare apples to apples. What we have done is to say, “Sure, look at how many apples you have, but also provide the context around the apples.” So you may have grown 10%, but every competitor grew 30%. That’s important context. Or your apples have a shelf life of five days, but everybody else’s have a shelf life of 15 days. Or the clients you’re acquiring have a drop-off rate of 30%, while other companies have 2%.

Setting quantitative goals and measuring quantitative goals is human nature. I don’t think the capitalistic world would function without goals. I couldn’t function without them; I have personal goals that are quantitative, and I monitor them. That’s just the way we work. But providing more context, providing more sources of intelligence, so that you’re not only looking at the apples, is better. An Atlanta team delivering 97 without any local support offices may be fantastic compared to a Boston team delivering 103 with headquarters right behind it.

You’ve got to get the data right, and not just data, but the range of data, and then you have to have context for what the data mean. Then you have to have leadership and business processes that allow for a dialogue. Do all that and you’ll actually be making intelligent decisions and not political CYA, all those things that happen every day in organizations and governments. Having a wider set of data and content, structured and unstructured, will allow you to learn to paint with colors that are new and old.


Comment (1)
Bryant Avey
This was a great conversation. We hit up against these issues all the time when building data warehouses and business intelligence solutions for our clients. 

We often find that it just takes too much time and effort to dig through the final 10%-20% of "dirty" data. Great strategic decisions can be made with 80%-90% "clean" data.

If there's a good business reason to get parts of the data perfect, then make every effort to clean it. Otherwise, decide to deal with it when or if it becomes important.