The “Unstructured Information” Most Businesses Miss Out On

Businesses’ ability to process numbers in “well-behaved rows and columns” goes back 40 years, notes K. Ananth Krishnan, chief technology officer of Tata Consultancy Services, one of the largest companies in India. Figuring out how to mine and process the information in text, video, and audio is the new frontier.

Where do companies get their information? Information about competitors, about how customers think they’re doing, about whether a new product or service makes strategic sense?

Increasingly, data is coming at businesses in unstructured ways, says K. Ananth Krishnan, chief technology officer of Tata Consultancy Services. It’s coming from outside of companies, in the kinds of networking and SMS messaging habits that customers have. And it’s coming from unstructured sources inside companies, from in-house blogs to internal knowledge markets. That’s in addition to the structured data that companies are used to generating.

The challenge, Krishnan says, is to figure out how to process all of that unstructured text and video and audio and phone use that’s floating around outside of traditional databases. And then to figure out how to act on it.

Tata is in a unique position to think about this issue. It’s a large company, with over 198,500 IT consultants in 42 countries. It generated revenues of $8.2 billion in the fiscal year that just ended. The company was listed in Forbes’ sixth annual “Asia’s Fabulous 50 companies” and was named “Most Admired IT Company of the Year” by Bloomberg UTV in 2010.

Krishnan is in a unique position to think about this issue, too. He has been with TCS his entire career, since 1988. He has an M.Tech. in Computer Science and an M.Sc. in Physics from the Indian Institute of Technology, Delhi.

Krishnan spoke with MIT Sloan Management Review editor-in-chief Michael S. Hopkins about how companies should be supplementing their structured data, when privacy concerns weigh in on information capture, and what computer programs can learn from the average five-year-old.

You’re based in India, and TCS is one of the major IT companies in Asia. From your perspective, how has the capture and use of information changed over the past couple of years? What are the trends?

I see two things changing. The first is the realization that there is more information outside of the structured rows and columns of databases that enterprises have traditionally looked to as their primary sources of information. The rise of unstructured information, of that part of the puzzle, is really huge in the last couple of years.

The second thing is that it’s becoming more and more evident to enterprises that the social web actually does make sense for businesses.

More information lies outside the enterprise than within.

Yes. There’s more unstructured data from outside the enterprise, from the social web. And there’s more unstructured data inside the enterprise. Both those things supplement the structured data that companies are used to generating. Capture mechanisms now have to worry about all three, which was not the case five years ago — even two years ago.

What brought on that change?

I think the big factor was the sheer volume of the social web. It finally got to people, and they finally started saying, “Maybe I should be listening to this.” The second factor is the sheer number of data types that people have to deal with. Data types from a computer science perspective have always been well behaved: numeric, alphanumeric, and so on. But now you have voice, you have video, you have combinations of structured information, combinations of unstructured information.

That really caused a realization that if we are only looking at what we have in our data warehouses, it’s not going to be enough for us to get the insights that we need. If you’re a retailer and you are not using all the information you could to judge your customers’ buying patterns, then the retailer across the street probably will, and they’ll steal your customers. That’s the realization, I think, that drove a lot of people to think that they should be capturing much, much more.

Why do you think that realization has taken hold and spread in Asia?

If I look at India, if I look at China, there’s the sheer number of computing devices, mobile phones, that people have in their hands. It’s a fairly simple decision to say that now that I have literally hundreds of millions of people with computing devices in their hands, I can reach out to them, I can capture information about their likes, their dislikes, and their locations that I could leverage in my business.

People talk about how Facebook has 500 million users, and that’s certainly an important part of the puzzle, but I think it’s more the combination of all of these things that is the causative factor: Facebook, the web overall, Google’s invention of the PageRank algorithm, the mobile phone, the ability for us to process the information.

If 600 million mobile phones in India had been sold 15 years ago, we would not have been able to process the information from them. If 500 million Facebook users had come online in 1985, it would have been nice, but we wouldn’t really have had the ability to analyze all the stuff that they’ve been doing. We can’t look at that information perfectly yet, but it’s better than ten years ago. Now, we can build the machines and the computing software to look at the information with massive amounts of data processing.

I’m comparatively naïve about computer science, so tell me: how far along do you think we are in our ability to process all the information that’s available out there? Can we only marginally do it? Five years from now, is it going to be radically different than it is today?

There are still loads of things that we can’t do. There is a whole aspect of computing which PhD students are working on, which is basically trying to understand text. Understand sentiment. A five-year-old child can say in 30 seconds whether Mom or Dad is angry, or happy, or whatever. Sense the mood in the room. A computer program still has a hard time figuring that out.

Our ability to process numbers in well-behaved rows and columns goes back 40 years. Our profession used to be called “data processing” in the 1950s and the ’60s. So that part we have figured out, although we can do a lot better.

But the rest of it, the analysis of text, the analysis of video, the analysis of audio — it works a lot better in James Bond movies. In real life, it is extremely hard from a fundamental computer science perspective to understand all that information. And this is true not just for companies but for governments and security agencies, too.

What do you think is keeping companies from advancing as quickly as they might along this path? Cost, I imagine, is part of it, but are there cultural habits that get in the way?

Yes, the big one that we have to bear in mind is that most companies are still grappling with efficiency and cost issues. Especially in recent years, the mindset has been to reduce cost, to do more with less. The ability for businesses and their IT folks to come together and say, “Here is a large enough problem which is going to yield sufficient business benefit” is not yet mainstream.

The second big barrier is the technology itself. Most of the first companies which are bringing that scale of technical capability to bear on the analytics problem are in startup mode right now. There are two or three very exciting companies that we are tracking as part of our co-innovation network, but they’re all less than two years old.

And there is a little bit of a hesitation — maybe it’s a cultural habit, like you said — to buy or in some ways access this information. Should I intrude upon someone’s privacy? Is it okay for me?

I thought you were headed down a different road with that last point. You said the question is whether it’s okay to listen to what people say on the web, but I thought you were going to ask whether you can trust what it tells you.

Well, let’s talk about the use of social webs inside the enterprise. Here at TCS, we are having a lot of success in saying that if you’re dealing with a particular problem and you need help, you go into our social platform and you just ask. You type in a question saying, “This is a problem I’m having. Has anybody solved this before?” And you might get five responses in 30 seconds from people who have done exactly what you tried to do, and they have their solutions.

Of course, three responses might say one thing and two might say something totally different. So you still have to use the intuition and the judgment.

By the way, I find that the Gen X and Gen Y people are actually good at this. Better than older people like me. We might get stymied by what to do next, but they seem to be able to kind of figure out, “Okay, I’m going to go with what these three people said.”

We did a story for our winter 2010 issue called “How to Find Answers Within Your Company,” which also talked about the ways companies use internal knowledge markets. How new and developed is TCS’s use of internal social networking?

We are today probably one of the largest users of the social web inside the enterprise, and we have improved our ability to look at the structured and the unstructured opportunity. In the last three years we have really launched into the exploitation of the social web as a means for ideation, as a means of finding the expert, as a means of learning. We use the web to form groups to look at specific problems and to tap into a collective intelligence.

All those things supplement the way we look at our structured information, and they get some of these subjective insights into what we should be doing as a business.

For example, I have a blog inside the company, and I just finished writing a blog post which will go live tomorrow morning on the ideation process. There are a lot of things that I as the CTO of India’s largest software company should be looking at. Obviously, I don’t have the bandwidth to look at all of them. So I’m asking my readers to help me find out what I am missing. What are the three things they feel I should be paying attention to? Hopefully I will get a few hundred responses, and then my staff and I will go through and make sure that we pick the top three from there.

I do this quite often to supplement what I’m reading from all the other sources of information. The kind of insights that our business leaders might need for creating a new service offering or going after a new market or whatever, many of those get validated by this softer data.

Tell us about the kinds of opportunities you see for companies to use the social web or what you call “unstructured information.”

Let me use myself as an example. I patronize a lot of the airlines. Not just because they’re customers, but because I have to. Now, with a couple of airlines, I am like a super platinum whatever flyer, and I get treated appropriately for that status. Which is nice, because when you fly a lot, it’s nice to be treated well.

On other airlines that I travel less frequently on, they look at their rows and columns, very correct data, and that data shows that Ananth Krishnan is a lesser mortal.

Just a guy.

Just a guy. Now, shouldn’t there be some way for them to figure out that this guy in 55E at the back of the plane is actually potentially a very valuable customer because he’s valuable to my competition? It could be something as simple as the check-in clerk Googling the name of the passenger and figuring out that maybe he or she should be nicer to this person, in some sense.

That’s a great example. Not knowing who the customer is but who the customer might become for you. Have you encountered companies that use something external to their own customer database to figure out that they might have different potential on their hands?

The travel guys and the credit card guys are beginning to collaborate on this. Some people might get invited to Bank X’s platinum credit card just because they are Airline Y’s platinum flyers. That’s a pretty explicit way of latching onto desirable customers.

But it’s done relatively mechanically. It’s not done on the fly, the way I’ve described it, at the point of sale or the point of check-in. That might just be a matter of time, though.

It does raise all kinds of privacy questions, though, doesn’t it?

There certainly is the question of whether you’re allowed to do it. In Europe, for example, data privacy and customer privacy rules are far more stringent than in most other parts of the world. Even if your company wanted to, it cannot share information about its customers with an external agency.

The other question is whether people will sign up for that, to give their data to one company and tell them that it’s okay to share it with other companies. I might have a problem with that. Whether it’s a company or a government, it might be a problem from an individual’s perspective, even if there’s a chance the information I gave would deliver a better product or a better service experience.

Here’s an interesting recent example. One of my labs is doing an mKRISHI experiment with delivering personalized information to Indian farmers. We’re experimenting with between 10,000 and 50,000 farmers, depending on how many are using the system at any given time. The farmer has the option to volunteer personal information like, “This is the size of my farm. These are the crops that I grow.” The idea is that the more information you give us, the better advice we can connect you with to help you be more successful. But we’re finding that not everyone is so willing to share all their information. What we’re finding is that the highest usage is on the SMS text service. People send in questions like, “Should I plant tomorrow or the day after?” or “Is it going to rain?” That’s a skim-through service, simple information dissemination, and it doesn’t demand that the user tell people who he is. Which is a learning experience for us. We expected that people would volunteer more to get better advice, but they don’t.

Do you think people don’t understand the tradeoffs yet? Or that this is, as we talked about earlier, a generational thing?

Both, yes. The tradeoff boundaries might be set extremely differently for someone like me compared to a 20-year-old Indian student or American student or Chinese student. Or even a Swiss student, for that matter.

And of course, as a business you know that older people are just going to get older and younger people are just going to keep growing to become the dominant cohorts, so you know which way this thing is trending.

Sure. I think the Gen X and the Gen Y folks, when they are making the business decisions, might be more willing to just go ahead and try it.