Data as a Resource: Properties, Implications, and Prescriptions

In recent years, many studies have examined how leading corporations are better utilizing information and knowledge.1 Less noticed has been the management of data, “the sludge of the information age — stuff that no one has yet thought very much about.”2 Yet data are ubiquitous. Almost every activity in which an enterprise engages requires data.

Data are used in, and created by, all daily operations, from serving a customer, to manufacturing a product, to tracking inventory. Data support managerial and professional work. Data are the critical inputs into almost all decisions, at all levels of an enterprise. Through data, managers learn about an organization’s human and financial resources. Data may be combined in almost unlimited ways in the search for new opportunities, market niches, process improvements, and innovative products and services.

Because data implicitly define common terms like “customer,” they contribute to an organization’s culture. They “fill the white space” in the organization chart. Enterprises strive to convert tacit knowledge into data. For example, a salesperson may have a warm personal relationship with an important customer. But, for the enterprise as a whole to serve that customer, certain aspects of the relationship must be expressed in data.3

Not surprisingly, most companies readily admit that they should manage data as business resources, just as they manage human and financial resources. They just as readily admit that they do not do so. Few companies even know what data they have; people cannot gain access to needed data; the quality of data is low;4 and data are not used effectively (see the sidebar).

(Sidebar: Data Issues Facing a Typical Company)

To manage any resource properly, companies must understand the roles it plays, its properties, the opportunities it offers, and the steps they must take to exploit those opportunities. But while data present enormous possibilities, they also present special challenges. For example, unlike other resources, data can be readily copied, shared among many people using information technology (IT), and then used in dozens of different ways. But the efficient dissemination of data rarely occurs. Instead, most individuals and business units, consciously or not, hoard data, leading to brutal political battles over ownership. Even in situations where the data are shared, individuals and units make and modify their own copies. The resulting inconsistencies defeat the purpose of sharing the data in the first place. Furthermore, it is difficult for organizations to enforce confidentiality and other policies regarding the use of data.

The enterprise seeking to manage data effectively must understand how data differ from other resources and what those differences imply. This article aims to provide that understanding. We first discuss the fundamental properties of data and consider how data differ from other resources. Next we explore the challenges and opportunities inherent in managing data. We conclude by prescribing actions to help companies surmount the challenges and successfully pursue the opportunities.

Our discussion is limited to data (not information and knowledge) because we think data deserve consideration in their own right. Furthermore, a consensus has not yet emerged on the definitions of information and knowledge and on the distinctions between those concepts and data; we do not wish to join the debate here. At the same time, we believe that many of our prescriptions for managing data complement recommendations, based on observation of best practices, for managing information and knowledge.5 Indeed, a company that cannot manage its data effectively is unlikely to do an excellent job in managing its information and knowledge.

Properties of Data as a Resource

Many authors have noted the distinctions between data and other organizational resources. In this section, we synthesize their work and our own observations to provide a framework for comparing data (or information and knowledge) with other resources (see Table 1).6

Before proceeding, we want to clarify our use of the terms “resource” and “data.” The American Heritage Dictionary7 provides two definitions of “resource,”8 both of which are pertinent here: “an available supply that can be drawn upon when needed” and “a means that can be used to advantage.” As these definitions suggest, a resource is both necessary for an enterprise and a potential source of yet unrealized benefits. With these definitions in mind, we can list several categories of resources used by modern enterprises. There are the traditional resources — financial, human, plant and equipment, raw materials, and energy — and the so-called information age resources — data, information, and knowledge.

Philosophers, computer scientists, and information systems specialists have all pondered the notion of data.9 As used here, “data” (or a data collection or data set) consist of two interrelated components: data models and data values.10 “Data models” are the definitions of entities, their attributes, and the relationships among them that enterprises use to structure their view of the real world. For example, you are an entity. Your employer is interested in attributes such as your name, ID, date of birth, and so on. As an employee, you have relationships, such as “report manager” and “subordinates,” with other employees. “Data values” are the specific realizations of an attribute or relation of the data model for particular entities. For example, the “123-45-6” in “Employee ID = 123-45-6” is a data value. Enterprises model the same entities (and the world) differently. This is quite natural — the Internal Revenue Service is interested in you as a “Taxpayer” while your employer is interested in you as an “Employee.” And “Highest Degree Earned” may be part of the Employee data model, but not the Taxpayer data model. Other attributes would be included in the Taxpayer model but not the Employee model.
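The distinction between data models and data values can be made concrete with a small sketch. The Python below is only an illustration, not a description of any real personnel or tax system: the class layouts, the `filing_status` attribute, the taxpayer ID, and the sample name are all assumptions introduced for the example. Only the "Employee ID = 123-45-6" value and the "Highest Degree Earned" attribute come from the text.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class Employee:
    """Data model: how an employer structures its view of a person."""
    employee_id: str
    name: str
    date_of_birth: date
    highest_degree_earned: str
    manager_id: Optional[str] = None  # relationship to another Employee entity


@dataclass
class Taxpayer:
    """Data model: how a tax agency structures its view of the same person."""
    taxpayer_id: str
    name: str
    date_of_birth: date
    filing_status: str  # hypothetical attribute, for illustration only


def attribute_names(model) -> set:
    """Return the attribute names that a data model defines."""
    return {f.name for f in fields(model)}


# Data values: specific realizations of the attributes for one entity.
employee = Employee("123-45-6", "Pat Example", date(1970, 1, 1), "MBA")
taxpayer = Taxpayer("T-0001", "Pat Example", date(1970, 1, 1), "single")

# The two models overlap only partly, although they describe the same person.
shared = attribute_names(Employee) & attribute_names(Taxpayer)
only_employee = attribute_names(Employee) - attribute_names(Taxpayer)
```

Here the class definitions play the role of data models, and the two instances hold data values. Comparing `shared` with `only_employee` shows how two enterprises can model the same entity differently.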

While data (i.e., the models and values) are abstract, data records are concrete. They are the physical manifestations of data stored in paper files, spreadsheets, and databases, and presented to users in ways that make them easy to store and use. Now that we have defined our basic terms, we will look at the distinctive properties of data and data records and consider how these compare with the properties of the traditional resources.

Intangibility

There is universal agreement in the literature that the most striking characteristic distinguishing data from traditional resources is their intangibility. In the past, the intangibility of data led some researchers to object to considering data (as well as information and knowledge) as resources. That viewpoint has died away as evidence has grown of the ever-increasing role that these intangible resources play in all businesses, from traditional manufacturing to financial services, insurance, and other information-intensive industries.

If “intangibility” is defined as “incapability of being defined by the senses,”11 then it is important to distinguish between data per se and the representation of data through a particular medium. As an abstract concept, data are clearly intangible. But data records are perfectly tangible, be they on paper, microfilm, or computer-related media (for example, a magnetic tape or a disk). In some cases, we need special equipment such as a computer to ascertain the presence of data records, but that need does not make them intangible.

According to the definition of “intangibility” quoted above, financial resources per se (whose realizations are either currency or data records) should also be considered intangible. We are reluctant to do so, however, because one definition of tangibility is the “capability of being valued monetarily.”12

Consumability

A resource is consumable if usage diminishes the amount of the resource available for future use. Money, raw materials, and energy are examples of consumable resources. Clearly, neither data per se nor data records are consumable.

Some authors have gone so far as to claim that data’s nonconsumability distinguishes them from the other, more traditional resources. But human resources and plant and equipment are also nonconsumable according to the definition given; indeed, an assignment of a worker or a machine to execute a specific task does not preclude reassignment after the task’s completion. As for the question of “wear and tear,” we will discuss it later as “depreciability.”

Shareability

By resource shareability, we mean the possibility that several users can simultaneously use the same unit of the resource. None of the traditional resources is shareable according to this definition. For example, two users cannot share the same dollar. They can share a building, but not the same part of the building. But data are shareable in two ways. First, the same data may have multiple representations in different sets of records, each of which can be used simultaneously by different users. That makes data per se a shareable resource. Second, modern database management systems allow nearly simultaneous, multiuser access to the same data records. However, that shareability of data records may be hampered by the medium’s limitations (for example, paper records are not available for simultaneous usage). Shareability can also be intentionally constrained for security reasons via encryption or by limiting the access to data records via passwords. Obviously, shareability of data implies both opportunities and perils that are not pertinent for the management of the traditional resources.

Copyability

Another characteristic of data is their ability to be copied; one can create an identical unit of the resource in question at a fraction of the cost of the original. The reduced-cost requirement of the definition is critical, for without it we would have been forced to conclude that equipment, raw materials, and energy are copyable resources as well. It is, of course, the data records rather than data themselves that can actually be copied.

Since computer records are both nonconsumable and shareable, there is no theoretical need to copy them. Moreover, extra copies not only consume the recording medium but, more important, create the problem of maintaining consistency. On the other hand, there are practical reasons for copying. First, it may allow users to work with data in more convenient environments. Second, it may allow users control of their own copy of the data, an important political consideration. Third, copying may mitigate the fragility of data, that is, the possibility that data may be inadvertently destroyed (see below). Determining the desired degree of redundancy is an important issue in managing data as a resource.

Transportability

As important as the ability to be copied is, it is the ability to transport data over large distances almost instantaneously that has really ushered in the information age. The ability to copy data locally existed for some time before progress in telecommunications technology made data transmission between distant locations possible. (To be precise, electronic data transmission is not actually transportation of data records, but rather the creation of copies at the destination point.)

The efficiency of modern telecommunications is not limited to the speed of transmission; quality and cost-effectiveness are almost equally impressive. With the possible exception of electricity, no other resource can be transported with the ease and efficiency of electronically stored data. The other possible exception is money, which can also be transmitted electronically through its data proxy!

Nonfungibility

“Fungibility” means that one unit of the resource in question can be substituted with another unit of the same resource, if the latter is available. Money, raw materials, and energy are fungible. Human resources and plant and equipment are also fungible, although their substitution may be costly and inconvenient. But data units, that is, individual data items, are unique; we cannot substitute a person’s date of birth with another data item (for example, name or sex) about that person or about somebody else. While we can sometimes infer the value of one data item from that of another (for example, age from date of birth), such situations are exceptional. As far as data records are concerned, we can obviously substitute one record with another if they both represent the same data item.

The nonfungibility of data raises special management problems. For example, as a defense against possible defects in units of other resources, managers can choose to keep a larger supply of the resource in question. But for data, this strategy is meaningless; a defective data item cannot be replaced with another data item. And, of course, keeping extra copies of data records will not help if the copies are made from an erroneous original or have become outdated.

Fragility

By fragility of a resource, we mean the degree or ease with which it can be inadvertently destroyed or lost in routine use. Traditional resources do not usually qualify as fragile. The situation with data is altogether different. Though paper records can be inadvertently lost or destroyed, it is the remarkable ease with which computer-stored data records can be (and all too frequently are) unintentionally overwritten or wiped out altogether that makes us consider data records fragile. Also, computerized data can be inadvertently destroyed when new systems replace old and can be easily lost among high volumes of other data. Of course, computer-stored data can be protected from overwriting and are routinely backed up or copied in most commercial environments. Still, many users lack the skills needed to work confidently with such protection mechanisms. Given the nonfungibility of data, it is not surprising that many users tend to err on the side of caution. In fact, experience shows that unwarranted proliferation of data copies often has no cause other than concern for data’s fragility.

Versatility

A “versatile” resource is one that can be used for a variety of purposes. In the abstract, each resource category is versatile to an almost unlimited degree (for example, think of the variety of existing raw materials). But a particular raw material for a specific manufacturing process may have limited alternative uses beyond that process. At the other extreme is money, which has the widest range of possible use. Data, in our view, occupy a position between the two extremes.

The versatility of data, along with their other characteristics, provides a company with valuable sources of new business and improvement opportunities. Targeted (i.e., data-driven) marketing is one example. The negative side of data versatility is the possibility of misuse. One kind of misuse occurs when data, legitimately collected for one purpose, are used for another, illegitimate one. Data about a person’s age and health, legitimately collected for medical purposes, should not bear on that individual’s opportunities for promotion at work. Ambiguities in data semantics compound this problem. For example, a salesperson may view a “sale” as complete when he or she and the customer have verbally agreed to the deal. But the legal department doesn’t view the sale as complete until a contract is signed; the production department, until the product is delivered; and the finance department, until payment is received. Loebl gives a particularly poignant series of examples involving energy data and concludes, “The single most significant source of error in data analysis is misapplication of data that would be reasonably accurate in the right context.”13 Misunderstandings are further exacerbated because explanations of data semantics (i.e., data dictionaries) are not kept current and may be difficult to obtain. Finally, many users, even sophisticated ones, simply ignore them.

It is easy to underestimate the practical importance of misinterpreted data semantics. Suboptimal, even grossly incorrect, decisions result. And misinterpretation of data semantics can lead to political conflicts. For example, the various interpretations of “sale” are all correct from the (limited) perspectives of functional organizations. We should expect them to vigorously defend their interpretations. And, collectively, they will stymie the enterprise that needs to “increase sales.”
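A short sketch can show how the same records yield different "sales" figures under each department's interpretation. The Python below is an assumption-laden illustration: the `Deal` record, its four flags, and the sample deals are hypothetical, chosen only to mirror the four interpretations of "sale" described above.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    """Hypothetical record of one deal's progress."""
    verbally_agreed: bool
    contract_signed: bool
    delivered: bool
    paid: bool


# Each department's working definition of a completed "sale".
SALE_DEFINITIONS = {
    "sales": lambda d: d.verbally_agreed,
    "legal": lambda d: d.contract_signed,
    "production": lambda d: d.delivered,
    "finance": lambda d: d.paid,
}


def count_sales(deals, department):
    """Count deals that the given department would regard as sales."""
    is_sale = SALE_DEFINITIONS[department]
    return sum(1 for d in deals if is_sale(d))


# Made-up sample data: four deals at different stages of completion.
deals = [
    Deal(True, True, True, True),
    Deal(True, True, True, False),
    Deal(True, True, False, False),
    Deal(True, False, False, False),
]

counts = {dept: count_sales(deals, dept) for dept in SALE_DEFINITIONS}
```

With these sample deals, each department reports a different total from identical underlying records. None of the totals is wrong within its own definition, which is why an enterprisewide goal such as "increase sales" needs an agreed definition first.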

Valuation

By “valuation” of a resource, we mean expressing its value in monetary terms. For traditional resources, either market forces or well-established accounting practices establish value. Though some data sets are sold on the open market (for example, historical performance of financial markets, point-of-sale data, many kinds of customer lists), most data sets are not for sale, and their valuation poses difficult theoretical and practical problems.

Decades of research on these problems have not yet produced clear results14 for several reasons: the failure to separate the information content (i.e., data) from the information technology; the predilection to analyze data as a fungible commodity or resource; the absence of intrinsic values in data (i.e., data’s value depends on specific applications); and the versatility of data usage.15

Data valuation raises several additional concerns. First, it is almost always easier to estimate the costs of data than to estimate the dollar value of their benefits, which can lead to unwise decisions not to obtain data whose utility is uncertain.16 Second, issues related to internal transfer prices for data can be difficult to resolve. For example, some organizations charge users to access data sets. This practice may discourage usage — a result that may be contrary to management’s intent. On the other hand, without an indication of users’ willingness to pay, it is difficult to weed out those data sets that are of no utility whatsoever.

Depreciability

“Depreciation” is defined as “a decrease or loss in value because of wear, age, or other cause” (“other cause” does not include a decrease in the quantity of the resource).17 Buildings (but not land), equipment, and most raw materials depreciate; energy usually does not. The depreciability of financial and human resources is ambiguous: for the former, it is complicated by the possibility of inflation and deflation; for the latter, the difficulty stems from varying relationships between age and job performance and from differences among individual workers.

The value of data does not usually diminish because of use (we disregard as negligible the wear on the recording medium and computer used). But there are exceptions. The more people use a stock tip, the less the value of that information to each person. Similarly, enterprises do not sell or share their data because doing so may diminish the value of the data. The passage of time makes a difference in many cases. If only current data values (for example, current salary) are of interest, they must be updated as aspects of the real world they describe change. If data are time-stamped, then they typically become less valuable as time passes (for example, an employee’s 1998 salary will probably be much less valuable in ten years). But again, there are exceptions. Data-mining techniques can make excellent use of detailed historical records, so older data can be quite valuable. In most cases, then, data do not depreciate with use but they do with age; there are so many exceptions, however, that a general statement is probably not justified.18

Source

Typically, traditional resources originate outside the enterprise using them, with the exception of financial resources, which can be generated externally and internally. The same dichotomy of sources exists for data.

Unlike many other resources, data are generated by a tremendous number of sources. Every transaction with suppliers and customers, most internal operations, and managerial and professional work all supply data. Although management approaches such as the customer-supplier model are applicable, the sheer diversity of data sources adds enormous complexity.19

The original sources of many data sets are undocumented or even unknown. Typically, these data sets can neither be used with confidence nor improved. The Internet is exacerbating the problem. Of course, we do not always know the source of other resources (for example, of a gallon of gasoline). But agreed-on standards help guarantee uniformity of content, so gallons of gasoline are fungible. Interesting data, on the other hand, are new or different, and the data items themselves are not fungible, thwarting standardization.

Renewability

Whenever pertinent features of the real world change, data values change and/or new data are created. New data result from everyday business, often at astounding rates. This property of data, which we call “renewability,” does not really apply to other resources, with the possible exception of solar power. Other resources can, of course, be renewed. But the spontaneous nature, the rate, and the degree of renewal are far greater for data than for any other resource.

The situation is a bit more involved for data records. In most cases, it takes time and effort for a change to be reflected in the appropriate data records. Thus a person’s address changes the day that he or she moves, but it is some time until all databases are updated. IT can reduce the lag time in many situations, but lag time cannot be eliminated.

Naturally, a manager wants to use the most current data. Renewability affects this desire in two ways. First, ensuring that data are current is an important task; business processes that capture change must be reliable. Second, unless the mechanisms to ensure currentness are synchronized, inconsistencies are sure to arise in redundant databases. Decision-making meetings can then dissolve into heated exchanges about whose data are accurate.
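The way unsynchronized updates produce inconsistencies in redundant databases can be sketched in a few lines. The Python below is a toy illustration under stated assumptions: the two dictionaries stand in for redundant databases, and the customer ID, addresses, dates, and helper functions are all hypothetical.

```python
from datetime import date

# Two hypothetical redundant databases holding the same customer's address.
billing_db = {"C-100": {"address": "1 Old Road", "as_of": date(1998, 1, 5)}}
shipping_db = {"C-100": {"address": "1 Old Road", "as_of": date(1998, 1, 5)}}


def record_move(db, customer_id, new_address, effective):
    """Apply an address change to one database only."""
    db[customer_id] = {"address": new_address, "as_of": effective}


def find_inconsistencies(db_a, db_b, attribute):
    """List keys present in both databases whose attribute values differ."""
    common = db_a.keys() & db_b.keys()
    return sorted(k for k in common if db_a[k][attribute] != db_b[k][attribute])


# The customer moves; only the billing database captures the change.
record_move(billing_db, "C-100", "9 New Street", date(1998, 6, 1))
stale = find_inconsistencies(billing_db, shipping_db, "address")

# Once the second copy is updated as well, the inconsistency disappears.
record_move(shipping_db, "C-100", "9 New Street", date(1998, 6, 1))
after_sync = find_inconsistencies(billing_db, shipping_db, "address")
```

Between the two calls to `record_move`, the copies disagree about the customer's address. The sketch detects the disagreement but cannot say which copy is right without knowing which update process is the reliable one.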

Storage

Data, unlike other resources, can be stored on computers.20 The exception is financial resources, which can be stored through their data surrogate. This property contributes to other properties, such as copyability, shareability, and transportability. A compact means of storage should make data easier to manage. The cost of storing data is low (and getting lower) compared with the cost of storing other resources. On the other hand, cheap storage may contribute to decisions to save everything, including data that are no longer useful. An unintended consequence is that useful data may be harder to find. Cheap storage may also contribute to independent decisions by multiple organizations within an enterprise to store their own copies, leading to unnecessary data redundancy.

Implications for Management

Management of any resource seeks to achieve the following goals: to possess a sufficient but not excessive supply of the resource, based on user needs and usage patterns; to provide legitimate users with timely and efficient access to the resource; to protect the resource from unplanned destruction and unauthorized access and use; to maintain and improve the quality of the resource; and to promote effective and efficient usage of the resource for the maximum benefit of the enterprise.

For each goal, we consider the managerial implications in light of the fundamental properties of data discussed above.

Supply Management

The basic issues in managing supply involve identifying the users of the resource, their needs, and their usage patterns and making arrangements for obtaining the resource. For traditional resources such as raw materials, new technologies have enabled the application of inventory-management approaches such as just-in-time. Ronen and Spiegler have argued that these approaches can be applied to managing data inventory as well.21 We agree with their argument against generating unnecessary data but caution that managers should not underestimate the particular problems that arise in data supply management.

First of all, the nonfungibility of data implies that the supply challenge is not to determine the right amount of identical units, but rather the scope and relevancy of the data. The key question for managers is not how many units of data are needed but which data are needed.

Second, while the nonconsumability of data eliminates one concern of supply management for traditional resources (not having enough), it creates the all-too-familiar problem of data oversupply. Storing unnecessary data is expensive, not so much because it wastes the storage medium, but because it diverts management attention and makes needed data more difficult to find. Just getting reliable information about the various data stored throughout a large organization is a daunting task at best.

Third, identifying users and understanding their needs is much more difficult for data than for other resources. In addition, the data requirements of different users (e.g., different semantic interpretations of seemingly identical terms) often have to be reconciled, a difficult problem not encountered with traditional resources.

Fourth, some requests for data are erratic and unpredictable. The nonconsumability of data does not eliminate the need to know usage patterns, because those patterns indicate when data values must be updated to keep the data current.

Fifth, as we mentioned earlier, both the cost and value of data are poorly understood, making it difficult to ascertain the loss caused by the absence of data versus the cost of keeping an excessive data supply. In addition, for traditional resources, extra expense and inconvenience notwithstanding, an unexpected surge in demand can usually be met (money borrowed, temporary workers hired, and so on). This is usually not the case with data.

Sixth, traditional resources are usually acquired through, or under the control of, a single organizational entity responsible for acquiring the resource for the entire enterprise. In contrast, data are usually acquired or developed by individual organizational units to satisfy their own needs, with little or no centralized control, a task made easier by the development of decentralized and mobile computing. Finally, one of the most important issues in managing a resource supply is the selection of suppliers. For many traditional resources, listings of available alternatives and selection criteria (for example, price, quality) are available or can be easily obtained. This is seldom true for data. Fortunately, the general principles of customer-supplier management can be successfully applied to data suppliers.22

Access

Assuming that the enterprise has the right resources, individual users still must be able to access them. While we appreciate the difficulties of getting the people and equipment in place to shovel the parking lot of a warehouse after an unexpected snowstorm, it is fair to say that issues of access to data are far more complex than for other resources. Issues range from the architectures and technical means to store and access data, to users being able to find the data they need, to data sharing.

In most cases, only electronically stored data can satisfy the access requirements of the modern enterprise. Given the remarkable progress in speed, accuracy, and affordability of optical readers, it is perplexing that large quantities of data are still stored as paper records. The instantaneous transportability of computer-stored data records has contributed to the concentration of important organizational data in large mainframe databases accessible to users from remote terminals. For a while, this architecture seemed to be the technical solution to the data-access problem, and it is still prevalent in many companies. But it is no longer the only (or even the most desirable) means to organize data. The rapid proliferation of personal computers and new ways to network have led, for example, to client-server architectures and distributed databases. It seems that almost as soon as some data architecture achieves wide recognition and acceptance, technological progress and new user demands render it outdated.

Several lessons emerge from this remarkable transformation. Although computers permit instantaneous access to electronically stored data, most large enterprises find themselves with a bewildering variety of data residing in databases developed for individual business tasks with little coordination in hardware, software, and data-modeling approaches. Providing efficient access to data in these environments is difficult. Indeed, many users do not even know where to look for the data they need.

Thus the first lesson is that unless the data resource is developed (or, more realistically, redeveloped) as an enterprisewide resource, no technological wizardry will guarantee efficient access to the organization’s data. Second, speed of access to data has proved to be only one feature sought by users. Other criteria, especially control and flexibility, are important as well. This has led companies to store data used for operational purposes in transactional systems and data for decision-support purposes in data warehouses and data marts. Companies should expect this planned segmentation to continue. And, of course, both the number of users and their data requirements will continue to grow rapidly. Third, enterprises should expect continuing improvements in technological means for data storage and transmission and in user interfaces. Periodically, the weight of the accumulated changes will lead to major changes in the way data are organized and accessed.23

Data sharing raises additional concerns. Theoretically, a combination of data properties, including shareability, nonconsumability, and instantaneous transportability, should allow for any number of harmonious sharing arrangements. In practice, of course, data sharing has proven difficult. The technical challenges are significant, although technical solutions are available.24 Political issues are even knottier. Typically, data are collected and used by organizational units to perform their particular tasks without regard for the needs of other units or those of the enterprise as a whole. Possession of data conveys power and influence.25 So what is the motivation to let others know of their existence, never mind to share them?

Security

Concerns for the security of any resource involve two principal issues: guarding the resource against unplanned destruction and preventing unauthorized users from accessing it. Data introduce other issues. First, unauthorized access to important data may have greater consequences than unauthorized access to any other resource, even money. Second, the fragility and nonfungibility of data make the security problem particularly acute. Third, the nonconsumability of data makes it less likely for the enterprise to discover unauthorized usage through simply observing that the quantity of the data is diminished. The fact that usage of electronically stored data does not require physical proximity exacerbates the situation. On the positive side, the copyability of data suggests a straightforward (at least, in principle) strategy for guarding them against possible destruction by periodically making backups. Moreover, a number of methods exist for securing data, including data encryption, virus detection software, and access control software. Yet despite the availability of these tools and the obvious importance of security, a recent study revealed a troubling lack of appreciation of, and effort devoted to, security in many organizations.26

Managers must also consider the issues of confidentiality and privacy. “Confidentiality” means that some data must be held privileged or otherwise controlled in their access, use, and dissemination. “Privacy” means that individuals and organizations have rights to control the collection, storage, and dissemination of data about them. Obviously, data versatility threatens privacy: data collected with no privacy objections for one purpose may well prove unacceptable for another. A tension exists between the promise of data mining and the search for heretofore undiscovered insights into consumers and their expectations of privacy. While information about policies and practices in corporations pertaining to privacy is available, by and large, issues related to privacy and confidentiality are unsettled.27

Quality

For almost all enterprises, quality has become a battle cry. It did not take early programmers long to realize the paramount importance of data quality for successful data processing. They even coined the well-known acronym GIGO (garbage in, garbage out) to emphasize the point. Concern for the quality of data in large databases, in both the private and public sectors, has arisen only recently, however. Recognition of the importance of quality data is growing, as evidenced by the stream of reports in the general28 and trade29 press.

Understanding the nature of data supports the effective management of data quality. First, data quality encompasses data models, data values, and data records.30 Even seemingly mundane data-modeling decisions can have enormous consequences — witness the trouble and expense of the “Year 2000 problem,” which was caused by decisions to save a few bytes of storage and which many predicted.31 Second, data are abstract, so we cannot ascertain their quality through direct measurement (as, say, we can measure the chemical composition of a raw material). Usually, we must compare the data to real-life counterparts, which can be expensive or, for some historical data, impossible. Third, because of the nonfungibility of data, we do not strive for uniformity. So standards are difficult to apply. And a defective data item cannot be simply replaced by another, flawless one. Fourth, because of the shareability, copyability, and transportability of data, faulty items can reach a great many users almost instantaneously. Indeed, bad data are like a virus. There is no way of telling where they will turn up or what impact they will have. Fifth, in contrast to other resources, quality levels are usually unknown to users. Finally, the sheer volume of data typically created and stored exacerbates the problems of managing data quality.
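
The point that data quality must usually be judged by comparison with real-life counterparts can be illustrated with a minimal sketch. It assumes a hypothetical setting in which a random sample of stored records is re-verified against the real world; the function name `estimate_accuracy` and its inputs are illustrative assumptions, not part of any established method described here.

```python
import random


def estimate_accuracy(records, verified, sample_size, seed=0):
    """Estimate the share of accurate records from a random sample.

    records  -- dict mapping a record key to the stored value (hypothetical)
    verified -- dict mapping the same keys to values re-checked against
                their real-life counterparts (hypothetical)
    """
    if not records:
        raise ValueError("no records to sample")
    rng = random.Random(seed)
    keys = sorted(records)
    # Verifying every record is usually too expensive, so check a sample.
    sample = rng.sample(keys, min(sample_size, len(keys)))
    correct = sum(1 for key in sample if records[key] == verified[key])
    return correct / len(sample)


# Illustrative data only: one of four stored addresses is out of date.
stored = {1: "12 Elm St", 2: "9 Oak Ave", 3: "4 Pine Rd", 4: "7 Ash Ln"}
checked = {1: "12 Elm St", 2: "9 Oak Ave", 3: "4 Pine Rd", 4: "70 Ash Ln"}
accuracy = estimate_accuracy(stored, checked, sample_size=4)
```

Such a measurement says nothing about why the errors occur; it only makes the quality level, which is otherwise unknown to users, visible.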

Fortunately, solutions to many of these problems have been discovered, and many companies are improving the quality of their data.32 The most successful efforts focus not on finding and correcting errors but on discovering and remedying their root causes.33 Data quality programs are not easy to implement, but those who have done so report excellent results: reduced costs, improved customer satisfaction, and more confident decision making.

Usage Management

Using any resource in the most beneficial manner is always a challenge. For traditional resources, the principal issue is resource allocation among various organizational units that submit requests for a limited supply of the resource. A classic example is the request for money or budget. The challenge of optimizing the use of data is quite different. First, many enterprises do not make effective use of the data that are readily available. Most enterprises use data reasonably well in operations, but not in planning and decision making. Indeed, many organizations admit that they are data rich and information poor.

The issue is not new.34 The past decade has seen advances: the creation of new, data-driven businesses; electronic data interchange (EDI), which has revolutionized ordering, invoicing, and billing; and data-driven marketing.35 Yet only a minority of enterprises have used these tools to their advantage, and even the successful ones have exploited only a fraction of the potential of their data.

The issue of usage becomes even more important at the strategic level. Fundamental questions (What data will we need to execute a given strategy? How can we exploit data to create new opportunities? How will the availability of new types of data through the Internet affect our strategy?) rarely come up. Yet it is likely that the judicious exploitation of data will be even more important to success in the future, for two key reasons. First, managing traditional resources is based on data — thus making data the meta-resource. Second, competition during the previous decades has leveled the field in terms of exploiting the traditional resources, leaving data, the relative newcomer, as the potentially most promising source for gaining competitive advantage.

Ultimately, the language that all managers speak is money. The difficulties in valuing data underlie these various issues of data usage.

Underlying Issues

Two more observations are in order. First, organizational issues contribute to many of the problems we have discussed. Ownership and accountability for data are unresolved issues in most enterprises;36 political battles for control of data and information are among the most brutal we have witnessed.37 Furthermore, the appropriate managerial infrastructure has not yet been determined.38 The modern hierarchical form may not be suited for the information age. Some historians argue that middle management arose to fill an information-processing need. The penetration of IT, reengineering, and other movements has decimated, but not eliminated, middle management. So while the hierarchical form has been assaulted, it has not been replaced. The hierarchical form may have untapped strength or an acceptable alternative to it may yet be discovered.

Second, no other resource is experiencing such explosive changes as data, in terms of both their growing importance and the technological means by which they are acquired, stored, transported, and used. This state of almost continuous change makes near-term resolution unlikely.

Prescriptions

The challenges we have described have few simple solutions. Many are interrelated and involve compromise. The following prescriptions will not solve all data problems, but they will help a company make a solid beginning.

Institute a focused, internal data quality program.

As we noted previously, data quality involves many considerations. But two are straightforward: ensuring that data models are clearly defined and that data values are accurate. The creators of data models and data values should be held accountable for clear definition and accuracy. Managers should equip data creators with the tools of quality management, including customer needs analysis, measurement, quality control, and root cause analysis, and should demand improvements. Benefits accrue to both creators and users. Users benefit because the data they use are easier to understand and interpret, and they can use them with confidence. Creators benefit because the expense they incur to answer questions and make corrections is reduced. Both benefit because users, especially those outside the enterprise, are more satisfied.

Institute a data supplier management program.

Such programs are similar to internal data quality programs but are directed at data suppliers from outside the enterprise. Supplier management programs have proven to be enormously effective in manufacturing; similar programs, suitably adapted, pay rich dividends for data at low cost. There are many ways to define and implement supplier data programs. All focus, at least to some degree, on data quality, so this prescription is closely related to the first.

Hone your data needs.

Most enterprises have far more data than they can possibly use; yet, at the same time, they do not have the data they really need. In many cases, the failure to clearly define how the data will be used lies at the root of the problem. It is simply easier to collect data. The result is too much data of questionable value. Our prescription is simple: define in detail the most important uses of data, translate those uses into data requirements, communicate those requirements to the creators and suppliers of data, and eliminate data that do not meet requirements.

Identify and manage the most critical information chains.

“Information chains” are cross-functional business processes that manipulate raw data (obtained from external sources or created internally), package them into higher-value data or information, make them available to downstream customers, and use them to create business value. In most enterprises, the individual functions are usually managed and conducted fairly well. But the hand-offs between functions are poorly managed, and results suffer. The most important information chains should be identified, and the techniques of process management, adapted for data, should be applied to them. The first step is to define cross-functional management accountability.

Recognize the proper role of technology.

IT has been quite effective in enabling reasonably well-established and -managed information chains to perform faster and cheaper, with greater capacity. But technology alone, or worse, technology with ineffective data management, is not a solution.39 Indeed, an overreliance on IT appears to exacerbate problems. Enterprises must first manage information chains and put them into reasonable working order, before applying the newest technology. For years, quality gurus have advised against automating ineffective factories. Our prescription simply extends that time-tested maxim to data.

Develop, maintain, and make widely available the inventory of data resources.

Most enterprises have no idea what data they have, which data are most critical, where critical data come from, or how much redundancy exists. Information chains are the sources of much data and may be the most important component of the inventory. Developing a complete inventory may be an enormous task and not worth the effort. But the enterprise can at least develop, keep current, and publish an inventory of the most critical data and the sources of those data. It is appropriate to begin this task by first implementing a process that captures new data and data sources. Once that process is in place, the task of identifying and adding existing data (and sources) can begin.
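
As a rough illustration of what such an inventory might record, the sketch below keeps one entry per data resource, noting its source, whether it is critical, and where copies are held. The class and field names (`InventoryEntry`, `DataInventory`, `copies`) and the sample entries are hypothetical choices made for this example, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InventoryEntry:
    """One data resource in the inventory (all fields are illustrative)."""
    name: str
    source: str                      # information chain or supplier of the data
    critical: bool = False
    copies: List[str] = field(default_factory=list)  # units holding a copy


class DataInventory:
    """Captures new data and sources first; existing data can be added later."""

    def __init__(self) -> None:
        self._entries: Dict[str, InventoryEntry] = {}

    def register(self, entry: InventoryEntry) -> None:
        self._entries[entry.name] = entry

    def critical_entries(self) -> List[str]:
        return sorted(e.name for e in self._entries.values() if e.critical)

    def redundant_entries(self) -> List[str]:
        # More than one copy signals redundancy worth reviewing.
        return sorted(e.name for e in self._entries.values() if len(e.copies) > 1)


inventory = DataInventory()
inventory.register(InventoryEntry("customer", "order entry", True, ["sales", "billing"]))
inventory.register(InventoryEntry("supplier price list", "purchasing", False, ["purchasing"]))
```

Publishing even this much (name, source, criticality, and copies) addresses the questions that the paragraph above says most enterprises cannot answer.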

Specify the terms and conditions under which one organizational unit may have access to another’s data.

Management exhortations to the contrary, most organizations and individuals do not readily share data. Naturally, there are many circumstances under which data should not be shared. But the turf wars that ensue must be curtailed. The terms and conditions specified must include the allowed use of the data (for example, can the accessing organization resell the data?), a fair transfer price, the technical means of access, quality assurance, and a means of resolving conflicts.

Recognize and avoid political battles that cannot be won.

An example is the seemingly innocuous task of specifying a definition, to be used throughout the enterprise, of “customer.” The definition may be desired, for example, to make the enterprise easier to do business with or to identify opportunities to cross-sell products and services. Unfortunately (or fortunately, depending on one’s perspective), each organizational unit has slightly different relationships with customers. The accumulated experience and knowledge for conducting business with customers is reflected in each unit’s data model. And that data model has indeed become each unit’s formal definition of its customers. The task of developing a common definition, even if all units agree that it is a good idea, is equivalent to asking each unit to suboptimize its way of treating customers. And no unit would readily agree to do so unless the new way is clearly better.

Clearly delineate management accountabilities for data.

It may be appropriate to codify these accountabilities into an overall data policy. Since no enterprise can implement all of these prescriptions at once, the policy should evolve as individual prescriptions are implemented.

Assign senior executives to lead the data management program.

The issues we have described are difficult. Bad data can spread like a virus, while good data remain locked in a vault. And the prescriptions require all managers and units to do their parts. Without strong leadership from the top, data management programs risk falling prey to the disparate agendas of functional areas and their leaders.

Each of these prescriptions requires clear management accountability for data. One trap is to assume that if data are in the computer, they must be the CIO’s responsibility. While the CIO may be responsible for the underlying technology, most data are generated and used by operational and decision-making units that are not under his or her control. The CIO is often removed from important data. Accountability for data must lie with those closer to data creation and use. Prescriptions 5 and 8 (the role of technology and the avoidance of political battles) depend on individual initiative. Prescriptions 1 through 3 (quality program, supplier program, and honed data needs) should normally be the responsibility of functional areas, although a centralized function may provide common tools and methods. Prescription 4 (information chains) calls for cross-functional alignment. Prescription 6 (data inventory) is normally a centralized function. Prescription 7 (terms and conditions for data sharing) requires agreement from units in possession of data.

Data are created and used in enormous quantities in activities ranging from the most mundane operation to the most far-reaching strategic decision. They are rarely managed well. To improve this situation, enterprises and managers must understand the properties of data and manage them as resources. As we have seen, data, unlike many conventional resources, are intangible, easy to copy and transport, and renewable. Enterprises must think about and manage data differently. The prescriptions described here can help a company begin to take better advantage of this traditionally underutilized resource.

References

1. Collections of articles include:

D.A. Klein, The Strategic Management of Intellectual Capital (Boston: Butterworth-Heinemann, 1998);

P.S. Myers, Knowledge Management and Organizational Design (Boston: Butterworth-Heinemann, 1996);

D. Neef, The Knowledge Economy (Boston: Butterworth-Heinemann, 1998);

L. Prusak, Knowledge in Organizations (Boston: Butterworth-Heinemann, 1997); and

R.L. Ruggles III, Knowledge Management Tools (Boston: Butterworth-Heinemann, 1997).

Recent books include:

T.H. Davenport with L. Prusak, Information Ecology: Mastering the Information and Knowledge Environment (New York: Oxford University Press, 1997);

T.H. Davenport and L. Prusak, Working Knowledge: How Organizations Manage What They Know (Boston: Harvard Business School Press, 1998);

R.W. Lucky, Silicon Dreams: Information, Man, and Machine (New York: St. Martin’s Press, 1989);

I. Nonaka and H. Takeuchi, The Knowledge-Creating Company (New York: Oxford University Press, 1995);

A. Penzias, Ideas and Information: Managing in a High-Tech World (New York: W.W. Norton & Company, 1989); and

T. Stewart, Intellectual Capital (New York: Doubleday, 1997).

2. Lucky (1989).

3. Some authors have noted that “data” are the raw material for “information,” which is the raw material for “knowledge.” We think that the reverse direction is even more important. Knowledge created or developed by an individual or a group must eventually become structured data so others can apply that knowledge.

4. L.P. English, “The High Costs of Low Quality Data,” DM Review, volume 8, January 1998, pp. 38, 52, 54; and

T.C. Redman, “The Impact of Poor Data Quality on the Typical Enterprise,” Communications of the ACM, volume 41, number 2, 1998, pp. 79–82.

5. See: Davenport and Prusak (1998); and Stewart (1997).

6. Many authors either consider properties of information rather than data or treat the two concepts as synonyms. It is instructive to compare our summary of data properties in Table 1 with those of information provided, for example, by:

H. Cleveland, “Information as a Resource,” The Futurist, volume 16, December 1982, pp. 34–39; and

C.F. Burk and F.W. Horton, InfoMap: A Complete Guide to Discovering Corporate Information Resources (Englewood Cliffs, New Jersey: Prentice-Hall, 1988).

7. American Heritage Dictionary, second college edition (Boston: Houghton Mifflin, 1985).

8. Some authors use the term “assets” instead of “resources.” But “assets” is a narrower concept. According to the American Heritage Dictionary (1985), an asset is “anything owned that has exchange value” or “an entry on a balance sheet.” While almost all data clearly have value, most data are not for sale, and they do not appear on balance sheets, so they do not qualify as assets.

9. C. Fox, W. Frakes, and P. Gandel, “Foundational Issues in Knowledge-Based Information Systems,” The Canadian Journal for Information Science, volume 13, number 3, 1988, pp. 90–102;

F.N. Teskey, “User Models and World Models For Data, Information, and Knowledge,” Information Processing and Management, volume 25, number 1, 1989, pp. 7–14;

H. Theiss, “On Terminology,” in A. Debons and A.G. Larson, eds., Information Science in Action: System Design, volume 1 (Hague: Martinus Nijhoff Publishers, 1983), pp. 84–94; and

G. Wiederhold, “Knowledge Versus Data,” in M.L. Brodie and J. Mylopoulos, eds., On Knowledge Base Management Systems (New York: Springer-Verlag, 1986), pp. 77–82.

10. Fox, Frakes, and Gandel (1988).

11. American Heritage Dictionary (1985).

12. Ibid.

13. A.S. Loebl, “Accuracy and Relevance and the Quality of Data,” in G.E. Liepins and V.R.R. Uppuluri, eds., Data Quality Control: Theory and Pragmatics (New York: Marcel Dekker, 1990), pp. 105–144.

14. See, for example, H. Brinberg, “Information Economics: Valuing Information,” Information Management Review, volume 4, number 3, 1989, pp. 59–63; and

A. Repo, “Economics of Information,” in M.E. Williams, ed., Annual Review of Information Science and Technology, volume 22 (Amsterdam: Elsevier Science Publishers, 1987), pp. 3–36.

15. Brinberg (1989).

16. J.L. King and K.L. Kraemer, “Information Resource Management: Is It Sensible and Can It Work?,” Information & Management, volume 15, number 1, 1988, pp. 7–14.

17. American Heritage Dictionary (1985).

18. These observations do not contradict the observations regarding the difficulties of valuing data: to ascertain the qualitative fact of diminishing value, one does not have to know how to measure it precisely.

19. Process Quality Management & Improvement Guidelines, Issue 1.1 (AT&T, 1988).

20. It could be argued that the “ability to store data” is a fundamental property of computers, not the other way around. This may be true. But the practical consequences to managers are the same.

21. B. Ronen and I. Spiegler, “Information as Inventory,” Information & Management, volume 21, number 4, 1991, pp. 239–247.

22. Process Quality Management & Improvement Guidelines (1988);

T.C. Redman, Data Quality for the Information Age (Norwood, Massachusetts: Artech House, 1996).

23. W.H. Inmon, C. Imhoff, and R. Sousa, Corporate Information Factory (New York: Wiley, 1998).

24. M.H. Brackett, Data Sharing Using a Common Data Architecture (New York: Wiley, 1994).

25. T.H. Davenport, R.G. Eccles, and L. Prusak, “Information Politics,” Sloan Management Review, volume 34, Fall 1993, pp. 53–65.

26. B. Violino, “Tempting Fate,” InformationWeek, 4 October 1993, pp. 42–52.

27. A. Branscomb, Who Owns Information? From Privacy to Public Access (New York: HarperCollins, 1994); and

H.J. Smith, “Privacy Policies and Practices: Inside the Organizational Maze,” Communications of the ACM, volume 36, December 1993, pp. 105–122.

28. W.M. Bulkeley, “Databases Are Plagued by Reign of Error,” Wall Street Journal, 26 May 1992, p. B6.

29. R. Knight, “The Data Pollution,” Computerworld, 28 September 1992, pp. 81–84; and

L. Wilson, “Devil in Your Data,” InformationWeek, 31 August 1992, pp. 48–54.

30. For an in-depth discussion, see:

Redman (1996); and

R.Y. Wang and D.M. Strong, “Beyond Accuracy: What Data Quality Means to Data Consumers,” Journal of Management Information Systems, volume 14, number 4, pp. 5–34.

31. K. Orr, “Data Quality and Systems Theory,” Communications of the ACM, volume 41, number 2, 1998, pp. 66–71.

32. For methods of solution and case studies, see: L.P. English, “Data Quality: Definition and Principles,” DM Review, volume 6, November 1996, pp. 46–51;

R. Kovac, Y.W. Lee, and L.L. Pipino, “Total Data Quality Management: The Case of IRI,” in D.M. Strong and B.K. Kahn, eds., The 1997 Conference on Information Quality (Cambridge, Massachusetts: MIT, 1997), pp. 63–79;

T.C. Redman, “Improve Data Quality for Competitive Advantage,” Sloan Management Review, volume 36, Winter 1995, pp. 99–107;

Redman (1996);

G.K. Tayi and D.P. Ballou, “Introduction” (Special Section: “Examining Data Quality”), Communications of the ACM, volume 41, number 2, 1998, pp. 54–57; and

R.Y. Wang, “A Product Perspective on Total Data Quality Management,” Communications of the ACM, volume 41, number 2, 1998, pp. 58–65.

33. Redman (1996).

34. “Business Is Turning Data into a Potent Strategic Weapon,” Business Week, 22 August 1983, p. 92.

35. D.L. Goodhue, J.A. Quillard, and J.F. Rockart, “Managing the Data Resource: A Contingency Perspective,” MIS Quarterly, volume 12, June 1988, pp. 373–392;

R. Sabherwal and W.R. King, “Toward a Theory of Strategic Use of Information Resources,” Information & Management, volume 20, number 3, 1991, pp. 191–212; and

E.G. Vesely, Strategic Data Management: The Key to Corporate Competitiveness (Englewood Cliffs, New Jersey: Yourdon Press, 1990).

36. See: Davenport, Eccles, and Prusak (1993); and

J.L. Weldon, “Who Owns Data?” Journal of Information Systems Management, volume 3, Winter 1986, pp. 54–57.

37. See, also: P. Strassmann, The Politics of Information Management (New Canaan, Connecticut: Information Economics Press, 1994).

38. See, also: B.K. Kahn, “Some Realities of Data Administration,” Communications of the ACM, volume 26, October 1983, pp. 794–799.

39. See: T.K. Landauer, The Trouble with Computers: Usefulness, Usability, and Productivity (Cambridge, Massachusetts: MIT Press, 1995); and

P. Strassmann, The Squandered Computer: Evaluating the Business Alignment of Information Technologies (New Canaan, Connecticut: Information Economics Press, 1997).

Reprint #:

4017
