An increasing number of Internet applications take advantage of the large amount of data accessible via the Web. These applications — now often called “mashups” — make relevant information from multiple Web sites easily accessible at a single Web site. For example, back in the late 1990s, a company called Bidder’s Edge Inc. allowed users to search and compare auction data for more than 5 million items from more than 100 auction sites, such as eBay Inc. and many others, as easily as the user could search one auction site. Currently, Kayak.com lets a user compare airfares by searching numerous travel sites to find the best fares available. With such applications, a user no longer needs to visit multiple sites and manually compare data; the applications do that automatically. They extract and reuse relevant Web data, often in very innovative ways, to make the information more valuable to the user.
As Tim Berners-Lee, inventor of the Web, said in an interview published in Technology Review in 2004, “the exciting thing is serendipitous reuse of data: one person puts data up there for one thing, and another person uses it another way.” It is with that view that technologists have been developing various ways to enable easy data reuse on the Web. Despite the enthusiasm about “serendipitous reuse of data” on the Web among developers and users of such applications, the companies whose data has been reused have often tried hard to control who can use “their data” and how it is reused. eBay sued Bidder’s Edge. Online travel company Expedia Inc. sent a “cease and desist” letter to Kayak. Bidder’s Edge stopped searching eBay under a court-ordered preliminary injunction and later ceased operation altogether. Kayak no longer incorporates data from Expedia and four other sites in its search results; searches of those sites are performed in separate pop-up windows. Thus, comparing those companies’ airfares with the fares that Kayak automatically extracts and organizes now takes a more cumbersome manual process.
Do companies like eBay and Expedia actually own the data on their Web sites so that they can control who can reuse it and how it is reused? From the two examples above, it may seem so. But the reality is less clear and more complex. How can someone own data and control its use if the data is openly accessible via the Web? What is the best strategy for those who think they own the data? And what is the best strategy for those who want to reuse data that is available via the Web? We will address these important questions in this article.
Intellectual Property Rights and Data
Without any doubt, data can be an important asset of a business. The business “owns” the data when it can fully control who can access the information and how it should be used. But when a company makes data accessible to the public on the Web, its “ownership” of that data will be determined by intellectual property law. Most jurisdictions recognize four forms of intellectual property rights: trade secret, trademark, patent and copyright. In this article, we consider databases that contain mundane facts, such as airfares, and have been made publicly accessible; therefore, the database creators cannot claim trade secret, trademark or patent protection.
What about copyright? In its lawsuit, eBay alleged that its copyright was infringed by Bidder’s Edge. The court rejected that allegation on the principle that copyright only protects the original selection and arrangement of factual data, but not the data itself or the effort involved in compiling the facts. That principle was established in 1991 by the U.S. Supreme Court in a classic case involving Feist Publications Inc. and Rural Telephone Service Co. Feist reused 1,309 of approximately 7,700 of Rural’s White Pages listings in creating a phone book covering a large area that included Rural’s service area. The Supreme Court decided that copyright should be used to reward originality, and originality requires “some minimal degree of creativity.” The decision explicitly rejected the so-called sweat of the brow doctrine that attempts to use copyright to reward “the hard work that went into compiling facts.”
Although there are differences in the originality requirement of copyright law internationally, it is quite uniform that one cannot claim copyright protection for individual entries of facts stored in a database. Thus, copyright cannot be used to prevent others from reusing individual facts from a database when that database is openly accessible via the Web. What’s more, copyright has been evolving. Well-known legal scholar Lawrence Lessig, founder of the Creative Commons initiative, advocates a form of copyright that is less restrictive and more conducive to creative reuses — which Lessig calls “remixes” — of digital content.
Enter the new era of database protection. The European Union introduced the Database Directive in 1996, requiring member states to implement laws to grant database creators a sui generis right, which is called the database right in the United Kingdom. This sui generis right lets the database creator prevent unauthorized extraction of all or a substantial part of the database. In fact, a database creator can, in some cases, prevent systematic extraction of even an insubstantial part of the database contents — if the extraction would cause serious damage to the database creator’s investment.
Under this new law, the British Horseracing Board Ltd. (BHB) sued William Hill Organization Ltd., an online betting company that, on its Web site, reused the lists of upcoming horse races created by BHB. In 2002, William Hill was initially found to have violated BHB’s database right. But the decision was reversed in 2005 after the European Court of Justice issued its opinion. The protection afforded by the sui generis right, in reality, is much narrower than had been expected earlier; in practice, the right only prevents data reuse that would substantially harm the database creator’s investment.
Following the EU’s introduction of the Database Directive in 1996, the U.S. Congress between 1996 and 2004 considered six bills that proposed varying degrees of protection for database creators. However, they all failed to pass into law. As a result, there remain considerable legal uncertainties in data reuse and database protection.
Strategy For Database Creators
In the face of such uncertainty, what should a company do if it is a database creator? It is always good to be vigilant about reusers who extract your data to re-create an identical copy of your database. But most reusers are unlikely to do this; verbatim data copying carries abundant legal risk and offers little economic benefit. However, if the reuser uses technology to create a significantly different database that has much richer data and functionality, you might not have an effective legal means to protect your database against it. What else can you do? Think data repurposing: That is, you should find innovative ways to make your data valuable to broader market segments.
There are at least two possible innovative ways of data repurposing to enhance its value: (1) sell the “private” data that is related to the openly accessible data and (2) be a reuser yourself, but offer better and more creative service by leveraging your unique capabilities.
1. Sell “private” data. A company that offers open access to certain data also probably has certain other “private” data that makes the combined data set unique. Like other types of information goods, a unique database has much less competition and can often be sold at a good profit margin. For example, from eBay’s Web site, one can obtain only the current bidding prices for various items. The actual transaction prices (and historical prices for similar items) are not openly available from eBay’s site. Neither are the quantities sold at different prices. However, such information is vital to people who want to analyze online auction markets for purposes such as auction mechanism design and auction trend analysis. Realizing that this data, combined with auction prices, can serve other purposes, such as market research, eBay is now selling such data directly via licenses, as well as through resellers.
Similarly, at the time of its lawsuit against William Hill, the BHB was responsible for creating lists of upcoming U.K. horse races and ensuring the eligibility of the participants. To do that, the BHB had to collect and verify information about the identity of the person entering the horse, the characteristics of the horse and the identity of the owner and the jockey. The BHB did not give out this information for free; instead, it sold it via licenses.
As a database creator, you should strategically decide what part of the database should be free to the public, and what can and should be kept private so that you can sell it. The combined data (free plus private data) is more useful because it represents a complete data set. Furthermore, the free database can often help increase demand for the private database. For example, when people look up upcoming horse races, they are more likely to be interested in knowing about the horses and the jockeys entering the races. Since this information can help people predict the winners, they will value it highly, and you can charge a high price for it. For the BHB, the information about the horses, their owners and the jockeys was a good complement to the upcoming racing lists. In fact, the BHB invested approximately £4 million a year in maintaining its database.
2. Become a reuser. In addition to creating data, you can also become a data reuser. In doing so, you can exploit the new information derived from integrated data and may find opportunities to expand your business scope. For example, Atlanta-based shipping carrier United Parcel Service Inc. owns iShip Inc., a data reuser based in Bellevue, Washington, that compares shipping charges for multiple carriers.
Strategy For Data Reusers
What should a business do if it wants to reuse someone else’s data? By extracting and integrating data from multiple databases, data reusers can use technology to add value to existing data. Their data reuse not only provides the convenience of easy access to otherwise disparate information, but the integrated data can also provide insights that are otherwise invisible.
Although with technology the reuser probably can reconstitute a creator’s entire database, this should not be done, for reasons discussed earlier. Instead, focus on using the technology to construct a unique database that will add value to existing data, whether through differentiation or data analysis. In practice, it is always good to consult your legal counsel about the possible need for obtaining licenses with reasonable terms from database creators.
1. Differentiate. Differentiation can be in various dimensions. Most price comparison sites reuse data from a wide range of vendors. The immediate value they add is improved efficiency in searching for the best deals. They can also further differentiate their databases by improving the quality of the data and adding functionality to the database. For example, AddAll.com reuses price data internationally. It presents data in a uniform currency chosen by the user, even though the original data may be in various other currencies. Therefore, its data has higher value to the user because it is easily interpretable and more usable than the data in original sources.
2. Analyze the data. In addition to leveraging technology to develop differentiated databases, you should also look into ways of leveraging another unique asset that you possess: the integrated data itself. You can use the data for making better decisions or offer services to help others make better decisions. For example, BizRate is a price comparison site. It also analyzes the searches and click streams on its price comparison database to produce market analysis that can be sold to market research firms and retailers.
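To make the differentiation example concrete, the kind of currency normalization a price comparison site performs might be sketched as follows. This is a minimal illustration, not a description of how any site named in this article works: the vendor names, the prices and the exchange rates in `USD_PER_UNIT` are all hypothetical, and a real reuser would obtain current rates from a rate provider.

```python
from dataclasses import dataclass

# Hypothetical exchange rates: U.S. dollars per one unit of each currency.
# Illustrative numbers only, not real market data.
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.25, "GBP": 1.5}


@dataclass(frozen=True)
class Offer:
    vendor: str    # hypothetical vendor name
    price: float   # price as listed by the vendor
    currency: str  # currency code of the listed price


def convert(amount: float, source: str, target: str) -> float:
    """Convert an amount from one currency to another, going through USD."""
    return amount * USD_PER_UNIT[source] / USD_PER_UNIT[target]


def compare(offers: list[Offer], target: str) -> list[tuple[str, float]]:
    """Return (vendor, price in the target currency) pairs, cheapest first."""
    converted = [
        (o.vendor, round(convert(o.price, o.currency, target), 2))
        for o in offers
    ]
    return sorted(converted, key=lambda pair: pair[1])


# Offers extracted from three hypothetical sources, each in its own currency.
offers = [
    Offer("Vendor A", 30.0, "USD"),
    Offer("Vendor B", 20.0, "EUR"),
    Offer("Vendor C", 18.0, "GBP"),
]

# The user chooses the display currency; the ranking follows from it.
ranked = compare(offers, "USD")
for vendor, price in ranked:
    print(f"{vendor}: {price:.2f} USD")
```

Under these assumed rates, the offer with the largest listed number is not the most expensive one, and the offer with the smallest listed number is not the cheapest. That is the value a reuser adds by presenting every price in a single currency.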
If your company has put up an extensive Web site, chances are fairly good that, whether you realize it or not, you have made a database available for others to reuse. You may own the database, but it is difficult for you to own its contents completely after you have made them publicly accessible, unless you find innovative ways of using those contents. So when you have data, use it innovatively; if you do not, someone else most likely will. Conversely, when you reuse data obtained from other companies, add value. As Berners-Lee pointed out, opportunities for data reuse have made the Web an exciting place for innovation. And, going forward, the Web will continue to be an important platform for delivering the value created by innovative data reuse.