Does Data Have a Shelf Life?

Organizations are collecting and utilizing a massive amount of data. New research asks the question, “When is the right time to refresh that data?”

Reading Time: 3 min 

Topics

Competing With Data & Analytics

How does data inform business processes, offerings, and engagement with customers? This research looks at trends in the use of analytics, the evolution of analytics strategy, optimal team composition, and new opportunities for data-driven innovation.
More in this series

Creating insights from data is an important, and costly, issue for many companies. They spend time and effort collecting data, cleaning it, and using resources to find something meaningful from it. But what about after the insights have been generated? Do insights have a shelf life? If so, when should knowledge gleaned from old data be refreshed with new data?

In their new research paper, When is the Right Time to Refresh Knowledge Discovered from Data? Xiao Fang and Olivia R. Liu Sheng of the University of Utah, in collaboration with Paulo Goes of the University of Arizona, suggest that for real-world Knowledge Discovery in Databases (KDD) — applications like customer purchase patterns or public health surveillance — new data is imperative:

It could bring in new knowledge and invalidate part or even all of earlier discovered knowledge. As a result, knowledge discovered using KDD becomes obsolete over time. To support effective decision making, knowledge discovered using KDD needs to be kept current with its dynamic data source.

However, staying current with data sources is a fundamental challenge, rated as one of the top three management issues cited by knowledge management practitioners, according to the researchers:

A trivial solution is to run KDD whenever there is a change in data. However, such solution is neither practical, due to the high cost of running KDD, nor necessary, since it often results in no new knowledge discovered. On the other hand, running KDD too seldom could lead to significant obsoleteness for the knowledge in hand. Therefore, it is critical to determine when to run KDD so as to optimize the trade-off between the obsoleteness of knowledge and the cost of running KDD.

Their solution? Model an optimal knowledge refresh policy.

With research areas spanning machine learning and data mining (Fang), data mining and optimization techniques (Sheng), and modeling of complex production and information systems (Goes), the model developed by the researchers is — for those in management who don’t have a PhD in information systems — hopelessly bound in mathematical equations.

So I asked the authors to explain in lay terms how their research — and model — can help managers decide when the right time is to refresh knowledge discovered from data.

Topics

Competing With Data & Analytics

How does data inform business processes, offerings, and engagement with customers? This research looks at trends in the use of analytics, the evolution of analytics strategy, optimal team composition, and new opportunities for data-driven innovation.
More in this series

More Like This

Add a comment

You must to post a comment.

First time here? Sign up for a free account: Comment on articles and get access to many more articles.

Comments (2)
Bryan Kolterman
Hi Andy, 

I don't think old data is literally removed rather than refreshed as new data becomes available. With the low cost of data storage, companies can archive a majority of old data so it is never truly lost. I completely agree with you that all data, old and new, is helpful for predictive analysis. Their examples of using KDD for customer purchase patterns and public health surveillance may add some clarity to your question. These are probably two applications where old and new data would predict vastly different trends, so the constant refresh is necessary to predict the next best move for the company. This is very 'time dependent' data, especially customer purchase patterns- consumer behavior is constantly changing. With applications where  trend analysis shows relative stability over time, old data and new data might be equally effective at what you are trying to predict. In this situation, it could be beneficial for the company to forgo the cost of running KDD and losing the knowledge.
Andy Capaloff
Is the old Data literally removed?  It seems that trend analysis is also very important, so Data should somehow be designated as 'not current', as opposed to truly removed.  Of course that makes Big Data bigger, but it also aids in predictive analysis.  Perhaps I missed a point somewhere here.