Researchers suggest a new framework to regulate the “fairness” of analytics processes. What could this mean for your organization?
Online retailers have long been able to derive insights about their customers using data and analytics. Now brick-and-mortar stores can get similar insights: whether someone who walked by their store came in, how long they stayed, where they went inside the store, and even what path they took. Predictive analytics can then be applied to determine what resources a store might need, which layouts are optimal, or what shoppers might be likely to purchase.
One such analytics provider, Euclid, has about 100 customers, including Nordstrom and Home Depot, and has already tracked about 50 million devices (your smartphone and mine) in 4,000 locations, according to The New York Times.
Euclid, which has a prominent “Privacy” tab on its homepage, supplies retailers with reports culled from aggregated, anonymized data. While it does abstract data from cell phones that could identify an individual, it doesn’t actually use this data to pinpoint individuals (yet). To drive this point home — and to reassure, one would assume, jittery retailers worried about creeping out customers with Minority Report-like predictive technology — Euclid has worked with the Future of Privacy Forum to develop the Mobile Location Analytics Code of Conduct, a self-regulatory framework for consumer notification.
But, say researchers, these types of frameworks (not unlike the White House’s Consumer Privacy Bill of Rights) do not go far enough to protect individuals from the potential harm of predictive algorithms in an era of big data, particularly as their use expands beyond retail into law enforcement, health care, insurance, finance, and human resources.
In a recent research paper, Kate Crawford (principal researcher at Microsoft Research, visiting professor at the MIT Center for Civic Media, and senior fellow at NYU Information Law Institute) and Jason Schultz (associate professor of clinical law at NYU School of Law) propose a new method to redress predictive privacy harms: procedural data due process, which would determine, legally, the fairness of an algorithm.
Rather than attempting to regulate the collection, use, or disclosure of personal data ex ante, procedural data due process would regulate the “fairness” of big data’s analytical processes with regard to how they use personal data (or metadata derived from or associated with personal data) in any “adjudicative” process, that is, a process in which big data is used to determine attributes or categories for an individual.
The authors use the example of a health insurance provider that uses big data to determine the likelihood that a customer has a certain disease and denies coverage on that basis. The customer would have a data due process right with regard to that determination.
All well and good, but what does this mean for organizations that use predictive analytics with big data? A few things, should data due process pass policy and legislative muster. The authors outline three principles for implementation:
Notice: This principle requires that those who use big data to make categorical or attributive determinations about others must post some form of notice disclosing not only the type of predictions they are attempting, but also the general sources of data that they draw upon, including a mechanism that allows those whose personal data is included to learn of that fact.
Also, when a set of predictions has been queried, notice would be sent out to inform those affected by the issues predicted. For example, if a company were to license search query data from Google and Bing to predict which job applicants would be best suited for a particular position, it would have to disclose that to all applicants who apply.
Opportunity for a Hearing: Once notice is available, the question then becomes how one might challenge the fairness of the predictive process employed. The answer: a hearing and an ability to correct the record. This would include examining both the data input and the algorithmic logic applied. In contexts where security and proprietary concerns arise, the role could be given to a neutral, trusted data arbiter.
Impartial Adjudicator and Judicial Review (or separation of church and state): A neutral data arbiter could field complaints and then investigate sufficient allegations of bias or financial interest that might render a predictive outcome unfair. The arbiter could examine the relationship between those who designed the analytics and those who run the individual processes to make sure that their roles are appropriate and distinct.
Importantly for businesses, this would require some form of audit trail that recorded the basis of predictive decisions, both in terms of data used and the algorithm employed.
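To make the audit-trail idea concrete, here is a minimal sketch in Python of what a single log entry might capture. It is an illustration under stated assumptions, not anything the authors specify: the `DecisionRecord` fields, the `record_decision` helper, and the choice to store a hash of the inputs instead of the raw personal data are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """One audit-trail entry (hypothetical schema, for illustration only)."""
    subject_id: str          # whom the prediction is about
    data_sources: tuple      # general sources of data drawn upon
    algorithm: str           # name of the model or rule set applied
    algorithm_version: str   # version, so the logic can be re-examined later
    inputs_digest: str       # SHA-256 of the inputs, not the raw data
    outcome: str             # the category or attribute assigned
    recorded_at: str         # UTC timestamp in ISO 8601 format


def digest_inputs(inputs: dict) -> str:
    """Hash the inputs in a canonical key order so equal inputs match."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_decision(log, subject_id, data_sources, algorithm,
                    algorithm_version, inputs, outcome):
    """Append one JSON-encoded record to `log` and return the record."""
    record = DecisionRecord(
        subject_id=subject_id,
        data_sources=tuple(data_sources),
        algorithm=algorithm,
        algorithm_version=algorithm_version,
        inputs_digest=digest_inputs(inputs),
        outcome=outcome,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record


# Usage: an in-memory list stands in for an append-only store.
audit_log = []
record_decision(
    audit_log,
    subject_id="applicant-17",
    data_sources=["licensed_search_queries"],
    algorithm="job_fit_model",
    algorithm_version="1.0",
    inputs={"query_count": 42, "region": "US"},
    outcome="not_shortlisted",
)
```

Recording both the data sources and the algorithm version is what would let a neutral arbiter later examine the data input and the logic applied; a real system would also need tamper-evident storage and a retention policy, which this sketch leaves out.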
The bottom line, according to the authors, is this: “Unless one decides that privacy regulations must govern all data ever collected, processed, or disclosed, deciding where and when to draw the lines around these activities becomes extremely difficult with respect to big data information practices.” Due process, they argue, strikes the best regulatory balance between privacy and analytics.