Two legal experts call out the dangers of leaving sophisticated data analytics unchecked.

The 1997 sci-fi film Gattaca recently returned to the headlines when a Wikipedia summary of the movie’s plot was apparently used as the basis of a speech by Kentucky Senator Rand Paul. The movie presents a society where DNA determines social class. A database identifies genetically superior individuals — termed “valids” — while winnowing out their naturally conceived “in-valid” counterparts.

The valids have predetermined life — and career — paths, unalterable by desire, capability, circumstance or happenstance. The in-valids have an equally predetermined destiny, one that leaves them on a much lower social scale.

Senator Paul’s agenda notwithstanding, the question is worth asking: Could our society ever use genetics, biometrics and, essentially, predictive analytics to determine an individual’s path, as in Gattaca?

That risk may become real, according to a recent Stanford Law Review paper, Three Paradoxes of Big Data, if big data continues on its current course. “We want to suggest that the utopian rhetoric of big data is frequently overblown, and that a less wild-eyed and more pragmatic discussion of big data would be more helpful,” write Neil Richards, professor of law at Washington University, and Jonathan King, who is pursuing an advanced legal degree at Washington University School of Law and serves as the vice president of cloud strategy and business development at Savvis.

The two suggest that while there are clearly benefits to be derived from mining large data sets using sophisticated analytics — from the potential to conserve precious resources to tracking and curing lethal diseases — there are also implications (inherent dangers, really) of which the public needs to be aware.

The authors frame three paradoxes around transparency, identity and power that are the result of the big data movement — and suggest ways to move forward.

The Transparency Paradox:

Big data is really the amalgamation of little data inputs — information about people, places and things collected by sensors, cell phones, click patterns and other data-generating mechanisms. Commercial and government systems gather these inputs to feed big data analytics.

While big data promises to use this information to make the world more transparent, the actual collection of data happens invisibly — “its tools and techniques shrouded by layers of physical, legal and technical privacy by design,” write the authors. The paradox? “If big data spells the end of privacy, then why is the big data revolution occurring mostly in secret?”

The authors write: “We are not proposing that systems be stored insecurely or opened to the public en masse. But we must … bring legal, technical, business, government, and political leaders together to develop the right technical, commercial, ethical, and legal safeguards for big data and for individuals. We cannot have a system, or even the appearance of a system, where surveillance is secret, or where decisions are made about individuals by a Kafkaesque system of opaque and unreviewable decisionmakers.”

The Identity Paradox:

The authors point out that everyone has the right to define their active identity — their “I am” statement — in its many incarnations, at any given moment in time (“I am me; I am anonymous. I am watching, I am buying,” and so on). But with even the most basic access to a combination of big data pools — phone records, buying history, surfing history, social networking posts — those identities can be shaped and influenced by other entities.

If advances in personal genomics were applied to academic and career screening, the dystopian future portrayed in Gattaca might not be so outlandish. In the film, an aspiring starship pilot is forced to assume another man’s identity because a test deems him genetically inferior. Without big data identity protections developed now, “you are” and “you will like” risk becoming “you cannot” and “you will not”.

The Power Paradox:

Big data enables a sharper, clearer picture of the world, say the authors, citing the Arab Spring as an example. However, the sensors and data pools that create these understandings of the world are in the hands of “powerful intermediary institutions, and not ordinary people.”

Big data will create winners and losers, and it is likely to benefit the institutions that wield its tools over the individuals being mined, analyzed and sorted. Without clear legal or technical boundaries, each side is left guessing: individuals succumb to denial, while governments and corporations get away with what they can by default, until they are left reeling from scandal after each shock of disclosure. The result is an uneasy, uncertain state of affairs that is not healthy for anyone, leaving individual rights eroded and our democracy diminished.

While the authors do not offer a big-picture solution, they do have suggestions: some measure of big data regulation, and the development of a concept of big data ethics — a “social understanding of the times and contexts when big data analytics are appropriate, and the times and contexts when they are not.”