Balance Efficiency With Transparency in Analytics-Driven Business

The ubiquity of algorithms in daily life raises questions about ethics, transparency, and who’s keeping tabs on how those algorithms work.

We have disturbingly little idea how many of the algorithms that affect our lives actually work. We consume their output, knowing little about the ingredients and recipe. And as analytics affects more and more of our lives and organizations, we need more transparency. But this transparency may be a bitter pill for businesses to swallow.

In 1906, Upton Sinclair’s The Jungle described the oppressed life of immigrant workers, specifically those in the meatpacking industry in Chicago. Sinclair’s intent in portraying the working conditions of a powerless class may have been to inspire political change. However, the graphic depictions of unsanitary food preparation helped bring transparency to manufacturing processes through the story’s nauseating clarity. The book heavily influenced the creation of regulatory oversight through organizations that eventually became the U.S. Food and Drug Administration.

We might be similarly horrified if we knew what evils lurked in the hearts of business algorithms in use today.

Some examples are lesser evils. Google search is widely used, but details about the order (and inclusion) of pages in its results aren’t public. Credit scores directly affect our finances, but the specific algorithms used to calculate them are secret. And the use of analytics to create algorithms is spreading rapidly to judicial processes, advertising, hiring, and many other daily decisions.

But these are the oxymoronic obvious unknowns. There may be greater evils lurking beneath the surface. The internal operations of businesses have always been a bit murky to consumers. There are algorithms in use within organizations that we as consumers don’t know that we don’t know about — preferential treatments, pricing differences, service prioritization, routing sequences, internal ratings, and so on. There is little opportunity even to know these algorithms exist, much less the analytical results on which they are based.

It actually makes sense that we lack good ways to see how analytical results are produced. Companies want to protect their intellectual property — this is their secret sauce. Whatever advantage companies get from data does not come without effort. Given the considerable investment underlying that effort, companies would certainly be reluctant to give away their hard-earned insights embedded in algorithms. Why would they even consider it?

The difficulty, as in The Jungle, is that others consume what is produced.

With food, we would likely be reluctant to go back a century to the time of The Jungle, the era before food labeling and inspection processes were required. The bane of every school cafeteria is the dreaded “mystery meat.” We don’t like mystery in what we consume.

With analytics, we are in a Jungle scenario. Businesses create analytical results that affect our lives, but we don’t know much about the ingredients or recipe. What data is used? From where? How are the models created? What affects the resulting decision?

Some aspects of the mystery of what we are consuming fall squarely on our own plate; a lack of knowledge can stem from a lack of effort to understand analytics. While democratization of data is appealing, the new data republic is a meritocracy. Other aspects of the mystery, however, are currently unknowable. Businesses have little incentive to share information about the algorithms they use. As a result, they will not provide details without a change in incentives or awareness.

I’m certainly not advocating for new agencies to regulate and inspect all algorithms. Institutionalization brings with it outside influence, power struggles, and lobbying. There are risks in both over- and under-regulation. The point is that we’ve been in similar situations before and can learn from them. When it came to food and medicine, significant regulation — and the FDA — came about as a result of the lack of self-regulation by the companies producing either product. For algorithms, better self-regulation and transparency may preempt the same sort of government regulation that evolved in the food industry. At a minimum, from a perspective of self-interest, the long memory associated with cheap data storage indicates that business secrets won’t last forever anyway.

This is especially true with the rise of deep learning and artificial intelligence techniques that can be opaque even to their developers. A system that passes a Turing test, by definition, hides the details of how it works from those it interacts with. The lack of information about the analytical results is getting worse, not better.

Considerable effort goes into improving data quality. “Garbage in; garbage out” is frequently repeated. But while data may be dirty, algorithms are dirtier. With more transparency into the algorithms in use, we can have informed discussions about what may or may not be fair. We can collectively improve them. We can avoid the ones we are allergic to and patronize the businesses that are transparent about what their algorithms do and how they work.

A side effect of the insight into food processes was the collapse of the market for lemons; consumers wouldn’t purchase suspect ingredients or elixirs with dubious claims. Similarly, we’ll likely find that some businesses are covered-wagon sideshows selling snake oil, and we can knowledgeably avoid the results of their unfair or sneaky algorithms.

1 Comment On: Balance Efficiency With Transparency in Analytics-Driven Business

  • Jaap Vink | August 21, 2017

    I agree with the general tendency of this post but I miss the other side of the coin, the positives:
    1. The use of algorithms makes it possible to have this deeper discussion on ethics in decisioning that wasn’t possible before (and that we definitely need to have) because all the algorithms were hidden in the brains of humans. An algorithm doesn’t have to be automated; decision rules have been in place for years. Psychology (and the former practice of knowledge elicitation for expert systems) has shown that these algorithms are very difficult, if not impossible, to make explicit. One of the current issues with algorithms is the bias that comes from data. A lot of these biases (discriminatory effects, for example) are the result of these human algorithms from the past, and the machine-driven algorithms are ‘rediscovering’ what was actually happening. So yes, there’s a danger of negative side effects, but because of algorithms we can now see what negative side effects our historic decision-making processes had, and we can have a discussion about them and how to prevent them in the future.
    2. The use of automated algorithms makes it possible to be more consistent in decision-making processes. In the past, decisions based on human judgment could turn out differently for cases that were exactly the same. Human decision making can be influenced by so many factors (the weather, mood, the way you look, …) that there was no consistency in judging all cases against the same criteria. The use of automated algorithms can remove that inequality.
    3. The pressure to use higher quality data brings an improvement in the way we make decisions. We tend to forget that in the past decisions were made on even worse data.
    4. Most automated algorithms are developed with a clear process that is built around scientific data analysis approaches. This leads to algorithms that are based on a process of critical thinking, testing & validation, whereas in the past the decision rules were often based on assumptions, prejudices, and biases (in decision making and in the limited analyses that were done, at best).

    There’s also a set of negatives that I’m missing:
    1. We tend to discuss this as if each algorithm is acting independently. They never are. Usually at the point of decision several algorithms come together and are molded into (let’s call it) a mega-algorithm. For example: when a Telco tries to prevent people from canceling a subscription (churn prevention), they not only look at the propensity for someone to cancel. They often also look at the propensity for someone to accept the retention offer, the expected customer value, the credit risk, and more. They then try to optimize this.
    2. Our actions influence the outcome of the model, and therefore we need to be very much aware of the limited time a model may be used, and we may need to take our action into account in our final algorithm. To take the same example as above: when our churn prevention algorithm predicts that a customer is going to churn and we take an action to prevent that from happening, we change the reality the model is built on. Therefore we will need to revisit our model after each cycle. Another example is credit scores: when a business is predicted to have a high risk of failure or a high risk of paying late, and a decision is made not to extend credit, we might accelerate the very outcome the model predicted, whereas if credit had been extended, the prediction in the next cycle could have changed.
    3. Probably the most important warning in all of this is that we will never have all the data on everything available despite all the marketing promises of big data vendors and others. And all the data we use is already biased: decisions & choices have been made on what to measure, how to measure it, how to store it and how to make it available.
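
As an illustration of the first point above, here is a minimal Python sketch of how several model outputs might be folded into one retention decision. Everything in it is an assumption made for illustration: the `CustomerScores` fields, the `expected_gain_from_offer` formula, and the choice to discount customer value by credit risk are hypothetical, not any telco’s actual method.

```python
from dataclasses import dataclass


@dataclass
class CustomerScores:
    """Hypothetical outputs of four separate models for one customer."""
    churn_propensity: float   # probability of canceling without intervention
    offer_acceptance: float   # probability of accepting the retention offer
    customer_value: float     # expected future value if the customer stays
    credit_risk: float        # probability the customer fails to pay


def expected_gain_from_offer(scores: CustomerScores, offer_cost: float) -> float:
    """Combine the individual scores into one expected-gain figure.

    Assumption: value is saved only when a customer who would have churned
    accepts the offer, and that value is discounted by credit risk.
    """
    retained_value = (
        scores.churn_propensity
        * scores.offer_acceptance
        * scores.customer_value
        * (1.0 - scores.credit_risk)
    )
    # Assumption: the offer costs money whenever it is accepted,
    # even by customers who would have stayed anyway.
    expected_cost = scores.offer_acceptance * offer_cost
    return retained_value - expected_cost


def should_make_offer(scores: CustomerScores, offer_cost: float) -> bool:
    """The 'mega-algorithm' decision: make the offer only if the gain is positive."""
    return expected_gain_from_offer(scores, offer_cost) > 0.0


if __name__ == "__main__":
    likely_churner = CustomerScores(0.5, 0.4, 1000.0, 0.1)
    print(should_make_offer(likely_churner, offer_cost=50.0))
```

Even in this toy form, no single score determines the outcome, which is why transparency about one model in isolation says little about the decision a customer actually experiences.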

    There’s much more but these are the points I wanted to make today.
