Many companies are turning to machine learning to review vast amounts of data, from evaluating credit for loan applications, to scanning legal contracts for errors, to looking through employee communications with customers to identify bad conduct. New tools allow developers to build and deploy machine-learning engines more easily than ever: Amazon Web Services Inc. recently launched a “machine learning in a box” offering called SageMaker, which non-engineers can leverage to build sophisticated machine-learning models, and Microsoft Azure’s machine-learning platform, Machine Learning Studio, doesn’t require coding.
But while machine-learning algorithms enable companies to realize new efficiencies, they are as susceptible as any system to the “garbage in, garbage out” syndrome. In the case of self-learning systems, the “garbage” is biased data. Left unchecked, biased data fed to self-learning systems can lead to unintended and sometimes dangerous outcomes.
In 2016, for example, an attempt by Microsoft to converse with millennials using a chatbot plugged into Twitter famously created a racist machine that switched from tweeting that “humans are super cool” to praising Hitler and spewing out misogynistic remarks. This scary conclusion to a one-day experiment resulted from a very straightforward rule about machine learning: the models learn exactly what they are taught. Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), a machine-learning system that makes recommendations for criminal sentencing, is also proving imperfect at predicting which people are likely to reoffend because it was trained on incomplete data. Its training model includes race as an input parameter, but not more extensive data points like past arrests. As a result, it has an inherent racial bias that is difficult to accept as either valid or just.
These are just two of many cases of machine-learning bias, and there are many more ways in which machines can be taught to do something immoral, unethical, or just plain wrong.
Best Practices Can Help Prevent Machine-Learning Bias
These examples underscore why managers must guard against the reputational and regulatory risks that can result from biased data, in addition to deciding how and where machine-learning models should be deployed in the first place. Best practices are emerging that can help prevent machine-learning bias. Below, we examine a few.
Consider bias when selecting training data. Machine-learning models are, at their core, predictive engines.