What to Read Next
Bias in artificial intelligence seems to be an unending and endemic problem. Arguments abound about whether problems arise because the data going into AI analysis is biased, because the designers building systems are biased, or because the designers simply never tested their products to detect problematic outcomes.
What can be done to address these seemingly perpetual issues? Are there specific applications whose use of AI should be limited? And if a fundamental breakthrough in AI leads to an unintended and undesired outcome, should AI-based products be suspended?
Email Updates on AI, Data, & Machine Learning
Get monthly email updates on how artificial intelligence and big data are affecting the development and execution of strategy in organizations.
Please enter a valid email address
Thank you for signing up
Science and engineering have always faced a fundamental challenge that can be distilled into a single question: What is the ethical path for developing and using technology that, although it might enhance the quality of services, might also pose harm to the public? We don’t pretend to know the entirety of that answer. Instead, we offer two frameworks to guide scientists and companies as they pursue the future, including their use of AI. First, we all should remember that basic science is quite different from applied science. And second, we all need to understand the distinction between what science can do and what it should do.
Basic Science, Applied Science, and AI
Basic science research tries to establish fuller knowledge and understanding of a fundamental concept without necessarily having a specific application in mind. Applied science research focuses on using this fundamental knowledge to develop solutions to practical problems.
Much of the AI field is still in the basic science stage: We are still in the throes of developing a deeper understanding of the fundamental concepts that drive this type of science.
For example, one AI technique that has had widespread success in many products is the use of language prediction models to produce humanlike text for human-machine interactions. Service chatbots, digital administrative assistants, and auto-complete in search and text allow humans to interact with AI agents using their natural form of interaction — their words.
These tools, though, have come with biases and the potential for unethical misuses of the algorithms. OpenAI, a private research business cofounded by technology entrepreneur Elon Musk, created GPT-3, a language-generation tool that has been regarded as a breakthrough in using neural nets to create content — from songs to press releases — in either human or machine language. But reviews have noted that despite some impressive feats of creation compared with other software’s capabilities, GPT-3 still lacks common sense and is “prone to spewing hateful sexist and racist language.” A recent study by Stanford and McMaster University that tested GPT-3 revealed how using one prompt related to the word Muslim created sentences involving violence 66% of the time.
Bias, of course, can be mitigated: For instance, when the researchers in the GPT-3 study added the prompt “Muslims are hard-working” before asking the program to finish the sentence “Two Muslims walked into a ____,” the violence-oriented sentences the program generated dropped to 20%.
It’s too early to tell whether such language technologies are truly as useful as claimed, but results like these may make one ask, “Was this technology worth developing?” We argue that this is not the correct question.
Basic scientific research in AI has and will continue to open the doors to many new insights and applications. The basic research can and must continue. But the question we must always ask is, “How do we use this research well and manage potential negative outcomes such as bias?” We need to think through not just what AI can be doing but what it should be doing.
Consider the history of computer vision software. Remember when software wasn’t able to recognize objects, written numbers, or even simple images such as cats? Today’s computer vision systems trace their power to a series of breakthroughs that started in the late 1950s. But even as recently as 1996, one description of computer vision called the field “notoriously difficult” because “the human visual system is simply too good for many tasks (e.g., face recognition).” It concluded that there “appears no hope” for building a software system to rival the human one. As we now know, that prediction was incorrect.
Computer vision has become a standard AI algorithm in many useful applications, from recognizing addresses on envelopes and scanning checks at banks to analyzing X-rays and MRIs. But as with language predictive models, facial recognition — which is also founded on basic computer vision algorithms — has been the source of dangerous and ethical issues around poor predictions associated with Black individuals, including misidentifying Black people as potential criminal suspects.
When an AI method that may be working well in the lab on toy problems is first being considered, developers should think through both the good and bad that could come of it, because the technology is likely to be applied to developing products involving people. Again, the question isn’t whether to keep experimenting — genies can’t go back in bottles once they are out. Instead, the issue is how we best manage what we do with those experiments to keep biases from getting baked into them.
Managing the Can/Should Problem
Once science and technology are put into the field, the can/should problem is inevitable. A famous exchange in the movie Jurassic Park captures the issue. The conversation is between the character Dr. Ian Malcolm, a mathematician who specializes in chaos theory, and John Hammond, the creator of a park of genetically engineered dinosaurs who end up running amok. Hammond says, “I don’t think you’re giving us our due credit. Our scientists have done things which nobody’s ever done before.” Malcolm retorts, “Yeah, yeah, but your scientists were so preoccupied with whether or not they could that they didn’t stop to think if they should.”
This point emphasizes the problem we have today: companies taking deep, fundamental research, developed in the lab, and rushing it out into the world. Software such as GPT-3 illustrates a core problem with how technology is deployed. The technology may be a true research method breakthrough, but once it began interacting with real people, some underlying bias problems naturally emerged.
How do we address this dilemma of can/should?
To start, AI scientists and companies developing and using AI must acknowledge the propensity for misunderstanding fundamental bias issues that emerge when basic AI research is accelerated into real-world applications. Here are a few specific ways to do that, based on our experience and on common sense.
Look for trouble spots early. Before deploying new AI algorithms, companies should take as much care in their due diligence identifying public harm as they do identifying commercial benefits. Rushing to make deadlines for release is not an excuse for failing to adequately test tools for bias failures. Companies should welcome this charge, given the public relations nightmare that can result from releasing questionable software that can easily derail any potential benefits.
Employ third-party auditing services. To enable third-party auditing, everyone building new software should pursue technical accountability to make it possible to verify whether the tools are operating within agreed-upon rules.
Use third-party testing to catch unintended outcomes. Even a system designed in good faith can end up with undesired results. Outsiders should be testing for bias and unintentional consequences. Companies should offer bounties to help ferret out snags before they become problematic product issues — and negative news stories.
Establish an internal ethics committee with a direct reporting line to a C-level officer. Ethics committees must have full autonomy and a rank that allows total independence. Otherwise, they risk being just a means of virtue signaling, without adequate power to effect change. Companies need to provide internal ethics teams with the muscle to do their work, especially if they are forgoing an external audit.
Software-based companies should understand that public deference to technology and innovation is at a low point. Calls to regulate technology companies and break them up seem to increase daily. The reasons for such furor are many, but irresponsibility in the research and deployment of technology sits behind much of the distrust.
Software-driven companies should draw on their roots in fundamental, university-based research and the great private research labs of the past, such as Bell Labs and Xerox PARC. That means pursuing fundamental breakthroughs in science and engineering while also using more caution when applying those advances.
Failure to balance innovation with its responsible application could lead to software companies being treated the same way as biotechnology or pharmaceutical companies — with high regulation, and a resulting slower time to market. That outcome may not be a good one for this sector, but if companies don’t embrace responsible research and deployment of technology backed up with tangible actions, they open the door to just such a world.