University of Chicago professor Berkeley Dietvorst explains why we can’t let go of human judgment — to our own detriment.
Even when faced with evidence that an algorithm will deliver better results than human judgment, we consistently choose to follow our own minds.
MIT Sloan Management Review editor in chief Paul Michelman sat down with Berkeley Dietvorst, assistant professor of marketing at the University of Chicago Booth School of Business, to discuss a phenomenon Dietvorst has studied in great detail. (See “Related Research.”) What follows is an edited and condensed version of their conversation.
MIT Sloan Management Review: What prompted you to investigate people’s acceptance, or lack thereof, of algorithms in decision-making?
Dietvorst: When I was a Ph.D. student, some of my favorite papers were old works by [the late psychology scholar and behavioral decision research expert] Robyn Dawes showing that algorithms outperform human experts at making certain types of predictions. The algorithms that Dawes was using were very simple and oftentimes not even calibrated properly.
A lot of others followed up Dawes’s work and showed that algorithms beat humans in many domains — in fact, in most of the domains that have been tested. There’s all this empirical work showing algorithms are the best alternative, but people still aren’t using them.
So we have this disconnect between what the evidence says people should do and what people are doing, and no one was researching why.
What’s an example of these simple algorithms that were already proving to be superior?
Dietvorst: One of the areas was predicting student performance during an admission review. Dawes built a simple model: Take four or five variables — GPA, test scores, etc. — assign them equal weight, average them on a numerical scale, and use that result as your prediction of how students will rank against each other in actual performance. That model — which doesn’t even try to determine the relative value of the different variables — significantly outperforms admissions experts in predicting a student’s performance.
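The equal-weight model described above can be sketched in a few lines of Python. This is a minimal illustration, not Dawes’s actual procedure: the applicant data, the variable names, and the choice to standardize each variable as a z-score before averaging are all assumptions made for the example.

```python
from statistics import mean, pstdev


def equal_weight_scores(applicants):
    """Score each applicant as the unweighted mean of standardized variables."""
    variables = list(next(iter(applicants.values())).keys())

    # Standardize each variable so different scales (GPA vs. test score)
    # contribute equally. Standardizing is an assumption of this sketch.
    stats = {}
    for v in variables:
        values = [a[v] for a in applicants.values()]
        stats[v] = (mean(values), pstdev(values))

    scores = {}
    for name, a in applicants.items():
        z = []
        for v in variables:
            mu, sigma = stats[v]
            # A variable with no spread carries no information; count it as 0.
            z.append((a[v] - mu) / sigma if sigma > 0 else 0.0)
        scores[name] = mean(z)  # equal weight: a plain average
    return scores


def rank_applicants(applicants):
    """Return applicant names ordered from highest to lowest predicted rank."""
    scores = equal_weight_scores(applicants)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical applicants, invented purely for illustration.
APPLICANTS = {
    "A": {"gpa": 3.9, "test_score": 160, "essay_rating": 4},
    "B": {"gpa": 3.2, "test_score": 168, "essay_rating": 3},
    "C": {"gpa": 3.5, "test_score": 150, "essay_rating": 2},
}

if __name__ == "__main__":
    print(rank_applicants(APPLICANTS))
```

Note that the model never estimates how much each variable matters; every variable gets the same weight, which is the point of the example.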
What were the experiments you conducted to try to get at the reasons we resist algorithms?
Dietvorst: We ran three sets of experiments.
For the first paper, we ran experiments where the participants’ job was to complete a forecasting task, and they were incentivized to perform well. The better they performed, the more money they would earn in each experiment. There were two stages: first a practice round — for both humans and algorithms — and then a stage where participants were paid based on the quality of their performance.
In the practice round, we manipulated what forecasts participants were exposed to. Some made their own forecasts and saw those of the algorithm. Some made only their own forecasts. Some saw only the algorithm’s results. Some saw neither. So each group had different information about how well each forecasting option had performed during the practice round.
For the second stage, participants could choose to forecast the results themselves or rely on the algorithm. The majority of participants who had not seen the algorithm’s results from the first round chose to use it in the second round. However, those people who had seen the algorithm’s results were significantly less likely to use it, even if it beat their own performance.
Once people had seen the algorithm perform and learned that it was imperfect, that it makes mistakes, they didn’t want to use it. But there wasn’t a similar effect for their own forecasts. Once I made a forecast and learned that I was imperfect, I wasn’t less likely to use my own forecast. We saw that effect only for the algorithm.
And for the second experiment?
Dietvorst: In the second paper, we tried to address the problem: How can we get people to use algorithms once they know that they’re imperfect?
We began with the same basic question for participants: human or algorithm? In these experiments, however, there was an additional twist. Some participants were given the choice between using the algorithm as it existed or not at all. Other participants, if they chose to use the algorithm, could make some adjustments to it.
We found that people were substantially more willing to use algorithms when they could tweak them, even if just a tiny amount. People may be unwilling to use imperfect algorithms as they exist — even when the algorithm’s performance has been demonstrated superior to their own — but if you give the person any freedom to apply their own judgment through small adjustments, they’re much more willing.
So those are the key findings from the first two papers I wrote with my coauthors Joe Simmons and Cade Massey. Following on those, I have a solo paper where I’m investigating more about why people weren’t willing to use algorithms once they learned that they’re imperfect.
Most people in my experiment used human forecasts by default, which positions the algorithm as an alternative. And the way they make the decision about whether or not to use the algorithm is by asking, “Will this algorithm meet my performance goal?” even if that goal is unrealistic for human forecasts, too. They don’t choose the algorithm if it won’t meet some lofty goal.
What they should more reasonably ask is, “Is this algorithm better than me?” — which it usually is. So people fail to ask the right question and end up holding the two options to different standards.
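The two questions can be written as two decision rules. The sketch below is only illustrative: the function names and the error figures are hypothetical, and performance is expressed as average forecast error, so lower is better.

```python
def meets_goal_rule(algorithm_error, goal_error):
    """The question people tend to ask: does the algorithm meet my goal?"""
    return algorithm_error <= goal_error


def beats_current_rule(algorithm_error, human_error):
    """The more reasonable question: is the algorithm better than me?"""
    return algorithm_error < human_error


# Hypothetical average forecast errors, invented for illustration.
HUMAN_ERROR = 20.0
ALGORITHM_ERROR = 17.0
GOAL_ERROR = 5.0  # a lofty goal that neither option reaches

if __name__ == "__main__":
    # The goal-based rule rejects the algorithm even though it beats the human.
    print(meets_goal_rule(ALGORITHM_ERROR, GOAL_ERROR))
    print(beats_current_rule(ALGORITHM_ERROR, HUMAN_ERROR))
```

Under the first rule the decision changes whenever the goal changes, even though neither the human’s nor the algorithm’s performance has moved. Under the second rule the goal plays no role at all.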
And to what do you attribute that?
Dietvorst: That’s an interesting question. I’m not sure how this decision process came about or why people are making the decision this way. And I’ve found it’s not actually unique to algorithms.
When choosing between two human forecasters, people do the same thing. If you assign them one forecaster as their default and ask how well the other forecaster would have to perform for them to switch, they say the other forecaster would have to meet their performance goals, just as with the algorithm.
It seems like people are naturally making what I would call the wrong comparison.
So it’s kind of a switching cost?
Dietvorst: Not necessarily. The way I would think about a switching cost is this: I’m used to using human judgment, so an algorithm has to perform X percent better or X points better than me, or a human, for me to switch to it, right?
But that’s not really how it works. People are comparing the alternative to their performance goal, rather than comparing the two options. So, the higher the performance goal I give you, the better you need the algorithm to perform in order to switch to it, even though your own performance is staying constant.
So it doesn’t seem like a switching cost, at least as we tend to think of the term.
What I find so interesting is that it’s not limited to comparing human and algorithmic judgment; it’s my current method versus a new method, irrespective of whether that new method is human or technology.
Dietvorst: Yes, absolutely. That’s exactly what I’ve been finding.
I think one of the questions that’s going to come up is, “Well, what do I do about this? Is simple recognition of the bias enough to counter it?”
Dietvorst: If I can convince someone that the right question to ask is, “Does this algorithm outperform what you’re currently using?” instead of, “Does this algorithm meet some lofty performance goal?” and that person buys in and says, “Yes, you’re right, I should use algorithms that outperform what I’m currently doing,” then, yes, that would work. I don’t know how easy or hard it would be to get people to buy into that, though.
And in a larger organization, thousands of decisions are being made every day. Without this bias being known, there really isn’t an obvious corrective measure, is there?
Dietvorst: The studies I’ve done suggest a couple of restrictions that could reduce the bias.
People are deciding whether or not to use the algorithm by comparing it to the performance goal that they have. If you incentivize people to attempt to deliver performance much better than an algorithm has shown it’s capable of, it’s not so surprising that they ditch the algorithm to chase down that incentive with human judgment — even if it’s unrealistic they will achieve it.
If you lower their performance goal, the algorithm will be compared more favorably and people may be more likely to use it.
So the problem exists in situations where the goal itself is unreasonable.
Dietvorst: Yes, if you have some forecasting goal that is very hard to achieve and an algorithm hasn’t achieved it in the past, then you could see how it would make sense, in a certain way, for people not to use the algorithm. They’re pretty sure it’s not going to achieve the goal. So they use human judgment and end up performing even worse than the algorithm.
Presumably, we’re in an age now where the quality of algorithms is increasing, perhaps dramatically. I’m wondering whether this phenomenon will make our biases more or less pronounced. On the one hand, you could see the quality of algorithms catching up to people’s reference points. On the other hand, the reference point may continue to move at a speed as high as, if not higher than, the rate at which algorithms improve.
Dietvorst: I agree: That could go either way. But I would like to push back a little bit on this idea that algorithms are really great. The literature shows that on average, when predicting human behavior, algorithms are about 10% to 15% better than humans. But humans are very bad at it. Algorithms are significantly better but nowhere near perfection. In many domains, I don’t see any way that they’re going to get close to perfection very soon.
There is a lot of uncertainty in the world that can’t be resolved or reduced — that is unknowable. Like when you roll a die you don’t know what number is going to come up until it happens. A lot of that type of aleatory uncertainty is determining outcomes in the real world. Algorithms can’t explain that.
Suppose Google Maps is telling you the fastest route to a new place. It can’t predict if there’s going to be a giant accident right in front of you when you’re halfway there. And so, as long as there’s random error and there’s aleatory uncertainty that factors into a lot of these outcomes — which it does to a larger extent than people recognize — algorithms aren’t going to be perfect, and they aren’t really even going to be close to perfect. They’ll just be better than humans.
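The die example can be made concrete with a little arithmetic. The sketch below assumes a fair six-sided die and squared error as the measure of forecast quality, neither of which is specified in the conversation. Under those assumptions, the best single forecast is the expected value of 3.5, and it still leaves an average squared error of 35/12 that no forecaster, human or algorithm, can remove.

```python
from fractions import Fraction

FACES = range(1, 7)  # assumption: a fair six-sided die


def expected_squared_error(forecast):
    """Average squared error of a fixed forecast over all six equally likely faces."""
    return sum((Fraction(face) - forecast) ** 2 for face in FACES) / 6


# The mean minimizes expected squared error, so it is the best fixed forecast.
BEST_FORECAST = Fraction(sum(FACES), 6)
IRREDUCIBLE_ERROR = expected_squared_error(BEST_FORECAST)

if __name__ == "__main__":
    print(BEST_FORECAST, IRREDUCIBLE_ERROR)
```

Any other forecast, such as always guessing one particular face, does strictly worse, so the remaining error reflects the randomness of the outcome rather than a flaw in the forecaster.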
So what’s next? Is this an ongoing field of study for you?
Dietvorst: Absolutely. There’s a lot more to understand about how people think algorithms operate; what they think are the differences between algorithms and humans; and how that affects their use of algorithms. There’s still really interesting research to be done.