Studies show that online ratings are one of the most trusted sources of consumer confidence in e-commerce decisions. But recent research suggests that they are systematically biased and easily manipulated.
A few months ago, I stopped in for a quick bite to eat at Dojo, a restaurant in New York City’s Greenwich Village. I had an idea of what I thought of the place. Of course I did — I ate there and experienced it for myself. The food was okay. The service was okay. On average, it was average.
So I went to rate the restaurant on Yelp with a strong idea of the star rating I would give it. I logged in, navigated to the page and clicked the button to write the review. I saw that, immediately to the right of where I would “click to rate,” a Yelp user named Shar H. was waxing poetic about Dojo’s “fresh and amazing, sweet and tart ginger dressing” — right under her bright red five-star rating.
I couldn’t help but be moved. I had thought the place deserved a three, but Shar had a point: As she put it, “the prices here are amazing!”
Her review moved me. And I gave the place a four.
As it turns out, my behavior is not uncommon. In fact, this type of social influence is dramatically biasing online ratings — one of the most trusted sources of consumer confidence in e-commerce decisions.
[Figure: An Example of Social Influence]
The Problem: Our Herd Instincts
In the digital age, we are inundated by other people’s opinions. We browse books on Amazon with awareness of how other customers liked (or disliked) a particular tome. On Expedia, we compare hotels based on user ratings. On YouTube, we can check out a video’s thumbs-up/thumbs-down score to help determine if it’s worth our time. We may even make serious decisions about medical professionals based in part on the feedback of prior patients.
For the most part, we have faith in these ratings and view them as trustworthy. A 2012 Nielsen report surveying more than 28,000 Internet users in 56 countries found that online consumer reviews are the second most-trusted source of brand information (after recommendations from friends and family).1 According to the survey, more than two-thirds of global customers say they trust messages on these platforms — a 15% increase in four years.
But this trust may be misplaced. The heart of the problem lies with our herd instincts — natural human impulses characterized by a lack of individual decision making — that cause us to think and act in the same way as other people around us.2
On two different days in April 2013, for instance, the price of gold fell more than it had in three decades. At the time, market watchers offered all sorts of justifications as to why the metal’s price plunged so precipitously, but none of them was particularly compelling. “It is hard to escape the conclusion that gold investors sold off because other investors were selling — in other words, herd instinct kicked in,” wrote Sarah Gordon of the Financial Times.3
Social Influence Bias
When it comes to online ratings, our herd instincts combine with our susceptibility to positive “social influence.”4 When we see that other people have appreciated a certain book, enjoyed a hotel or restaurant or liked a particular doctor — and rewarded them with a high online rating — this can cause us to feel the same positive feelings about the book, hotel, restaurant or doctor and to likewise provide a similarly high online rating.
Recently, my colleagues Lev Muchnik, a senior lecturer at the Hebrew University of Jerusalem’s School of Business Administration, and Sean J. Taylor, a doctoral student at New York University’s Stern School of Business, and I designed a simple randomized experiment on a social news-aggregation website.5 (For more details about our experiment, please see “About the Research.”) On the site, users rate news articles and comments by voting them up or down based on how much they enjoyed them. We randomly manipulated the scores of comments with a single up or down vote. Up indicated the “user” enjoyed the comment; down indicated the “user” didn’t. We then measured the impact of these small manipulations on subsequent scores.
The results were alarming. The positive manipulations created a positive social influence bias that persisted over five months and that ultimately increased the comments’ final ratings by 25% on average. Negatively manipulated scores, meanwhile, were offset by a correction effect that neutralized the manipulation: Although viewers of negatively manipulated comments were more likely to vote negative (evidence of negative herding), they were even more likely to positively “correct” what they saw as an undeserved negative score.
This social influence bias snowballs into disproportionately high scores, creating a tendency toward positive ratings bubbles. Positively manipulated comments were 30% more likely than control comments (the comments that we did not manipulate) to reach or exceed a score of 10. And reaching a score of 10 was no small feat; the mean rating on the site was 1.9. A positive vote didn’t just shift the mean of the ratings distribution; it pushed the upper tail of the distribution out as well, meaning a single positive vote at the beginning could propel a comment to ratings stardom.
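To make the snowball mechanism concrete, here is a minimal simulation sketch in Python. It is not the model used in our experiment: the voting rule, the parameter values (`base_up`, `herd_boost`, `vote_prob`, `viewers`) and the function names are all illustrative assumptions. It assumes that viewers are more likely to vote up when a comment's visible score is already positive, and that a negative score triggers no matching penalty. It then compares comments seeded with one extra up vote against untouched control comments.

```python
import random

def simulate_comment(initial_score, rng, viewers=100, vote_prob=0.2,
                     base_up=0.55, herd_boost=0.15):
    """Return a comment's final score after a stream of viewers.

    Assumed (hypothetical) rule: each viewer votes with probability
    vote_prob; a voter votes up with probability base_up, raised by
    herd_boost when the visible score is already positive. Negative
    scores get no matching penalty, mimicking the asymmetry in the text.
    """
    score = initial_score
    for _ in range(viewers):
        if rng.random() >= vote_prob:
            continue  # this viewer reads but does not vote
        p_up = base_up + (herd_boost if score > 0 else 0.0)
        score += 1 if rng.random() < p_up else -1
    return score

def run_experiment(n_comments=3000, threshold=10, seed=0):
    """Compare control comments (start at 0) with up-treated ones (start at +1)."""
    rng = random.Random(seed)
    results = {}
    for label, start in (("control", 0), ("up_treated", 1)):
        finals = [simulate_comment(start, rng) for _ in range(n_comments)]
        results[label] = {
            "mean_score": sum(finals) / n_comments,
            "share_at_threshold": sum(f >= threshold for f in finals) / n_comments,
        }
    return results

if __name__ == "__main__":
    for label, stats in run_experiment().items():
        print(label, stats)
```

Under these assumed rules, the seeded comments end with a higher mean score and reach the threshold more often than the controls, which is the qualitative pattern described above. The sizes of the gaps depend entirely on the made-up parameters and should not be read as estimates of the real effect.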
These findings could help explain the online ratings bubbles recently observed by several different research teams, which some scientists have described as the “J-shaped distribution” of online ratings. It turns out that online ratings tend to be disproportionately positive. The distributions of product ratings on Amazon.com include far more extreme positive (five-star) reviews than negative (one-star or two-star) or moderate (three-star or four-star) reviews. Trends toward positivity have also been observed in restaurant ratings and movie and book reviews on a variety of different websites.6
The social influence bias that we observed in our experiment helps to explain these bubbles. If social influence creates positive herding but not negative herding, ratings bubbles could be caused by an asymmetry in our cognitive biases toward the prior positive opinions of others: We tend to herd on positive opinions and remain skeptical of the negative ones.
In one study that examined the skewed distribution of online ratings, researchers Nan Hu, Paul Pavlou and Jennifer Zhang also conducted a small side experiment.7 They invited students to a lab to rate a single music CD selected at random from Amazon and compared the resulting ratings to the ratings of the same CD on the Amazon site. They did this to see whether the distribution of an (albeit small) random sample of actual opinions about this item (66 students in a university lab) matched the distribution of ratings given on Amazon. What they found was puzzling: The ratings from their experiment were approximately normally distributed, like a standard bell curve, cresting in the middle (reflecting the higher frequency of two-star, three-star and four-star reviews) and sinking at the extremes (reflecting the relative scarcity of one-star and five-star reviews). Meanwhile, the distribution of ratings on Amazon for the same item followed the J-shape (with five-star reviews more than twice as frequent as one-star, two-star, three-star and four-star reviews).
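The contrast between the two shapes can be expressed as a simple check on star counts. The Python sketch below is a rough heuristic of my own for illustration, not the method used by Hu, Pavlou and Zhang; the function name `classify_rating_shape` and both sets of counts are hypothetical.

```python
def classify_rating_shape(counts):
    """Classify star-rating counts [1-star, ..., 5-star] by a rough heuristic.

    "J-shaped": five-star is the most common rating and one-star ratings
    outnumber at least one of the middle (2- to 4-star) levels.
    "bell-shaped": the most common rating is 2, 3 or 4 stars and both
    extremes are rarer than that peak.
    Anything else is reported as "other".
    """
    if len(counts) != 5:
        raise ValueError("expected five counts, one per star level")
    one, middle, five = counts[0], counts[1:4], counts[4]
    peak = max(counts)
    if five == peak and one > min(middle):
        return "J-shaped"
    if max(middle) == peak and one < peak and five < peak:
        return "bell-shaped"
    return "other"

# Made-up counts for illustration only; not data from the study or from Amazon.
lab_sample = [4, 15, 30, 16, 5]
site_sample = [12, 5, 8, 20, 55]
print(classify_rating_shape(lab_sample))
print(classify_rating_shape(site_sample))
```

A heuristic like this only labels the shape. It says nothing about why the shape arises, which is the question the rest of this discussion turns on.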
The authors interpreted these findings as evidence that Amazon’s buyers are more likely to be positively predisposed to a product because they had voluntarily purchased it, creating a selection bias toward more positive ratings. Selection bias is a potentially good explanation for the J shape (if reviews come from purchasers and if purchasers are, indeed, positively predisposed). But here’s the catch: Amazon does not require users to buy items before rating them. So I wondered: Were prior ratings in this experiment shown to raters before they rated? The paper makes no mention of this aspect of the experimental setup. I wrote to the authors and asked them whether prior ratings were visible to users during the rating process. They replied that they were not. Examining social influence bias was not part of their study. In other words, the simulated environment they had created mimicked Amazon’s interface — with one crucial difference: The raters did not see the distribution of prior ratings, or any information on prior ratings for that matter, before they rated any item.
In this context, I reconsidered the four-star rating I had given Dojo. Would I have given four stars had I not seen the glowing reviews of other raters? I was preparing to give Dojo three stars before I saw those other reviews. I wondered: What if everyone’s reviews — not just mine — were being swayed in a positive direction as a result of social influence bias?
Many factors could be creating the J-shaped distribution of online ratings: selection bias, fraud or social influence bias. These processes may also be working in tandem to create ratings bubbles. But the key concern raised by social influence bias, in particular, is that it creates a runaway bandwagon effect: The impact of one fake positive review is not compartmentalized. Instead, it dramatically affects future ratings. That fact, on its own, is quite striking. Think of it this way: Even if websites police fake reviews and remove them, their “legacy” lives on in future real reviews whose ratings they have biased. This type of damage is very hard to undo.8
How to circumvent these human tendencies in ways that decrease the possibility for fraud and bias is a natural area for further research. At a time when online ratings systems are having a systemic and profound effect on consumer decision making, it is incumbent on scientists to learn how these social processes work and on designers to create systems that curb bias and manipulation.
The incentive to fake ratings often combines with website design (or website rules) to affect the likelihood of fraud and positive-ratings bubbles. Consider a comparison between TripAdvisor and Expedia. While anyone can post a review on TripAdvisor, a consumer can post a review of a hotel on Expedia only if he or she actually booked at least one night at the hotel through the website. An excellent study by Dina Mayzlin, Yaniv Dover and Judith Chevalier shows that hotels with a high incentive to submit fraudulent reviews (independent brands owned by single-unit owners) have a greater share of five-star reviews on TripAdvisor relative to Expedia than do hotels with a lower incentive to fake (franchise brands or chains, which benefit less from reviews and have a greater reputational risk from committing fraud).9
Site design and policy can influence ratings bubbles. In 2013, Reddit changed the design of its website to create an option for moderators “to obscure the vote counts on comments for a predetermined amount of time after their submission.” The stated goal of this feature was to “curtail and minimize the effects of bandwagon voting, both positive and negative.”10 Stock markets have implemented similar policies. For example, the New York Stock Exchange imposes “circuit-breaker” rules that halt trading at certain thresholds for single-day declines in the market, all in an effort to stave off negative herding.
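The idea of obscuring vote counts for a set period can be sketched in a few lines of Python. This is not Reddit's implementation; the function name `displayed_score` and the one-hour default for `hide_for` are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone

def displayed_score(score, submitted_at, now, hide_for=timedelta(hours=1)):
    """Return the score to show, or None while the count is still hidden.

    hide_for is an assumed, moderator-chosen window after submission
    during which viewers vote without seeing the running tally.
    """
    if now - submitted_at < hide_for:
        return None  # too early: show no count, so early votes cannot steer later ones
    return score

# Example with made-up times: a comment posted at noon UTC.
posted = datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc)
print(displayed_score(7, posted, posted + timedelta(minutes=30)))
print(displayed_score(7, posted, posted + timedelta(hours=2)))
```

The design choice here is that votes are still recorded during the window; only their display is delayed, so the earliest raters form their opinions without a visible tally to follow.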
Understanding the Implications
Executives, both as business leaders and as consumers, should consider taking positive online ratings with a grain of salt. While a healthy skepticism of positive ratings might cognitively correct for social influence bias, such a correction may not be necessary for negative ratings. In addition, managers should encourage and facilitate as many truthful positive reviews as possible in the early stages of the ratings process. Systematic policies that encourage satisfied consumers to rate early on could lead future consumers to feel more positively toward the products or services they are rating.
Beyond that, it behooves executives to consider a few policy implications of all of this research regarding herding and ratings bubbles. Herding is a system dynamic, and understanding it broadly can help leaders in a variety of settings beyond the (admittedly important) scope of online ratings. For example, if equity prices work the way positive ratings do, executives should be aware of how such herding dynamics could affect a company’s stock price.
Digging deeper into the behavioral mechanisms explaining our results, we found that friends were quicker to herd on positive ratings and to come to their friends’ rescue when those friends’ ideas were poorly rated. This implies that the structure of social networks helps guide the structure of ratings bubbles. As websites like Facebook and Google encourage more social ratings — “likes” or “+1” indications by friends and more shared endorsements or “friendorsements” in advertising — the likelihood that social influence bias will propel ratings bubbles is increased.
Our experimental manipulations increased total turnout, which, when combined with the general preference for positivity on the site, pushed scores even higher. But we also found evidence that our manipulations actually changed people’s opinions, rather than simply inspiring more positively predisposed raters to rate. How do we know? We analyzed the changes in turnout (the likelihood of rating) and in positivity (the proportion of positive ratings). We wanted to know whether the ratings changes that the experiment produced could simply be explained by more positively predisposed people providing ratings, or alternatively whether more negatively predisposed people were actually changing their ratings to positive ones. What we found was that the latter was taking place. Negative raters were becoming positive raters. Opinions were changing.
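The two quantities in that analysis are easy to state precisely. The Python sketch below computes them for a control group and a treated group; the function name and all of the counts are made up for illustration and are not figures from our experiment.

```python
def turnout_and_positivity(views, up_votes, down_votes):
    """Return (turnout, positivity) for a group of comments.

    turnout: share of views that produced a rating.
    positivity: share of ratings that were positive.
    """
    ratings = up_votes + down_votes
    turnout = ratings / views if views else 0.0
    positivity = up_votes / ratings if ratings else 0.0
    return turnout, positivity

# Hypothetical counts, chosen only to show the calculation.
control = turnout_and_positivity(views=1000, up_votes=60, down_votes=40)
treated = turnout_and_positivity(views=1000, up_votes=84, down_votes=36)

print(f"turnout change:    {treated[0] - control[0]:+.3f}")
print(f"positivity change: {treated[1] - control[1]:+.3f}")
```

Aggregate numbers like these cannot by themselves separate the two explanations. A rise in positivity could come from different people choosing to rate or from the same people rating differently, so telling them apart requires information about how individual raters tended to vote before, which this sketch does not model.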
Now consider this positive opinion change along with the herding effects already discussed. It is clear why our results concerned us when we thought about them in the context of large-scale, opinion-aggregation tasks in society. We happened to conduct the analysis during the 2012 U.S. presidential election. As we heard electoral poll results of likely voters on the radio, we couldn’t help but wonder: Do these types of polls predict or rather drive election results?
Many of our recent economic crises have herding and bubbles at their core — from housing to mortgage-backed securities. Understanding how herding and bubbles work is the first step toward averting their effects in a multitude of settings. More theoretical and experimental research on social influence bias and the ratings bubbles that result from it could therefore contribute not only to the management and marketing of online ratings and the user design of ratings sites but also to policies that keep human social systems from running off the rails. Policy makers and website designers should focus on understanding herding scientifically and creating policies and website designs that short-circuit herding behaviors and help prevent ratings bubbles. It is possible that such policies could reduce herding biases in everything from elections to equity markets and beyond.