Bill Snyder at Kansas State. Eddie Robinson at Grambling. Mike Krzyzewski at Duke. Gregg Popovich with the Spurs. It’s hard to overstate the impact these coaches have had on their organizations. But are coaches always an X factor? Just look at the Golden State Warriors. Dominating as they have been under Steve Kerr’s steady guiding hand, they have been every bit as successful — actually statistically even more successful — during Kerr’s two extended absences from the team when Luke Walton and then Mike Brown (not exactly Hall of Fame coaches) took the helm. Which brings us to the question of the day: How much do coaches actually matter? Well, two researchers from the University of Chicago just might have the answer.
Paul Michelman: In 1989, Kansas State University did, in fact, have a football team, though no one would’ve blamed you if you didn’t notice. The Wildcats had failed to win a game in 1987 or ’88, they had never won a bowl game, and they were the only team in Division I-A history with 500 losses. The athletic director at the time called the program a career stopper. A former head coach waxed poetic: “Every day there is a catastrophe.” And the man hired to turn around the fortunes of what Sports Illustrated dubbed “Futility U” could only promise this much for his inaugural season: “We will not be 0-11.” And, indeed, he was correct. In 1989, Kansas State improved to 1-10.
Ben Shields: For coach Bill Snyder, that one 1-10 season was just the beginning. At a school with no history, budget, or blue-chip recruits, Snyder did what he was brought to Manhattan to do — he coached. With no five-star recruits lining up to spend four years in the Little Apple, Snyder hit the junior college circuit. Defying the conventions of a run-happy conference, he built his offense around throwing the ball. Slowly but surely, the results trickled in — a bowl win in ’93, a Top 10 season in ’95, 11 wins and the Fiesta Bowl in ’97. Snyder had taken K-State from punchline to powerhouse in less than a decade. And his legend wasn’t done. After retiring in 2005 and witnessing three uninspired seasons from afar, he came back in ’09 — his team now playing in a stadium named after him — and turned the program around again, reaching as high as No. 2 in the country in his second term in charge. Sometimes, all it takes is one coach to turn futility into fortune. I’m Ben Shields.
Paul Michelman: I’m Paul Michelman and this is Counterpoints, the sports analytics podcast from MIT Sloan Management Review. In this episode: Just how much do coaches matter? We’re going to tell you precisely how much.
Paul Michelman: So, Ben, we opened with the story of the great Bill Snyder, who seems to be single-handedly responsible for the success of a major college football program. It’s likewise tempting to attribute Duke’s success in basketball to Mike Krzyzewski or the Spurs’ years of success and stability to Gregg Popovich.
Ben Shields: But are coaches always an X factor? Just look at the Golden State Warriors. Dominating as they have been under Steve Kerr’s steady guiding hand, they have been every bit as successful — actually, statistically even more successful — during Kerr’s two extended absences from the team, when Luke Walton and then Mike Brown (not exactly Hall of Fame coaches) took the helm.
Paul Michelman: Which brings us full circle to the question of the day: How much do coaches actually matter? Well, two researchers from the University of Chicago just might have the answer. Christopher Berry and Anthony Fowler presented their paper “Do Coaches Matter?” at this year’s Sloan Sports Analytics Conference. Here’s Ben’s conversation with Anthony Fowler.
Ben Shields: OK, today we are discussing the thesis: The right NBA head coach is worth 14 wins per season. Anthony, let’s get right to it. How much do coaches matter?
Anthony Fowler: Coaches matter a lot. We found that coaches across basketball, football, baseball, and hockey account for something like 20-30% of the variation in their team’s success.
Ben Shields: Now, that’s a fascinating finding, but it’s a much different conclusion from the existing literature on coaches, which suggests that coaches are interchangeable. What were the problems with those studies?
Anthony Fowler: For the most part, coaches have not been rigorously studied by quantitative analysts. I think most sports analytics people are interested in studying players and player performance, and they’re thinking about which players to recruit, etc. But to the extent that coaches have been studied, there have been a few studies asking whether replacing your coach is, on average, good or bad for your team. And, on average, they find that no, it looks like when you replace your coach your team doesn’t seem to improve or get worse. But, of course, that could be that sometimes you replace the coach with a better coach, sometimes you replace them with a worse coach — and on average — those effects cancel each other out. So in our study, we’re trying to answer a slightly different question, which is: How much does the natural variation in coaching ability explain the success of teams? And there we find that coaches are definitely not interchangeable — coaches do actually matter a lot.
Ben Shields: OK, so to answer that question, you developed your RIFLE model. Can you explain the model in layman’s terms? And let us know why it is a reliable methodology to study the effects of coaches on certain outcomes in sports?
Anthony Fowler: Sure, I’ll do my best. So, RIFLE is a method that Chris Berry and I developed. It’s short for randomization inference for leader effects. And we developed it with the intent of studying leaders in general — not just sports coaches but maybe political leaders, CEOs — any kind of person that plays an important role where it’s unclear what their effectiveness is. The basic idea is: We use a statistical tool called regression, and we essentially run a big regression where we include a parameter for every coach. And then we get a measure of how well that regression seems to fit the data. So that measure in and of itself is not all that informative, because you could appear to fit the data really well just because of luck. You could appear to fit the data really well because of serial correlation in outcomes — like maybe the Patriots are just a really good team, and they’re having a really good run in the absence of Bill Belichick. And then lastly, there could be real coach effects. And so we want an inferential strategy that allows us to tease out the coach effects from those other two factors. And there what we do (so, that’s where the randomization inference part comes in) is we randomly permute each coach’s tenure as a block. So, we’re creating fake data sets, where these fake data sets have essentially the same (a lot of the same) features as the real data sets, except now the coaches don’t align with the outcomes in the same way they do in the real data. And we can see how often our fake data sets give us an estimated coach effect as big as the real data set. And if the real data set is consistently giving us a bigger coach effect than the fake data sets, then we can conclude that probably there are coach effects in this particular setting.
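The procedure Fowler describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the fit measure here is a plain R-squared on coach indicators after subtracting each team's mean, "permuting each tenure as a block" is read as shuffling the order of tenure blocks within a team while outcomes stay in place, and the names `tenure_blocks`, `shuffled_labels`, `rifle_p_value`, and the `toy` data are all hypothetical.

```python
import random
from collections import defaultdict

def r_squared(outcomes, labels):
    """R^2 from regressing outcomes on one indicator per label (group means)."""
    groups = defaultdict(list)
    for y, g in zip(outcomes, labels):
        groups[g].append(y)
    grand = sum(outcomes) / len(outcomes)
    sst = sum((y - grand) ** 2 for y in outcomes)
    ssr = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
    return 1 - ssr / sst if sst > 0 else 0.0

def tenure_blocks(coaches):
    """Collapse a season-by-season coach list into [[coach, n_seasons], ...]."""
    blocks = []
    for c in coaches:
        if blocks and blocks[-1][0] == c:
            blocks[-1][1] += 1
        else:
            blocks.append([c, 1])
    return blocks

def shuffled_labels(coaches, rng):
    """Reorder whole tenures at random, keeping each tenure's length intact."""
    blocks = tenure_blocks(coaches)
    rng.shuffle(blocks)
    return [c for c, n in blocks for _ in range(n)]

def rifle_p_value(teams, n_perm=500, seed=0):
    """Return (real fit, share of permuted data sets fitting at least as well)."""
    rng = random.Random(seed)
    outcomes, labels = [], []
    for team, seasons in teams.items():
        ys = [y for _, y in seasons]
        mean = sum(ys) / len(ys)
        outcomes += [y - mean for y in ys]          # remove the team's average level
        labels += [(team, c) for c, _ in seasons]
    real = r_squared(outcomes, labels)
    hits = 0
    for _ in range(n_perm):
        fake = []
        for team, seasons in teams.items():
            fake += [(team, c) for c in shuffled_labels([c for c, _ in seasons], rng)]
        if r_squared(outcomes, fake) >= real:       # outcomes stay put, coaches move
            hits += 1
    return real, hits / n_perm

# Hypothetical toy data: team -> [(coach, wins), ...] in season order.
toy = {
    "X": [("A", 50), ("A", 52), ("A", 51), ("B", 30), ("B", 29),
          ("C", 41), ("C", 40), ("D", 35), ("D", 36), ("D", 34)],
    "Y": [("E", 25), ("E", 27), ("F", 48), ("F", 47), ("F", 49),
          ("G", 38), ("H", 33), ("H", 31)],
}
fit, p = rifle_p_value(toy, n_perm=500, seed=1)
print(f"real R^2 = {fit:.3f}, permutation p-value = {p:.3f}")
```

Because the outcomes never move, anything that makes a team's results persist over time (including player quality) is present in both the real and the permuted fits. With only a handful of tenures per team, as in this toy data, the number of distinct permutations is small, so the p-value cannot get very close to zero.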
Ben Shields: All right, Anthony, that’s a very clear description. I appreciate that and so do our listeners. I want to probe a little bit about a key variable of player quality and how you think about that with the model. So since you’re in Chicago, I’ll mention the great Phil Jackson who, of course, won 11 NBA championships, but he also coached Jordan and Pippen, Kobe and Shaq. How do you account for player quality in your model?
Anthony Fowler: Yeah, that’s a great question. We do not make any direct effort to explicitly measure player quality. So player quality is certainly one of the reasons why it might look like a coach is effective even if they’re not. The virtue of what we’re doing is that our inferential strategy implicitly accounts for player quality in the following way. You know, Scottie Pippen and Michael Jordan are great players, and you could argue that maybe they would have won championships even in the absence of Phil Jackson there. They might’ve won whoever was their coach. To the extent that that’s true, that’s going to be true in the fake data sets, those randomly permuted data sets, as well as in the real data set. And so player quality is one of the things you might think of that is one of the reasons there’s serial correlation in team outcomes over time in the absence of coach effects. So that effect of player quality is going to be there for both the real regression and for the permuted regressions, and we’re trying to estimate the added value of coaches by looking at the difference between those two. So there’s a sense in which player quality is implicitly accounted for, even though we never explicitly measure it. The other small thing that I would add is that to the extent that players are great players, we don’t know for sure how much of that is attributable to great coaching. It could be that Michael Jordan was a great player partly because Phil Jackson was great at utilizing him and great at coaching him and great at enabling him to achieve his highest level. And so to some extent, we might actually want to attribute some of player performance to the coaches as well. And our strategy is designed to pick that up.
Ben Shields: I see, and I now understand the value of the permuted data sets quite well here, and that’s helpful in our discussion. All right, I want to return back to our thesis, which is: In the NBA, the right coach is worth 14 wins per year. What specifically is the data to support that idea? Especially since we’ve talked about the model. Now, can you kind of give us a little bit more context as to why that is true within the NBA?
Anthony Fowler: In our paper we have a strategy for estimating, as I’ve mentioned, the proportion of all variation that’s attributable to coaching effects. And so when it comes to looking at wins in the NBA, our estimate is something around 30%. And the way we got that 14 number is we asked: OK, well, what is the natural variation if we look within a particular team — partialing out home team advantage, partialing out quality of their opponent — what is the natural variation of team success? And it looks like the standard deviation of a win once you residualize out those other factors is something like 0.14. And a team will play 82 games in the regular season, so you can multiply that by 82. Then we multiply that roughly by 4. And we imagine: Suppose you’re going from a really bad coach who’s two standard deviations below the mean to a really good coach who’s two standard deviations above the mean — and that’s how we get to our back-of-the-envelope calculation that going from a bad coach to a good coach should get you something like 14 extra wins per season.
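The back-of-the-envelope arithmetic can be checked directly. The interview gives the inputs (about 30%, a residual standard deviation of about 0.14, 82 games, and a four-standard-deviation gap) but does not spell out how the 30% enters the product; the sketch below assumes it simply scales the residual standard deviation, which is one reading that lands near 14. The variable names are illustrative.

```python
# Inputs quoted in the interview.
coach_share = 0.30   # share of variation attributed to coaching
residual_sd = 0.14   # SD of a single-game win outcome after residualizing
games = 82           # NBA regular-season games
sd_gap = 4           # from 2 SDs below the mean to 2 SDs above

# Assumption: the coaching share multiplies the residual SD directly.
extra_wins = coach_share * residual_sd * games * sd_gap
print(round(extra_wins, 1))
```

Under that assumption the product comes to roughly 13.8, which rounds to the 14 wins in the thesis.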
Ben Shields: There’s a lot of teams in the NBA that might like those 14 extra wins.
Anthony Fowler: Yes, for sure.
Ben Shields: Let’s shift gears from the NBA. You mentioned you studied a number of different leagues and sports. What coaching effects do you see, for instance, in Major League Baseball?
Anthony Fowler: Well, like I said at the outset, there are big coaching effects in all of the major sports. In baseball, in particular, there’s some interesting variation across outcomes. So in baseball, you see that managers seem to matter a lot more for runs allowed than for runs scored. I’m not a purported baseball expert, but that makes sense to me as a casual fan of baseball in the sense that managing your pitching lineup involves much more strategic thinking, much more complicated thinking for a manager than managing your hitting lineup — especially since the pitchers can only pitch so many innings per week, per month, etc. And so it makes sense to me that a smart manager can better utilize their pitching lineup and have a bigger effect on runs allowed than they can on runs scored. So we thought that was interesting.
Another interesting thing that we looked into for baseball is this claim made by people who study baseball that the main thing a manager does is they allocate their runs across games. So some people have thought: Well, you only have the hitters you have, you only have the pitchers that you have — but maybe what a smart manager can do, is they can say, “Well, this game is already a foregone conclusion; let’s take out our good pitchers and we’ll save them for the next game.” And so to capture this idea, we measured an outcome [we called] wasted runs, which is if you won by three runs, you wasted two runs. And if you lost, you wasted all the runs that you scored. And there we asked: Do managers significantly affect that wasted runs variable? And there the answer is no. There’s virtually no evidence that managers affect that — that’s not the main thing that managers are doing — they’re actually affecting runs scored and runs allowed, they’re not reallocating runs across games.
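The wasted-runs definition is simple enough to write down as a function. This is a sketch of the rule as stated in the interview; the function name `wasted_runs` is hypothetical, and treating a tie like a loss is an added assumption, since the interview only covers wins and losses.

```python
def wasted_runs(scored: int, allowed: int) -> int:
    """Runs that did not change the result, per the definition in the interview."""
    if scored > allowed:
        # A win needs only a one-run margin; anything beyond that is wasted.
        return scored - allowed - 1
    # Loss (or tie, by assumption): every run scored is wasted.
    return scored

print(wasted_runs(5, 2))  # won by three
print(wasted_runs(3, 7))  # lost
```

A win by three wastes two runs and a 3-7 loss wastes all three, matching the examples Fowler gives.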
Ben Shields: Yeah, that’s a very commonsense finding. I’m happy to know that the data also supports what fans might see as true with regards to how managers impact the game. I want to talk about football — what [are] some of the coaching effects you see in football as a result of your study?
Anthony Fowler: Sure. So one interesting finding is that coaching effects [are] big in the NFL but they appear to be even bigger at the level of college football. One potential explanation for that is that at the college level the coaches play all the same roles they do at the professional level, but they also are more important for recruiting players. So we found that to be interesting — the coaches matter even more in college football than they do in the NFL. And when we did study the NFL, we found again as with baseball, coaches actually matter a little bit more for points allowed than for points scored. And coaches also matter for all kinds of other outcomes. We looked at a number of other outcomes that you might think coaches could manage, like penalties and fumbles. And coaches do have significant effects on penalties committed and fumbles committed, although they do not actually have an effect on penalties committed by the opposing team. So they can clearly manage and minimize the number of penalties committed by their own team. But they’re not good at forcing penalties on the other team or that’s not something that we detect a lot of variation on.
Ben Shields: Terrific. And then I know you also studied the NHL as well.
Anthony Fowler: We did — we did study the NHL. Neither Chris nor I know much about hockey, to be honest. But we did study the NHL, and we did find coach effects in the NHL. And again, we found the coaches matter a little bit more for goals allowed than they do for goals scored. So maybe someone who’s a hockey expert can tell us why that’s true, and they can tell us what other outcomes we should be looking at in hockey.
Ben Shields: Well that’s good. I always want to give a little bit of love to our hockey analytics experts out there. It’s a small but fervent community, and we love them.
Anthony Fowler: That’s great.
Ben Shields: Yeah, we love them. All right, I want to ask you a little bit about another coach that you discussed in your paper — and that is Bill Belichick... I know you wrote about Bill Belichick in your paper. He is the quintessential NFL coach. But is there another NFL coach that you have seen in your analysis that speaks to the power of coach effects? Mike Tomlin for instance? What would the RIFLE model have to say about Mike Tomlin?
Anthony Fowler: Well, yeah, those are interesting questions and people like to ask those, and that’s partly why we added a section in our paper about individual coaches. One thing to say at the outset is that our method is not designed to estimate the effects of an individual coach or to say whether this coach is better than that coach. Our method is mainly designed to ask: What are the things that coaches are good for? And how much of the variation do coaches really explain, on average? So if you go from a bad coach to a good coach, what are you going to get? Not to say who is or isn’t a bad or a good coach. If you did want to say something about who is a good coach or a bad coach, you would have a much more difficult statistical problem on your hands, because it’s very plausible that somebody could just luck into a run of good years. Suppose Bill Belichick just happens to get hired by the Patriots right when it just so happens that everything was in place for the Patriots to become a really amazing team. We would be attributing all of that luck to his coaching skill. And so one thing that we do in the paper is we try to illustrate just how hard this inferential problem is by conducting a series of simulations where we allow for serial correlation — the same kind of serial correlation that we will have in real data. And we allow for endogenous coach retention — coaches are more likely to stick around and be retained when their teams are doing well, and they’re more likely to be fired when their teams are performing poorly — and we simulate data where there are no coach effects. So coaches don’t do anything in these simulations. And then we ask: How likely is it that you would find a coach as successful as Bill Belichick just by chance in the simulated data where coaches aren’t effective at all?
And it turns out that you need to have a coach who served a long period of time before you can start to reject the null hypothesis that they were just lucky and that coaches don’t matter. And then maybe they’re just an average coach. Bill Belichick is one of the coaches for whom you can reject the null because he’s coached [and learned for] a long period of time and had an incredible amount of success over that long period of time. And so you can very strongly reject the null for Bill Belichick. He’s almost surely better than the average coach, but that is difficult to say for coaches that have only served three, four, or five seasons. If I was going to pick out one other coach that seems to be particularly remarkable in light of our simulations, it would be George Halas, the great Chicago Bears coach, who served [for] over 30 years. His winning record wasn’t as great as Bill Belichick’s, but if you consider the period of time that he served, it might actually be statistically more impressive.
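The kind of simulation described here can be illustrated with a toy version. This is a rough sketch, not the simulation from the paper: team strength follows a simple autoregressive process (serial correlation), a coach is fired after any season below a cutoff (endogenous retention), and coaches have no effect at all. The function names and every parameter value (`rho`, `fire_below`, the 0.1 noise scale, the thresholds in the final call) are hypothetical choices for illustration.

```python
import random

def simulate_league(n_teams=32, n_seasons=60, rho=0.7, fire_below=0.4, seed=0):
    """Simulate seasons with NO coach effects; return [(tenure_len, mean_win_pct), ...]."""
    rng = random.Random(seed)
    tenures = []
    for _ in range(n_teams):
        quality = 0.0
        record = []                       # current coach's season win percentages
        for _ in range(n_seasons):
            # Serially correlated team strength, unrelated to who is coaching.
            quality = rho * quality + rng.gauss(0, 0.1)
            win_pct = min(1.0, max(0.0, 0.5 + quality))
            record.append(win_pct)
            if win_pct < fire_below:      # endogenous retention: bad season, coach fired
                tenures.append((len(record), sum(record) / len(record)))
                record = []
        if record:                        # coach still employed when the simulation ends
            tenures.append((len(record), sum(record) / len(record)))
    return tenures

def share_lucky(tenures, min_len, min_pct):
    """Share of simulated tenures at least this long with at least this win rate."""
    if not tenures:
        return 0.0
    hits = sum(1 for n, pct in tenures if n >= min_len and pct >= min_pct)
    return hits / len(tenures)

tenures = simulate_league(seed=1)
for min_len in (3, 5, 10, 20):
    print(min_len, round(share_lucky(tenures, min_len, 0.6), 4))
```

Comparing the shares across tenure lengths shows how often a no-skill coach in this toy world posts a given record purely through luck and retention, which is the comparison Fowler describes making for long-serving coaches.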
Ben Shields: Love to hear a Halas reference from a Chicago-based resident, so I appreciate that. And also, the point is well-taken. All right, I want to pick up on a point you made earlier about the number of seasons that it takes to optimally measure an NFL coach’s effects. And you wrote in the paper that five seasons is the optimal number. This seems a little at odds with how frequently many teams hire and fire head coaches, even after short periods. So I guess the question is: With regards to your study trying to tease out coach effects, if five seasons is the optimal amount of time, does that skew the results? Only reasonably successful coaches last five seasons or more. Could the research be overstating a coach’s impact?
Anthony Fowler: Well, let me clarify one thing, which is: I don’t think we meant to say that five is the optimal period of time. I think what we might’ve tried to say is that five is the minimum number of seasons until you can be statistically confident that you have a coach that is better or worse than average. Of course, a lot of teams aren’t willing to wait it out five seasons to see if, in fact, they really do have a good or a bad coach on their hands. If they have one bad season, they might be willing to just take their chances and replace them. And so that’s just the reality of the situation. But our point was that you can never be statistically confident that you have a better-than-average or worse-than-average coach unless you’ve given them probably at least five seasons to actually have [enough] data. That’s just a fact of the world. It’s very difficult to know whether a coach is good or bad, unless you have a lot of data.
Ben Shields: I think that’s a really helpful finding that teams might consider. How do you balance giving coaches enough time to achieve results with the pressure to win now? And it’s interesting that the data suggests that five seasons is at least the minimum amount of time in order to measure the coach’s effects. All right, a couple more questions for you, Anthony. The first is a humans-and-machines question. And this is going a little bit beyond the scope of your paper, but as coaches gain access to better data and consult machines for guidance on strategies, do you think the importance of coaches will increase, decrease, or stay the same in the age of artificial intelligence?
Anthony Fowler: Yeah, that’s a great question. And that’s obviously well beyond the scope of our paper. So, I don’t have any research to support this. As you imply, you could easily imagine it going either direction. If data and machine learning and advanced analytics become so advanced so as to take out any need to have a high-judgment coach, then you’re right — then maybe that would mitigate coaching effects. Although in my experience, I have not seen anything that would look remotely like that... You know, my full-time job is as a quantitative political scientist. Lots of people have claimed for years that the big data revolution was going to somehow mitigate the need to have smart political scientists around. And if anything, it’s the opposite. I think, in general, people think of data and analytics and machine learning and all of those kinds of things as... sometimes they think of them as substitutes for clear thinking, but they’re obviously not. They only work when they’re combined with clear thinking and they’re combined with a smart analyst. And so if I had to guess, I would say: If anything, those tools are going to allow good coaches to be even better, but they’re not going to help somebody who doesn’t know what to do with them. If anything, they increase the need to have really smart analysts around and smart, clear thinkers around.
Ben Shields: OK, so humans and clear thinking still matter. That’s very reassuring. OK, my final question actually looks at some of the work you do on leaders and contexts other than sports. And I’m curious about what has surprised you about leadership in your work on coaches, relative to some of the other leaders in different contexts that you have studied. What have you learned about leadership as a result of studying coaches?
Anthony Fowler: Other than sports coaches, we have also studied political leaders. We’ve studied world leaders in both democracies and autocracies. We’ve studied governors, U.S. governors, U.S. mayors, and we’ve also to some extent studied CEOs. And I would say that the estimates we have for sports coaches are much greater than what we have for political leaders or for CEOs. For CEOs, we have almost no effects. And for world leaders, we have effects but they’re small effects, and they tend to be bigger in autocracies and so forth. So, I don’t know exactly what that tells us about sports. It could be that being a sports coach is probably an easier job in many ways than being a governor of a large state, [because] you are trying to optimize one thing — you’re trying to maximize your chances of winning. And there’s a relatively small set of things you can do, and you find someone who’s an expert in that sport, and they can probably make significant progress on that problem for you. Managing a large state as the governor is very complicated — there’s all kinds of other actors that are important. It’s just a much harder job. And so, it’s not all that surprising that sports coaches affect their team’s success more than, say, a governor affects their state’s economic success. Other than that, I’m not really sure what broader conclusions to draw. We should still do our best to try to find good governors and find good CEOs, but it could be that there’s just far less that they can do than sports coaches can do in their domain.
Ben Shields: This has been Counterpoints, the sports analytics podcast from MIT Sloan Management Review.
Paul Michelman: You can find us on iTunes, Google Play, and wherever fine podcasts are streamed. If you enjoy Counterpoints, please take a moment to rate and review the program on Apple Podcasts, and please tell your friends while you’re at it.
Ben Shields: Counterpoints is produced by Mary Dooe. Our theme music was composed by Matt Reed. Our coordinating producer is Mackenzie Wise. Our crack researcher is Jake Manashi, and our maven of marketing is Desiree Barry.