Counterpoints: The Sports Analytics Podcast from MIT SMR

Is It Possible to Judge Individual Talent in the NFL?

When the Patriots and the Rams take the field on Sunday, February 3, at Mercedes-Benz Stadium in Atlanta for Super Bowl 53, both teams will be there on the backs of several players few expected to become as important as they are — and in spite of several others who failed to meet expectations. Or so Cade Massey would have us believe. Massey is a professor of the practice at the University of Pennsylvania’s Wharton School and the cohost of Wharton Moneyball on SiriusXM Business Radio. He argues that it’s “damn near” impossible to judge individual talent in football. We invited Cade to defend this thesis.


Ben Shields: There aren’t many traditions in sports quite like Black Monday, when the NFL’s carousel of firings and hirings kicks into full gear. The candidates always range far and wide, but the most trusted route teams take is to pluck a disciple who learned from a legend. The number of coaches and GMs in the League who apprenticed under a man named Bill: Walsh, Parcells, Belichick — take your pick — seems to grow every season. But there’s a tree of talent evaluation that sometimes gets lost in the shuffle: the Green Bay Packers of the 1990s. Behind Hall of Fame general manager Ron Wolf were four future GMs: Ted Thompson, John Dorsey, John Schneider, and Reggie McKenzie. Not to mention Mike Holmgren, Andy Reid, and Jon Gruden on their coaching staff. That unit’s most memorable play? Snatching a benchwarming quarterback named Brett Favre away from the Atlanta Falcons and turning him into the NFL’s preeminent gunslinger with a Super Bowl ring to show for it.

Paul Michelman: But while Wolf and company were turning Atlanta’s trash into Green Bay’s treasure, a different Hall of Famer slipped through the cracks. The Pack invited four quarterbacks to training camp in 1994. The one they decided to cut? An undrafted rookie from Northern Iowa by the name of Kurt Warner. It might not have been a big deal when Favre was winning MVPs and Warner was bagging groceries, but if any of those Green Bay decision-makers had recognized the kind of talent they were sending away, their trophy case might have been a little fuller the following decade. Warner made three Super Bowl trips in the 2000s, including a pair of playoff wins over Green Bay along the way, and showed quite clearly that even one of the best groups of football minds ever assembled could fail to see potential greatness right in front of their eyes. I’m Paul Michelman.

Ben Shields: I’m Ben Shields, and this is Counterpoints, the sports analytics podcast from MIT Sloan Management Review. In this episode, just how hard is it to gauge a player’s chances of success in the NFL? Well, it’s almost impossible.

Paul Michelman: OK, full disclosure: We are taping this episode before the matchup for Super Bowl 53 has been set. Even so, it is very likely that whoever takes the field at Mercedes-Benz Stadium in Atlanta on Sunday, February 3rd, will be there on the backs of several players few expected to become as important as they are, and in spite of several others who failed to meet expectations.

Ben Shields: Or so the theories of Cade Massey would have us believe. Massey is a professor of the practice at University of Pennsylvania’s Wharton School, the cohost of Wharton Moneyball on SiriusXM Business Radio, and a longtime student of football and performance. He argues that it’s damn near impossible to judge individual talent in football. We invited Cade to defend his thesis.

Paul Michelman: Cade, welcome to Counterpoints. Thanks for joining us.

Cade Massey: Thanks for having me. I’m delighted to be here.

Paul Michelman: So, Cade, I can imagine a few NFL personnel people might take exception to your argument. Explain yourself, sir.

Cade Massey: Well, there’s lots of great folks doing personnel in the NFL. They’re very good at their job, but it’s a very difficult job. Football by its nature is very much a team event. Performance is highly interdependent, and it’s not that these guys are bad, it’s just that they’re humans, and humans are not well-wired to properly consider context and situation when assessing performance. And it’s so interdependent in football that it’s that much harder to do. So, we are forever watching a running back or a wide receiver or a linebacker and focusing on that person exclusively, thinking we can infer how well he is doing — when, in fact, his performance depends on the people around him — on the coaches, on the sidelines, on the training staff before the game. And yet we don’t see those things, so we neglect them, and we make a mistake in how we value him.

Ben Shields: Cade, we understand this idea in theory, but we want to dig into some of the core evidence to support your argument. What’s the data here?

Cade Massey: Well, we’ve got some data. We get studies here and there, some from sports and some from outside of sports, but it’s also one of these areas where I’m hopeful, given the revolution we’re seeing in analytics and technology, that we’ll have better data going forward. Can I start with an anecdote? Does that count as data? One of my favorites: the 2006 draft. USC had just lost the national championship game against Texas, and their quarterback that year was Matt Leinart. Matt and USC had won the national championship the year before. Leinart was drafted with the 10th pick in that draft, and right after him Jay Cutler, a quarterback out of Vanderbilt, was drafted at number 11. And it raises this interesting question: If these two guys are assessed to be essentially equal prospects coming out of college, where do we want to place our chips when they come from such different contexts? They played with such different teammates. Matt Leinart played on one of the most star-studded teams of all time. He plus 10 other Trojans were drafted in that draft. Eleven USC Trojans were drafted in 2006 alone. Jay Cutler came out of Vanderbilt. They didn’t win many games while he was there, and nobody else from his team was drafted. When you know that, where do you want to put your chips on which player is going to turn out better in the NFL? Now, this is a fun anecdote to give because they worked out very differently. Leinart played a couple of years, and then Cutler had a full career [and was a] high-profile quarterback for a long time in the NFL. The thing about this ending: We called this one when it happened. These guys are assessed to be equal. They come from very different teams. They’re surrounded by very different levels of support.
If they’re assessed to be equal by all of us flawed humans, yet they come out of such different contexts, the better bet would be the guy who did that well, who generated that much excitement coming from nothing essentially, as opposed to the guy who was surrounded by so much future NFL talent.

Paul Michelman: So, that’s a fun anecdote, and in and of itself certainly shows the difficulty of the situation. But how extendable is that anecdote? How many more stories like that can we point to in order to kind of make this case?

Cade Massey: Well, at some point you want to do more than just talk anecdotes, and some of the best places where this has been studied so far have been outside of sports, actually. So, there’s a famous study of financial analysts by a professor at Harvard named Boris Groysberg. And you might not know it if you’re not in finance, but financial analysts become kind of rock stars whenever they seem to be very good at their job. They call earnings well, they call future successful organizations correctly. And banks poach the stars from other banks. They poach the up-and-coming analysts. And so, Groysberg goes in and studies how they do. And what he finds is that generally the stars that are poached and brought into another firm don’t do as well. The exceptions are when they hire not just the analyst himself or herself, but the team around the analyst. These people typically have a staff around them, and they might have some support system from the organization as a whole. When firms bring over not an individual but a broader group, they have more success. The other exceptions are people who manage to do really well but from lesser organizations. They do it without the support system. They do it without the strong teammates around them. Those people succeed at a higher rate after a move. So, this is more or less exactly analogous to the NFL draft, in the non-sports world and in a much broader sample.

Ben Shields: Well, on the football field, Cade, does this argument hold across all positions? So, can’t we judge quarterbacks, for example, as individuals better than we’re able to judge, say, linemen — whose work is always a collective effort?

Cade Massey: Yeah, I do think it varies. Clearly, some positions are more dependent, but they’re all interdependent to some extent. And the quarterback, you know… By the way, it’s not always that the good players are kind of free riding on their teammates. It goes the other way sometimes, too. Truly great players make their teammates better. And so, to some extent, they’re not getting the full credit they deserve. So, consider the impact a great quarterback has on the receivers, for example, or on running backs. If he can draw a lot of attention to himself, it frees up some space for the running backs. So, just to be clear, it does go both ways. I agree that some positions are tougher to assess. There are a couple of studies that show the effect, though, in different places. So, there was a small study, just a casual analysis, but it was an empirical analysis by a PhD. The fellow’s name (he’s an economist) is Andrew Healey. This was four or five years ago. Andrew has gone on to actually work inside the NFL, but at the time he looked at the impact of teammates by looking at the relationship between receivers and quarterbacks. And what he found was that a receiver who played in college with a quarterback who was drafted in the first round was drafted about a round too high, if you looked at his long-term performance. So, this is a classic example. Look, it takes two players plus a lot of support around them. It takes more than two players, really, it takes 11 players, but it takes, obviously, two players to throw and catch a ball. And if you do a lot of that, how do you really know who’s more responsible for the success? Is it the quarterback throwing it? Or is it the receiver catching it? Never mind the offensive linemen blocking it. So, he goes in and looks at the data, a bunch of highly drafted receivers who played with highly drafted quarterbacks, and he determines that they underperform expectations.
The better the quarterback they played with, the lower they perform relative to expectations, and the impact is significant. They’re about a round overdrafted. The other one that I’ve looked at, and this is now just work that I’ve done: You mentioned the offensive line. It’s almost the canonical example of interdependence, right? You’ve got a right guard who’s got to do something in close quarters with the right tackle to his right and the center to his left. How can you really assess his individual performance? The thing is, people do try to assess individual performance. These guys are given individual grades, and yet they’re doing this team thing. They’re doing this highly interdependent task. I’ve got a season’s worth of these grades on offensive linemen, and I looked at the grades for the top five offensive linemen for each of the 32 NFL teams. So, I’m basically looking at 160 season-level grades for all the top offensive linemen in the league. How much of the variation do you think is explained by team effects alone? So, these are supposed to be individual grades, but what you see is that they’re clumpy, that the players for a particular team will tend to be graded high together or low together, which means they’re not really individual grades, that there are these group effects that are baked into the individual grades. The amount is something like 27%. So, about a quarter, a little bit more than a quarter, of the variation in these supposedly individual grades actually comes from team effects: the people they’re surrounded with, the staff that trains them, the coaches that deploy them.
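
For readers who want to see what "share of variation explained by team effects" can mean in practice, here is a minimal Python sketch. It is an illustration only: the transcript does not say how the 27% figure was computed, the grades below are simulated rather than real, and the function names and parameters (`team_variance_share`, `chance_baseline`, `simulate_league`, `team_sd`, `player_sd`) are hypothetical. The sketch splits the total variation in individual grades into a between-team part and a within-team part and reports the between-team share.

```python
import random
from statistics import mean


def team_variance_share(grades_by_team):
    """Share of total grade variation that sits between team averages."""
    all_grades = [g for grades in grades_by_team.values() for g in grades]
    grand_mean = mean(all_grades)
    ss_total = sum((g - grand_mean) ** 2 for g in all_grades)
    ss_between = sum(
        len(grades) * (mean(grades) - grand_mean) ** 2
        for grades in grades_by_team.values()
    )
    return ss_between / ss_total


def chance_baseline(n_teams, n_players):
    """Average share expected from independent normal noise with no team effect."""
    return (n_teams - 1) / (n_players - 1)


def simulate_league(n_teams=32, per_team=5, team_sd=0.5, player_sd=1.0, seed=0):
    """Hypothetical grades: a shared team effect plus individual noise (assumed model)."""
    rng = random.Random(seed)
    league = {}
    for team in range(n_teams):
        team_effect = rng.gauss(0, team_sd)
        league[team] = [team_effect + rng.gauss(0, player_sd) for _ in range(per_team)]
    return league


if __name__ == "__main__":
    league = simulate_league()
    print(f"team share of variance: {team_variance_share(league):.2f}")
    print(f"pure-noise baseline:    {chance_baseline(32, 160):.2f}")
```

One design note: with only five linemen per team, a raw share like this is above zero on average even when grades are pure noise, which is why the sketch prints a baseline alongside it. That is a property of this simple calculation, not a statement about the analysis described in the conversation.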

Ben Shields: Cade, I want to ask about the combine and how NFL decision makers should evaluate, for instance, the 40-yard dash time of a particular player as opposed to how they think that player might fit in within the team scheme or system. How should NFL decision makers think about data from the combine?

Cade Massey: Well, it’s a big question. We could do a whole show on that one. It’s a lot of fun, and it’s something that analysts worry about, teams worry about. It’s kind of an institution now, that combine. The whole league gathers there. A lot gets done in Indianapolis every spring, outside of just clocking people in the 40. And by the way, clocking people in the 40 is kind of a tradition now; it’s kind of a sport. There’s some entertainment to it, and people do pay attention to it, but it’s just one input. Because it can be measured so precisely, and because we’re all reasonably calibrated for what a fast 40 time is versus a slow 40 time, it probably receives disproportionate weight in the final decisions. I mean, we can measure this thing to the hundredths of a second. We know the difference, personally, between a 4.5 and a 4.4, and so because of these things, it’s going to get a lot more weight than, say, some other things like how good his hands are or how good a route he runs, even though those things might matter a lot on the field. But because we can write it down to the hundredths place, it gets too much weight. So, one anecdote for you: It’s kind of like the Leinart-Cutler thing. And this is something I know across a broader sample. If you take two receivers who are drafted at relatively the same position in the draft, and they have different 40 times, who do you think has the better career in the NFL? The slower guy. The slower guy does. Because if the package is considered equal (not all else equal, the package), and we think that 40 times receive too much weight, it means the other stuff’s not getting enough weight, and the other stuff matters in the long term. So, if two guys are evaluated to be the same, but one is slower than the other, he’s got some other things going for him. Because we know 40 times receive too much weight, on average he’s going to have a better career.
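
The logic of that argument can be checked with a small simulation. The sketch below is an assumption-laden toy model, not the analysis referred to in the conversation: every prospect gets a speed score and an "everything else" score, the career outcome is assumed to weight the two equally plus noise, and evaluators are assumed to rank prospects using an inflated `speed_weight`. All names and numbers are hypothetical. Prospects with nearly identical evaluations are paired up, and the function reports how often the slower one has the better career.

```python
import random


def slower_wins_rate(speed_weight, n_players=20000, noise_sd=1.0, seed=1):
    """Among pairs with near-equal evaluations, how often the slower player does better."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        speed = rng.gauss(0, 1)  # higher means a faster 40 time
        other = rng.gauss(0, 1)  # hands, route running, and so on
        # How evaluators rank the prospect (speed_weight > 1 overweights speed).
        evaluation = speed_weight * speed + other
        # Assumed career outcome: both traits matter equally, plus luck.
        career = speed + other + rng.gauss(0, noise_sd)
        players.append((evaluation, speed, career))

    players.sort()  # neighbors in this order have almost the same evaluation
    wins = 0
    pairs = 0
    for i in range(0, n_players - 1, 2):
        a, b = players[i], players[i + 1]
        slower, faster = (a, b) if a[1] < b[1] else (b, a)
        pairs += 1
        if slower[2] > faster[2]:
            wins += 1
    return wins / pairs


if __name__ == "__main__":
    print(f"speed weighted correctly: {slower_wins_rate(1.0):.2f}")
    print(f"speed overweighted (2x):  {slower_wins_rate(2.0):.2f}")
```

In this toy model, when speed gets its proper weight the slower player comes out ahead about half the time, and once speed is overweighted the slower player comes out ahead more often than not, which is the pattern described above.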

Paul Michelman: Do you think that we’re measuring the wrong things or putting disproportionate weight on the wrong things? Or is it that you simply can’t measure some of the things that are determinants of performance?

Cade Massey: I think all those things are true. This is why this is such a fun thing to work on and why it’s never going to be done, because you need to make progress on all three of those fronts. One, I think we don’t acknowledge enough the challenge of this task. We think we can see individual contribution more clearly than we actually can. It’s just a fundamentally hard task. But then I also think we don’t have the right measurements. We’re not capturing the right things right now. Honestly, how are we supposed to measure offensive linemen performance other than to watch them with our eyes? Well, now we’re coming up with some new technologies that might allow us to do that. So, motion tracking, which has revolutionized already, to some extent, basketball — at least it’s on the way to revolutionizing basketball [and] it’s going to become the most important way of doing analytics in soccer — eventually will be the way football analytics is done. With motion tracking, we’re going to be able to know much more precisely what an individual player does. Take an offensive lineman, for example, how much space [does] he preserve for the quarterback if he’s trying to pass block? How much space [does] he create for the running back he’s trying to run block? Literally measuring the displacement. And you could even go to more complicated things like, how hard was that task? What kind of impediments did he have in front of him, compared to what his teammate next to him had in front of him? We just don’t have those measures right now. We’ve been trying to do it with our eyes for generations, and in the future, we’ll be able to measure it much more precisely.

Ben Shields: Cade, you bring up an excellent point about how the introduction of motion-tracking technology is going to revolutionize NFL decision-making. I want to drill down a little bit more on that. How specifically do you think that’s going to change how general managers put together, maintain their rosters going forward?

Cade Massey: Well, mostly I think we don’t know. Truthfully, I think we don’t know. There’s a little bit of a “We need to go to the moon.” “Well, why do we need to go to the moon?” “Yeah, I don’t know, but we’re going to discover a lot of interesting stuff along the way.” That’s a little bit the argument with teams investing in motion tracking right now. It’s we don’t know what we don’t know. This is a methodology that’s going to open up a wide range of possibilities, and we’re going to be asking questions two years from now that we can’t conceive of right now, because it’s going to be introduced to us by this new technology. So, I think it’s really hard to jump to how the roster will be constructed differently. If I had to speculate… You know, you talked about the 40 times coming out of the NFL combine. These are ways that football players have been assessed for decades. You know, it’s a fancier stopwatch than it used to be, but it’s still a stopwatch, and we’re about to have a whole different set of metrics on players. And they’re going to displace these traditional measures, because 40 time doesn’t matter. What you need to know is we’re going to know the clock speed of the guy in pads with the football. We’re going to know a defensive back’s ability to maintain, you know, contact with the receiver as he moves down field. So, the 40 time was always just a proxy for those on-field performance metrics that you actually care about. And with motion tracking, we’re going to be able to go directly to those things. So, I don’t know what that means for the kind of player that makes the roster. It’s just that there are going to be better decisions. The teams that have those insights are going to be able to make better decisions as a result.

Paul Michelman: Well, though, are they? Because how much are we advancing against what you cited as the root problem at the beginning, which is [that] individual performance in football and in other contexts is so determined by the context of the system and the people around you? How is this new data actually going to advance that?

Cade Massey: Yeah, it’s great. It’s great. That’s exactly right, and that’s probably like second-generation motion-tracking usage. You know, you can come up with some examples, though. So, I’ll give you a small example from soccer. It’s messy in soccer, but they’re a little bit ahead, so we might learn something from them. There was a great paper at the MIT Conference last year by a fellow named Luke Bornn, and he’s worked in soccer and now he’s working for the Sacramento Kings. His coauthored paper was one of the five finalists last year, and they talk about space, essentially, in soccer. They go out and look at space, and one of the things that is going on on the pitch in soccer is that players are positioning themselves to create space for other players, and they talk about Messi’s ability to create space even by walking. This is one of the new little anecdotes in the paper. It’s that when he’s walking, it’s not just that he’s tired; he’s actually walking at a particular moment, at a particular spot, in a particular direction that maximizes the space on the pitch for the rest of the team. So, that’s a really neat concept, and it’s important in soccer, but it’s also important in football. If you talked to coaches in football, a lot of what they’re playing and managing and trying to influence is space. So, consider, for example, just the space behind the defensive line where receivers are running, and defensive backs are trying to cover them. If you have a really good individual defensive back, you can put him on the best receiver, and then he’ll cover that receiver one on one, and the rest of the guys back there, whether it’s three or four or five guys, they can be deployed to cover the rest of the space, and you don’t worry about him. And this is what the real luxury of having a shutdown corner is: that you can do that, and it frees up resources for the rest of the field. That is something that’s hard to quantify right now.
You might know that some teams use guys that way, but you don’t know exactly the impact. Consider what you can do with motion tracking. You can imagine if you start evaluating this space as receivers run their routes and defensive backs try to cover those receivers. You can assess the impact an individual player has on the space he can cover and therefore the extra resources he gives the defense to cover the rest of the space. You know, if you and I are out there in the defensive backfield, what do they have to do? They have to keep the guys really close to us, right? Because we can’t cover much space, but the best defensive backs take care of more territory. Right now, we’re not valuing those guys properly. I believe the motion-tracking technology, once we get used to it and once we push it along a little further, will allow us to answer some of those questions and to quantify some of those impacts.

Ben Shields: Cade, in the interim, I do want to bring up one other case and that is of Le’Veon Bell. If you are advising teams on whether or not to sign Le’Veon Bell, what are you advising them on? How would you help them make that decision or not?

Cade Massey: Well, let’s set aside the politics and the drama of it. You know, this is a guy who’s held out — and so, you may not want to sign somebody who’s caused that kind of disruption before. But setting aside that, many people considered him to be the most valuable or one of the most valuable running backs in the league coming into the season. There’s been a growing awareness that running backs are… An extreme version, is that they’re a commodity and that’s too extreme. But I do think that that wisdom is grounded in some truth. And I can take it back to the paper I wrote with Dick Thaler on the NFL draft. When we broke down our results by position, one of the things we observed is that rookies — when you draft rookies — the expected value of the player is positive. So, the compensation is kept low enough that they, you know, even though they’re risky prospects, they’re good enough that they, in expectation, outperform what you have to pay them — everywhere in the draft and for every position in the draft, except for running backs at the top of the first round. Literally, there’s one position at one location in the draft where they’re not a positive expected value proposition. And that was one of the kernels around which this conversation grew, that said these guys — we’re probably paying too much, and we’re probably putting too many chips in these running back baskets — especially at the top. And this is related to what we’ve been talking about, because a lot of a running back’s value depends on the system that he’s in and the offensive line and the ability of the quarterback and the passing game to provide an alternative threat. So, you take a guy like Bell, and you put him into a system that doesn’t have all the weapons that the Steelers have, and he’s going to look a lot more pedestrian. 
You take a guy who looks relatively average from a system that’s less loaded than the Steelers, drop him into the Steelers’ system, as we saw with their backup running back this year, and he’s going to look better.

Paul Michelman: So, as we look towards a future where individual performance and our ability to assess it may improve, how much do we simply need to accept the limits of how well general managers will ever be able to assess individual performance in the NFL? And if we do accept that, where does that lead us in terms of how teams are managed?

Cade Massey: Great questions. Really great, important questions. The job these guys have is extremely hard. Forecasting the NFL performance of these college players is highly uncertain, and there is definitely a limit to how well you can do it. Even with the next round of technology, there’s always going to be real limits to how well they can be forecasted. And I do think it’s precisely by accepting that uncertainty and those limits that you can then pivot to a better way of managing things. So, for example, I have done the research, and I have shown and I have argued that differences in the NFL draft are largely by chance. The teams that do really well in a particular draft typically don’t do well the next year, which is what they should do if it was a skill-based exercise. Instead, teams bounce around and the more picks you observe by a regime, the more they trend towards average. And so, it’s not that these guys aren’t good at the job, it’s that they’re all equally good at their job, and it’s really hard. The point of it is you can’t have a persistent advantage by picking people more effectively. You can’t be more accurate in the draft. That’s not going to give you the consistent edge that you need. So, if you can’t get an edge that way, how else can you get it? Well, you probably can’t get it by signing free agents, because free agents go for full-market value. The market’s relatively efficient. So, it’s hard to get it that way. So, you’ve got two other alternatives. One is to manage your own free agents more effectively, better assessments of who you keep and who you let go. And we’ve seen some teams over the years get really good at this, and they can be kind of callous, but it may be a necessary part of being successful in the NFL, by knowing when they should let people go. 
And the other way, and this is kind of the least developed and probably the most important frontier, given how difficult it is to assess people — given how difficult it is to have a persistent edge in that assessment — is to develop the players that you have. We tend to think these guys are fixed, like we’re just going to identify the, you know, the true value, and then we’re going to wash him off, and he’s going to have that true value, and that’s what we’re going to, you know, reap for the next few years. Instead, they’re still malleable, and they can be improved or left to kind of coast, depending on the system they’re in, depending on the staff they’re surrounded by, depending on the strength and conditioning programs they’re put into, depending on the culture of the organization, depending on the teammates that they’re around, depending on how well they’re understood as an asset, and how that asset should be deployed on the field. These are all player development variables that can hugely impact a player. And if you look across the league, some teams, despite all of the uncertainty, despite all of the randomness, despite all the difficulties in finding edges, some teams tend to be there year over year, and it’s got to be coming from somewhere. One of the places it comes from is that some teams are better at that player development than others.
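
The two signatures of a chance-driven draft mentioned above (good years not repeating, and results drifting toward average as picks accumulate) can be illustrated with a short simulation. This is a sketch under stated assumptions, not a reproduction of the research described in the conversation: each pick's outcome is modeled as a team's drafting skill plus random noise, and `skill_sd`, `picks_per_year`, and the function names are hypothetical. Setting `skill_sd=0.0` gives a league where every front office is equally good.

```python
import random
from statistics import mean, pstdev


def simulate_draft_results(n_teams=32, n_years=10, picks_per_year=7,
                           skill_sd=0.0, seed=2):
    """Per-team list of yearly draft results (average outcome of that year's picks)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_teams):
        skill = rng.gauss(0, skill_sd)  # 0.0 means all teams are equally skilled
        seasons = [
            mean([skill + rng.gauss(0, 1) for _ in range(picks_per_year)])
            for _ in range(n_years)
        ]
        results.append(seasons)
    return results


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def year_to_year_correlation(results):
    """Does a good draft this year predict a good draft next year?"""
    this_year = [team[y] for team in results for y in range(len(team) - 1)]
    next_year = [team[y + 1] for team in results for y in range(len(team) - 1)]
    return pearson(this_year, next_year)


def spread_after(results, n_years):
    """Spread across teams of the running average after n_years of drafts."""
    return pstdev([mean(team[:n_years]) for team in results])


if __name__ == "__main__":
    chance = simulate_draft_results(skill_sd=0.0)
    skilled = simulate_draft_results(skill_sd=1.0)
    print(f"year-to-year correlation, equal skill:   {year_to_year_correlation(chance):.2f}")
    print(f"year-to-year correlation, unequal skill: {year_to_year_correlation(skilled):.2f}")
    print(f"spread after 1 year vs 10 (equal skill): "
          f"{spread_after(chance, 1):.2f} vs {spread_after(chance, 10):.2f}")
```

In the equal-skill league, one year's result says little about the next and the gap between teams narrows as more drafts are averaged in; in the unequal-skill league, good drafting persists from year to year.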

Ben Shields: All right, Cade, to that point — and we’re not asking you this question because we are located in Cambridge — but: How do you explain the Patriots?

Cade Massey: Well, the easiest way to explain the Patriots, is Brady and Belichick. And you know, since we started this talking about the impossibility of assessing individual performance, there’s probably no better place to land than the eternal quandary of Brady versus Belichick. We’ve seen a little bit of Belichick without Brady. And he’s acquitted himself well, right? But we haven’t seen Brady without Belichick. It’s really hard to parse those two. But you’re talking about an all-timer at the two most important positions in the building — the coach and the quarterback — and these guys are first-ballot Hall of Fame guys, and I don’t think you need to go a whole lot further than that. We can go further than that: We like lots of things that Belichick does, and we can kind of unpack some of the things he does, but as I just said, you know, you’ve got to know how to manage your own free agents, when to let people go. And they seem to do that well. One of the things they do really well, they know what complementary pieces are needed for their team. So, for example, they have really demonstrated… over the years, they have demonstrated to the rest of the NFL, the value of these little slot receivers, and there’s a fit between some of Brady’s greatest strengths as a quarterback and having these guys in the slot that he can hit, and they build some of their offense around them. So, it’s that kind of complementarity, this again is one of the things we’re talking about. This is why it’s so hard to assess individual performance in these team sports. Wise coaches are deploying very complementary assets. That means there’s an interaction between the quality of Julian Edelman and his quarterback Tom Brady. There’s an interaction… Brady’s a great quarterback. Edelman’s a good receiver, but there’s also an interaction between them, and it’s not by chance that interaction. Belichick is good at identifying those interactions.

Paul Michelman: I think if Belichick is smart, he’ll leave when Brady leaves. So, this question will remain eternal.

Cade Massey: That’s right.

Paul Michelman: Cade Massey, this has been terrific. Thanks so much for joining us today.

Cade Massey: You bet. Thanks for having me — I appreciate it.

Ben Shields: This has been Counterpoints, the sports analytics podcast from MIT Sloan Management Review.

Paul Michelman: You can find us on iTunes, Google Play, and wherever fine podcasts are streamed. If you enjoy Counterpoints, please take a moment to rate and review the program, and we’ll graciously accept your constructive criticism, too.

Ben Shields: Counterpoints is produced by Mary Dooe. Our theme music was composed by Matt Reed. Our coordinating producer is Mackenzie Wise. Our crack researcher is Jake Menashi. Our maven of marketing is Desiree Barry.