I tend to ignore the specifics of most NFL mock drafts, but I was happy to see Rotoworld’s Josh Norris recognize the importance of sample size while recently predicting that the analytical wizards with the Cleveland Browns will prefer Deshaun Watson to Mitchell Trubisky come draft day.
(We should mention that everything written below regarding Watson’s larger sample also applies to Patrick Mahomes, who had more pass attempts than Watson at a similar yards per attempt.)
To those who are familiar with statistics generally, the concept that the Browns would want a larger sample for their potential franchise quarterback isn’t tremendously difficult to grasp. But it was still a pleasant surprise to see someone in the larger draft community weave this thinking into his analysis, especially when some mock drafts still forecast the Browns to make terrible strategic decisions from an analytical perspective, including taking a running back in the middle of the first round.
We know that bigger is better when it comes to sample size, and various models, including Football Outsiders’ QBASE – developed by Andrew Healy, now the Browns’ senior strategist for player personnel – have shown that quarterbacks with longer college careers are more likely to be successful in the NFL. But is there a way we can truly peel apart the analysis and see the nuts and bolts of why this is so?
I wasn’t sure of the best way to do it until I came across a post by ESPN’s Brian Burke that uses Bayesian inference to estimate how much Jimmy Garoppolo’s highly efficient but small sample of NFL attempts matters for projecting his future performance. Burke didn’t give away the secret sauce of his analysis, but after playing around with some different Bayesian distributions, I believe I can replicate his thinking using Bayesian updating of the normal distribution.
First, let’s go into a little background. The most common form of statistical analysis is frequentist inference, which in layman’s terms means we base our analysis and derive our insights strictly from the evidence. Bayesian statistics differs in that we also give weight to the beliefs we hold before any evidence is collected. This is what we call our prior.
You can see why this would be useful in football analysis where we’re long on domain knowledge, but often short on sample size. Using frequentist inference, we often come away from an analysis without the ability to draw a statistically significant conclusion. In Bayesian statistics, even the smallest sample can be used to update our prior, but it won’t have as much influence as a larger sample.
Watson vs. Trubisky
Let’s get out of Stats 101 and use Bayesian updating to look specifically at Watson vs. Trubisky. Both players were fairly efficient passers throughout their college careers, each averaging roughly 8.4 yards per attempt. Should we assume if they both continued their college careers they’d be roughly equivalent passers going forward?
The answer is no, and it’s because Watson’s sample of pass attempts is more than twice the size of Trubisky’s. Ignoring the fact that Trubisky threw almost all of his passes at an age when Watson will be playing in the NFL, we can show with Bayesian updating that Watson’s larger sample has a larger effect on our prior (an average CFB quarterback) and gives us more confidence in how efficient he will be going forward.
Below is a visualization similar to the one in Burke’s Garoppolo article with our mean (red) and standard deviations (+1 in green, -1 in blue) for expected yards per attempt. Our posterior for pass efficiency moves higher as we add more attempts at an average of 8.4 yards per attempt. The two vertical lines mark the career pass attempt numbers for Trubisky (572) and Watson (1207). You can see that new evidence does most of the heavy lifting in the first 500 attempts, so Trubisky’s efficiency shouldn’t be seen as a fluke. That said, there is a meaningful difference in our expected mean efficiency going forward, even more so on the downside as the gap is wider between our posteriors for a standard deviation below our mean. It’s also important to note that these posterior distributions apply to what we believe the quarterbacks’ true passing efficiency is in the college game, not what it will be in the pros.
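For readers who want to trace the shape of that curve themselves, here is a minimal sketch of the updating, assuming a normal prior and a normal likelihood with a known per-attempt spread. The prior (mean 6.8, standard deviation 1.0) and the 8.4 yards per attempt come from this article. The per-attempt standard deviation of 10 yards is my own assumption, since the article never states one, and the function name is purely illustrative.

```python
import math

PRIOR_MEAN = 6.8        # from the article: roughly the average CFB quarterback in 2016
PRIOR_SD = 1.0          # from the article
SAMPLE_YPA = 8.4        # from the article: efficiency of the attempts being added
PER_ATTEMPT_SD = 10.0   # ASSUMPTION: spread of yards gained on a single attempt


def posterior_after(n_attempts):
    """Return (mean, sd) of the posterior after n_attempts at SAMPLE_YPA."""
    prior_precision = 1.0 / PRIOR_SD ** 2
    data_precision = n_attempts / PER_ATTEMPT_SD ** 2
    total_precision = prior_precision + data_precision
    # Posterior mean is a precision-weighted average of the prior and the sample.
    mean = (prior_precision * PRIOR_MEAN + data_precision * SAMPLE_YPA) / total_precision
    return mean, math.sqrt(1.0 / total_precision)


# 572 and 1207 are the career attempt totals marked by the vertical lines.
trajectory = [(n, *posterior_after(n)) for n in (0, 100, 250, 572, 1207)]

for n, mean, sd in trajectory:
    print(f"{n:>5} attempts: mean {mean:.2f}, -1 SD {mean - sd:.2f}, +1 SD {mean + sd:.2f}")
```

With zero attempts the output is just the prior. Each additional attempt adds the same amount of precision, so the mean climbs quickly at first and then flattens, while the band between the standard deviations keeps tightening.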
Another way to look at exactly how we updated our distributions is to compare the prior with each quarterback’s posterior. Our prior is a normal distribution with a mean of 6.8 yards per attempt and a standard deviation of 1.0 (based roughly on the average number for a CFB quarterback in 2016). We update it with 572 attempts at a mean of 8.3 for Trubisky and with 1207 attempts at a mean of 8.4 for Watson.
For both Trubisky and Watson, our distribution curves are narrower than the prior, meaning there is more certainty about how they will perform going forward than there would have been if we simply picked a quarterback at random and threw him onto the field. What you’ll also notice is that Watson’s distribution is materially skinnier and taller than Trubisky’s. Watson’s additional attempts moved his posterior expectation of pass efficiency 0.2 yards per attempt higher than Trubisky’s, and they also give us more certainty that his performance won’t fall.
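The two posteriors can be sketched with the standard conjugate update for a normal prior and a normal likelihood. The prior and the attempt totals and averages are the ones given above. The per-attempt standard deviation of 10 yards is an assumption of mine, not a figure from this analysis, and the function and variable names are illustrative only.

```python
import math

PRIOR_MEAN = 6.8        # from the article
PRIOR_SD = 1.0          # from the article
PER_ATTEMPT_SD = 10.0   # ASSUMPTION: not stated in the article

# (career attempts, yards per attempt), as given in the article
SAMPLES = {
    "Trubisky": (572, 8.3),
    "Watson": (1207, 8.4),
}


def update_normal(prior_mean, prior_sd, sample_mean, n, per_attempt_sd):
    """Conjugate normal update with a known per-attempt standard deviation."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / per_attempt_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * sample_mean) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


posteriors = {
    name: update_normal(PRIOR_MEAN, PRIOR_SD, ypa, attempts, PER_ATTEMPT_SD)
    for name, (attempts, ypa) in SAMPLES.items()
}

for name, (mean, sd) in posteriors.items():
    print(f"{name}: posterior mean {mean:.2f}, posterior SD {sd:.2f}")
```

Under that assumed spread, the gap between the two posterior means comes out at about 0.2 yards per attempt, and Watson’s posterior standard deviation is the smaller of the two. A different per-attempt spread would shift both numbers, so treat the exact values as a function of that choice.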
Perhaps a slightly more intuitive way to compare the two posterior distributions is to visualize the cumulative distribution function. What this shows is the cumulative probability (y-axis) that the quarterback’s pass efficiency will be below a certain yards per attempt (x-axis).
Watson has Trubisky beat along almost the entire distribution, with Trubisky having a negligibly higher probability of an outlier outcome to the upside – a surprising benefit of the fact that we have less certainty in his results. What’s more interesting to me is analyzing the respective downsides. The probability that Watson’s true pass efficiency is below 8.0 yards per attempt is less than 20%, while for Trubisky it’s nearly 50%.
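The downside comparison can be checked with a short sketch that evaluates each posterior’s cumulative distribution function. It repeats the normal update with the article’s prior and samples, and again assumes a per-attempt standard deviation of 10 yards, which the article does not state. The 9.0 threshold used for the upside tail is an illustrative choice of mine.

```python
from statistics import NormalDist

PRIOR_MEAN, PRIOR_SD = 6.8, 1.0   # from the article
PER_ATTEMPT_SD = 10.0             # ASSUMPTION: not stated in the article


def posterior(attempts, ypa):
    """Posterior NormalDist for true yards per attempt after a normal update."""
    prior_precision = 1.0 / PRIOR_SD ** 2
    data_precision = attempts / PER_ATTEMPT_SD ** 2
    total_precision = prior_precision + data_precision
    mean = (prior_precision * PRIOR_MEAN + data_precision * ypa) / total_precision
    return NormalDist(mu=mean, sigma=total_precision ** -0.5)


trubisky = posterior(572, 8.3)    # attempts and average from the article
watson = posterior(1207, 8.4)

# Downside: probability that true efficiency is below 8.0 yards per attempt.
p_below_trubisky = trubisky.cdf(8.0)
p_below_watson = watson.cdf(8.0)

# Upside: probability of exceeding an illustrative 9.0 yards per attempt.
p_above_trubisky = 1.0 - trubisky.cdf(9.0)
p_above_watson = 1.0 - watson.cdf(9.0)

print(f"P(below 8.0): Trubisky {p_below_trubisky:.0%}, Watson {p_below_watson:.0%}")
print(f"P(above 9.0): Trubisky {p_above_trubisky:.1%}, Watson {p_above_watson:.1%}")
```

With this assumed spread the downside probabilities land in the same neighborhood as the figures quoted above, and Trubisky’s wider posterior leaves him a slightly fatter upper tail. The exact percentages depend on the per-attempt standard deviation chosen.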
This analysis doesn’t come close to proving that Watson is a more efficient quarterback than Trubisky. It also doesn’t incorporate scout assessments or Watson’s less-than-stellar ball velocity numbers from the NFL Combine. But it does give a window into how sample size should guide our analysis, and why the fact that Trubisky didn’t start until his redshirt junior year isn’t just a negative narrative. It’s reasonable to say that we have more certainty about Watson’s true college passing efficiency than Trubisky’s, and that should be considered when using yards per attempt in forward-looking models projecting NFL success.