Predictive Receiving Stats (2021 Fantasy Football)

I have good news and bad news for you.

First, the average depth of target doesn’t correlate with fantasy scoring.

Second, fortunately, other statistics do in fact correlate to fantasy scoring.

It's time we change the way we think about our stats.

Look, change is hard. Old habits are hard to break. It’s tough to teach an old dog new tricks. Change doesn’t happen overnight. Hell, Rome wasn’t built in a day.

I get it, I really do.

In fantasy, you want to find receivers who get yards because they directly imply fantasy points, and that is exactly what aDOT tells you.

Right? Right.

But the goal in fantasy football is to find players who are going to score the most fantasy points.

Right? Right.

Wouldn’t it then make sense to find out if the metrics we use to grade prospects are correlated to fantasy scoring, or perhaps if they’re not? A scientific test, if you will. And if you were to find that a metric, average depth of target, for instance, were not particularly well-correlated to fantasy scoring, would it be enough for you to change your perception of players sorted by that metric?

Well, let’s find out.

To start, let’s employ the receiver database provided by @Cooper_DFF (seriously, go buy that man a coffee), which includes all NFL receiving data from 2009 through 2020. Finding correlations between two particular metrics is as easy as pie. For starters, you can simply make a scatter plot, charting one metric versus the other. Right before your eyes, you should be able to see whether there is a correlation: strongly correlated metrics show a tight linear relationship (the points should look like a straight line). The less “noise” there is in the pattern, the closer the correlation between the two stats will be.

You can see this type of strong correlation if, for example, you chart point per reception fantasy production versus their standard scoring production, as shown in the image below.

See how the scatter plot aligns neatly along the linear relationship between the two metrics defined by the trendline? That R^2 value listed on the chart is called the coefficient of determination, or the percentage of the variation in one metric as explained by the metric you’ve charted it against.

Don’t worry if that doesn’t make sense. The take-home point here is that the value varies between zero (for completely uncorrelated data sets) and 1 (for perfectly correlated data). You can think of it as a percent, from zero to 100% correlation. The R^2 value for this correlation is nearly perfect at over 0.97.
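For the curious, here is a minimal sketch of how that R^2 value can be computed for a straight-line trendline. It assumes an ordinary least-squares fit, and the per-game numbers are made up for illustration; they are not taken from the database.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the trendline vs. total variation in ys
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up per-game numbers, for illustration only
standard_ppg = [4.0, 6.5, 8.0, 10.5, 12.0]
ppr_ppg = [7.5, 11.0, 13.0, 17.0, 18.5]
print(round(r_squared(standard_ppg, ppr_ppg), 3))
```

A value near 1 means the points hug the trendline; a value near zero means the trendline explains almost none of the variation.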

Compare the above plot to the following chart of two statistics that show almost no correlation: yards per reception charted versus receptions per game. The lack of correlation should be obvious to the naked eye, but the R^2 value of 0.003 really emphasizes it in black and white. The R^2 is also a value we can directly compare to that of the previous correlation between Standard and PPR scoring per game, which had a very impressive R^2 = 0.97.

You can perform this kind of correlation between any two sets of data; it’s not particularly difficult to do.

However, by employing a slightly more advanced calculation called the Correlation Coefficient (R), you can establish a better apples-to-apples metric for the degree of correlation between two data sets. While a bit more complicated to calculate than the R^2 value, the correlation coefficient (R) isn’t hard to compute using something as easy to access as Excel; even a plebeian like me can do it. The correlation data that I’m using for this study are all correlation coefficients (R), which is simply a better way to correlate two data sets.
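If you would rather not open a spreadsheet, the correlation coefficient can be sketched in a few lines. This assumes the R in question is the usual Pearson correlation coefficient (the kind Excel's CORREL function returns); the sample numbers are invented. R runs from -1 to 1 and keeps the direction of the relationship, and for a straight-line fit its square matches the R^2 shown on a chart trendline.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented targets per game and standard points per game, for illustration only
targets_pg = [4.2, 5.5, 6.1, 7.8, 9.0]
standard_ppg = [5.1, 6.0, 7.4, 9.2, 10.8]
r = pearson_r(targets_pg, standard_ppg)
print(f"R = {r:.3f}, R^2 = {r ** 2:.3f}")
```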

Still, I do love my charts, and I find the human eye’s ability to size up the correlation in a scatter plot is oftentimes more elucidative than simply comparing correlative metrics, so I’ll often find myself charting various data sets to see if my eyes can spot a relationship the raw numbers didn’t indicate to me.

Yeah, I’m a geek, I get that too.

You can see the scatter plots in the image below that I made from the receiving metrics in the aforementioned database. Each pane in the following image shows individual receiving stats charted versus standard fantasy points per game. The R^2 value is listed on each chart to give you a numerical concept of the degree of correlation evident with this cursory investigation.

Sigh, yes, I know nobody plays standard anymore (except my home league), but as I’ve already demonstrated in the first chart, by and large you cannot score a lot of standard fantasy points at the receiver position without also scoring a lot of PPR fantasy points. That’s literally what that first chart is telling you.

For what it’s worth none of the math I've looked at seems to be impacted by the change to PPR in the correlation calculations. So, I think it’s safe to apply this same logic to leagues with standard, half-PPR, or full-PPR, without much reason for concern about the impact of the “normal” scoring formats on the general conclusions of this study.

Additionally, I'm trying to investigate the link between stats like receptions per game and fantasy scoring. Based on the way PPR scoring is calculated, you can expect that receptions data will already be correlated to fantasy scoring. In an effort to actually measure the contribution of receptions and not double-count the correlation between receptions or reception efficiency data, I’m correlating to standard fantasy scoring instead. I’m also using scoring on a per-game basis rather than as a season-long number, as so many players miss games throughout the season for one reason or another, and I did not want missed games to impact correlation metrics.
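To make the double-counting concern concrete, here is a rough sketch of per-game scoring for a receiver. The scoring rules are an assumption on my part (one point per 10 receiving yards, six per receiving touchdown, plus an optional point per reception), and the season line is hypothetical. Rushing, fumbles, and bonuses are ignored.

```python
def fantasy_points_per_game(rec, yards, tds, games, points_per_reception=0.0):
    """Receiving-only fantasy points per game under the assumed scoring rules."""
    if games <= 0:
        raise ValueError("games must be positive")
    total = yards / 10 + 6 * tds + points_per_reception * rec
    return total / games


# Hypothetical season line, for illustration only
season = {"rec": 80, "yards": 1000, "tds": 6, "games": 16}
standard = fantasy_points_per_game(**season)
ppr = fantasy_points_per_game(**season, points_per_reception=1.0)
print(f"standard: {standard:.1f}/G, PPR: {ppr:.1f}/G")
```

The gap between the two numbers is exactly receptions per game, which is why receptions would be correlated with PPR scoring by construction.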

In a Twitter discussion with none other than JJ Zachariason, Mr. @lateroundqb himself asserted that my correlations (or rather my lack of correlations) could be invalidated because I included the entire data set in my analysis, an assertion I immediately felt was plausible. The argument is that lower-volume players could unduly influence the data, particularly as it pertains to the yardage-implied statistics (e.g., YAC, YPR, YPT, or aDOT), for which volume is still required to produce high-end fantasy scoring. After all, if you’re getting yards and volume, the points should therefore follow; or so it goes.

So, in order to keep lower-volume players from influencing the data, I filtered to exclude players who received fewer than four targets per game on average in any given season.

I then calculated the Correlation Coefficient (R) for each of the receiving metrics as correlated with standard fantasy scoring per game.
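The filter-then-correlate step might look something like the sketch below. The rows, field names (`targets`, `games`, `adot`, `yards_pg`, `std_ppg`), and numbers are all hypothetical stand-ins for the real database, and the correlation is assumed to be the Pearson coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical player-season rows; field names and values are assumptions
rows = [
    {"player": "A", "targets": 128, "games": 16, "adot": 12.1, "yards_pg": 71.0, "std_ppg": 9.8},
    {"player": "B", "targets": 96, "games": 16, "adot": 7.4, "yards_pg": 58.5, "std_ppg": 8.1},
    {"player": "C", "targets": 40, "games": 16, "adot": 15.3, "yards_pg": 22.0, "std_ppg": 3.0},
    {"player": "D", "targets": 70, "games": 14, "adot": 9.8, "yards_pg": 44.0, "std_ppg": 5.9},
    {"player": "E", "targets": 30, "games": 10, "adot": 11.0, "yards_pg": 25.5, "std_ppg": 3.4},
]


def regular_volume(rows, min_targets_per_game=4.0):
    """Keep player-seasons averaging at least the given targets per game."""
    return [
        r for r in rows
        if r["games"] > 0 and r["targets"] / r["games"] >= min_targets_per_game
    ]


def correlations(rows, metrics, target="std_ppg"):
    """Correlate each metric with standard fantasy points per game."""
    ys = [r[target] for r in rows]
    return {m: pearson_r([r[m] for r in rows], ys) for m in metrics}


kept = regular_volume(rows)
results = correlations(kept, ["adot", "yards_pg"])
for metric, r in results.items():
    print(f"{metric}: R = {r:.3f}")
```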

This provides us with some actionable data (I’ll share the results in a bit) about the degree of correlation between the various metrics and fantasy scoring. But correlation to fantasy scoring is only half of the story. Critically, how predictive a stat may be for fantasy purposes depends largely on how volatile or sticky that stat is over the course of a player’s career, in addition to its correlation to fantasy scoring.

To look into this issue of receiving statistic stickiness/volatility, I went back to the receiver stat database and began to sort the players by career length. My goal was to find players with the longest careers so that I could average out the degree of year-over-year volatility for each player at each individual statistic, and then compare that career average volatility for each stat. The longer careers simply provided the largest sample sizes to establish that player’s relative stability/volatility at any particular stat, in an effort to eliminate as much noise as possible.

For this study I used players with 10-year careers. My sample contained 21 such players (including tight ends), listed below:

Pierre Garcon

Julio Jones

Demaryius Thomas

Emmanuel Sanders

Rob Gronkowski

Larry Fitzgerald

Antonio Gates

Greg Olsen

Vernon Davis

Jimmy Graham

Delanie Walker

Jason Witten

Randall Cobb

Kyle Rudolph

DeSean Jackson

Golden Tate

Ted Ginn

Michael Crabtree

Danny Amendola

Jared Cook

Zach Miller

I then calculated both the standard deviation and the mean (or average) of each player’s performance at each statistic over that player’s career. The ratio of the two (standard deviation divided by mean) defines the Coefficient of Variation (CV) for the player over their career at that stat.
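As a quick sketch, the CV for one player at one stat could be computed like this. The season values are made up, and the use of the sample standard deviation is my assumption; the population version (`statistics.pstdev`) would work the same way.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation divided by mean for one player's seasons at one stat."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Sample standard deviation is an assumption; swap in pstdev if preferred
    return stdev(values) / avg


# Made-up targets per game across a ten-season career
targets_pg_by_season = [7.1, 8.0, 8.4, 9.2, 8.8, 7.9, 8.5, 7.4, 6.9, 6.2]
cv = coefficient_of_variation(targets_pg_by_season)
print(f"CV = {cv:.3f}")
```

Because the standard deviation is divided by the mean, the CV is unitless, which is what lets you compare the volatility of stats measured on very different scales.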

Whew! Got all that?

By averaging the CV for each stat across my 21-player sample of 10-year careers, I was able to then derive what is at least a rough idea of the degree of stability/volatility one might expect at each stat, as indicated by the average career year-over-year Coefficient of Variation for each stat.
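The averaging step can be sketched as follows, assuming you already have each player's career CV at each stat. The player labels, stat names, and CV values here are placeholders, not results from the study.

```python
from statistics import mean

# Placeholder career CVs per player and stat (not real results)
career_cv = {
    "Player A": {"targets_pg": 0.18, "rec_tds": 0.55, "adot": 0.12},
    "Player B": {"targets_pg": 0.22, "rec_tds": 0.61, "adot": 0.10},
    "Player C": {"targets_pg": 0.20, "rec_tds": 0.49, "adot": 0.14},
}


def average_cv_by_stat(career_cv):
    """Average each stat's career CV across every player that has it."""
    stats = sorted({stat for cvs in career_cv.values() for stat in cvs})
    return {
        stat: mean(cvs[stat] for cvs in career_cv.values() if stat in cvs)
        for stat in stats
    }


avg_cv = average_cv_by_stat(career_cv)
# Print stickiest (lowest average CV) first
for stat, value in sorted(avg_cv.items(), key=lambda kv: kv[1]):
    print(f"{stat}: average CV = {value:.2f}")
```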

Is your head spinning yet? Admittedly, having just written all that garble, mine is quite frankly spinning like mad.

However, that’s all just the housekeeping. This is what you need to know in a take-home sense:

  • First, the average of the career values of the Coefficient of Variation (CV) for each statistic is a solid estimate for the degree of stickiness for each of the receiving stats in the study. Lower numbers indicate lower volatility, or stickier stats.

  • Second, a high degree of correlation between a stat and fantasy production should itself be highly predictive, provided that statistic is also inherently sticky.

  • Critically, it is the interplay between a statistic’s correlation to fantasy scoring and its stickiness that ultimately determines its predictive merit, and therefore its relative importance to us as fantasy football managers.

Consider the following image that shows the Coefficient of Variation for the receiving statistics in this study charted versus that stat’s Correlation Coefficient (R) with standard fantasy points per game.

On this chart, data points located further to the right on the chart represent statistics more highly correlated to fantasy point production, whereas points on the left demonstrate a poor correlation to fantasy point production. Points high on this chart demonstrate a high coefficient of variation (CV), and points lower on this chart represent lower variation and thus represent performances that are likely to repeat themselves.

In this image, the stats with the most predictive merit are those furthest to the lower right.
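One way to pick out that lower-right corner without squinting at a chart is a simple two-threshold filter, sketched below. The stat names, (R, CV) pairs, and cutoffs are all hypothetical; the study itself does not define specific thresholds.

```python
# Hypothetical (R, average CV) pairs; placeholders, not results from the study
stat_profile = {
    "stat_a": (0.85, 0.20),
    "stat_b": (0.80, 0.45),
    "stat_c": (0.15, 0.12),
    "stat_d": (0.40, 0.50),
}


def lower_right(stat_profile, min_r=0.6, max_cv=0.3):
    """Return stats that are both well correlated (high R) and sticky (low CV)."""
    picks = [
        (name, r, cv)
        for name, (r, cv) in stat_profile.items()
        if r >= min_r and cv <= max_cv
    ]
    # Highest correlation first, ties broken by lower volatility
    return sorted(picks, key=lambda item: (-item[1], item[2]))


for name, r, cv in lower_right(stat_profile):
    print(f"{name}: R = {r:.2f}, CV = {cv:.2f}")
```

The cutoffs are arbitrary knobs; loosening either one just widens the corner of the chart you are willing to trust.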

You can see from this image that counting stats like receiving yards, total yards and receptions, receptions for first downs, and targets are all certainly correlated with fantasy scoring. They’re just more volatile than you’d like to see to have true predictive merit. Receiving touchdowns were the most variable of the sample. This shouldn’t be new information for seasoned fantasy veterans.

However, the efficiency stats such as yards per game, touchdowns per game, targets per game, first down receptions per game, and the combo stat first down receptions per touchdown per game, are all clustered in that lower right quadrant, indicating not only are they well-correlated to standard points per game but that they are amongst the least volatile stats in the study as well. As such, they should be your focal point for predictive analysis in terms of receiving stats.

This is as compared to something like yards after the catch, which demonstrates a decent correlation to fantasy scoring. Given its weaker correlation relative to the aforementioned efficiency and counting stats, though, it’s simply not nearly as valuable in terms of its predictive merit for projecting fantasy scoring. This is especially true given the particular volatility of the statistic. It’s worth noting that I found a distinct correlation between YAC and the Next Gen Stats metric Yards of Separation (R = 0.4692). That correlation is not great, but it is better than the correlation that the trendier set of implied-yardage receiving stats, located in the lower left of the chart above, show with fantasy scoring.

Despite their recent rise in popularity, I found little correlation between fantasy scoring and yards per target, yards per reception, or even average depth of target for that matter, even in this sample of players getting regular volume. Additionally, look at the scatter plot of aDOT versus fantasy points per game above. The eye should be able to see that the math isn't lying: aDOT is not a stat that is well-correlated to fantasy points per game.

Often touted as the next great efficiency stats, they do demonstrate great stickiness in this study. But as so many wide receivers from Jerry Rice to Deebo Samuel have demonstrated, you can have a low aDOT and still score a lot of fantasy points. In addition, as the likes of Denzel Mims (6.5 PPR/G at a 15.4-yard aDOT over nine games at 4.9 Tgt/G), Tim Patrick (5.4 PPR/G at a 10.1-yard aDOT over fifteen games at 5.3 Tgt/G), or even Keke Coutee (5.9 PPR/G at a respectable 7.6-yard aDOT over eight games at 5 Tgt/G) demonstrated just last year, a high aDOT and reasonable volume do not guarantee fantasy scoring. This is literally what the charts above are telling you.

This is as compared to Cooper Kupp, who in 2020 scored a blazing 13.8 PPR/G on a paltry 6.3-yard aDOT, or JuJu Smith-Schuster, who scored 14.6 PPR/G on a ridiculous 5.49-yard aDOT, with both players ranking well in terms of YAC. Keenan Allen’s 17.7 PPR/G is most impressive when you consider his 7.16-yard aDOT and the supposed limitations therein.

So, it seems the implied yardage stats, YAC, Yards per Target, Yards per Reception, and aDOT, despite their inherent stickiness, are simply not very predictive.

In discussions with individuals with whom I shared this information prior to publication, I was pressed to include air yards in the study, as this is certainly a distinct statistic from aDOT, YPR, or YPT. While I do not have air yards in my year-over-year analysis, it did perform slightly better than the other yardage-implied receiving stats, with a Correlation Coefficient of R = 0.45 with standard fantasy scoring per game, with the data set once again filtered to exclude lower-volume players. The correlation between fantasy scoring and air yards actually improves (R = 0.58) if you include the entire 2009-2020 data set, including low-volume players, which puts it close to yards per target in terms of correlation to fantasy scoring. For whatever that is worth.

So, what have we learned? Well, we have learned that chasing volume is more likely to net you fantasy points than chasing yardage-implied stats. And if it’s fantasy points you’re looking for, key in on efficiency stats. As with most correlative stats, this is particularly true for players that already have the volume as the production will eventually follow…of course, some stats have volume baked-in, and importantly, some clearly do not.

Additionally, for those of you living that DFS life, consider these statistics when constructing your optimizer, creating projection algorithms, or setting your lineups. It’s not rocket science: chase efficient volume.

This is the volume you can measure with the old rusty-dusty statistics such as receiving targets, receptions, yards, and TDs, all on a per-game basis. Improve your predictive ability by utilizing the most predictive efficiency metrics that also have the lowest possible inherent variation. Metrics like first down receptions per game or, the cream of this crop, first down receptions per touchdown ratio measured on a per-game basis, a metric you will likely have to calculate yourself.

Additionally, while it’s certainly true that you can score a lot of fantasy points without a high aDOT, it is much more difficult to find players with a high aDOT who also have a high target volume (four targets per game or greater, for instance), who also play a half-season or more, and who produce fantasy scoring at a below-average rate. (Course, the Jets certainly made it happen last season.)

However, if you’re getting that granular with your fantasy analysis, well quite frankly, kudos to you. We are clearly kindred spirits. And also, those rusty-dusty efficiency stats are still more correlated to fantasy scoring. #shrugemoji

Just know that you can get to the exact same place, a reliable and reasonably accurate correlative measure of fantasy production, simply by analyzing players by traditional efficiency metrics. The key is to analyze your counting stats on a per-game basis. This alone weeds out a great deal of the noise that missed games can cause in season-long databases. Also, understand that variation happens game to game; extrapolating from predictive stats alone ignores all context, which is something you should never ever do (at least not when there’s money or pride on the line).

Some areas for additional research include the separation of these receiving statistics by position, as all receiving stats from 2009-2020 are included in this study, including those of running backs and tight ends. Additionally, in an effort to improve our understanding of the degree of variation for each of these metrics, I think it would be very informative to investigate the coefficient of variation for these metrics on a game-by-game basis rather than in a year-over-year study such as this. That, however, will have to wait for another time.

Twitter: @SubstanceD3
