Greg Wrubell usually does good work over at his Cougar Tracks blog on KSL.com, but in his latest post, Wrubell steps into some murky statistical waters. The result isn’t pretty:
“A stat I really like is ‘yards per point,’ generally a measure of offensive efficiency.” [author's emphasis]
The italicized generally helps a little bit, but this statement is still generally misleading. Yards per point (YPP) is one of those statistics championed by very smart people (in this case, Phil Steele), but misunderstood by almost everyone else. Any stat can be misapplied in a number of ways, and Wrubell’s guilty of a few of them.
1. Misunderstanding what a statistic measures
What is it that YPP actually measures? Well, to state the obvious, it is the total yardage gained by an offense (or allowed by a defense) divided by the points scored (or points allowed). Does a low YPP indicate an efficient offense, as Wrubell suggests? Not necessarily; it could indicate a number of things. Perhaps the team has excellent special teams, which means better field position and less yardage needed for a score. It’s possible that the team excels in the red zone, scoring more touchdowns and kicking fewer field goals on similar yardage. However, YPP can also indicate a team’s “luck factor,” or in other words, the effect of random statistical variance on said team.
Let’s suppose that a certain running back averages one fumble lost for every sixty touches. Assuming his touches are distributed evenly across the field (i.e., he isn’t a goal-line back), one would expect 20% of his fumbles to occur in the red zone, because twenty yards out of 100 is 20%. Now, we all know that statistical models don’t play out so neatly; we’re dealing with probabilities rather than realities. (If you flip a coin five times, you can’t get heads two and a half times, and it’s not unlikely that you might get tails four or five times in a row.) Let’s say this running back gets 600 touches in a season and fumbles the ball 10 times, as expected. However, in this case, none of his fumbles occur in the red zone; they all come earlier in drives, before his team has been able to rack up many yards. The fumbles still result in ten scoreless drives, but those drives involve less yardage gained, leading to a lower YPP for the team. Is this lower YPP the result of greater offensive efficiency? Not at all. In this case, it’s at least partially a measure of where the running back fumbles the ball, which is a result of random statistical variance.
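To put a rough number on how plausible that scenario is, here’s a minimal sketch in Python. It assumes, as in the example above, that each of the ten fumbles independently has a 20% chance of landing in the red zone; the function name and its parameters are mine, purely for illustration.

```python
from math import comb

def prob_red_zone_fumbles(k, fumbles=10, p_red_zone=0.20):
    """Binomial probability that exactly k of the fumbles occur in the red zone.

    Assumption: each fumble lands in the red zone independently with
    probability p_red_zone (the 20-yards-out-of-100 simplification).
    """
    return comb(fumbles, k) * p_red_zone**k * (1 - p_red_zone) ** (fumbles - k)

p_none = prob_red_zone_fumbles(0)      # the "lucky" season described above
p_expected = prob_red_zone_fumbles(2)  # the textbook 20% of ten fumbles

print(f"No red-zone fumbles at all: {p_none:.1%}")
print(f"Exactly two red-zone fumbles: {p_expected:.1%}")
```

Under those assumptions, a season with no red-zone fumbles turns up roughly one time in ten, so it’s hardly a freak occurrence, and it has nothing to do with how efficient the offense is.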
This is actually pretty intuitive if you think about it a little differently. If you had to use one statistic to guess which team scored the most points in a season, wouldn’t it be total yards gained? Of course it would, because for the most part, yards gained and points scored are directly proportional. What YPP measures is how much a team varies from this relationship of direct proportion, which, as we’ve already discussed, is less a measure of “offensive efficiency” than it is a measure of luck.
2. Using a statistic incorrectly
So if YPP is a measure of luck, isn’t it a worthless statistic? Well, that depends on what you want a statistic to do. If you want it to accurately measure the quality of past performances, YPP is mostly useless. In essence, it takes two useful statistics and combines them in a way that strips out their relevance. If you want a statistic to predict the quality of future performance, it’s also pretty worthless, because YPP isn’t a measure of quality.
If YPP were a measure of quality, you’d think there’d be a pretty big difference between the YPPs for the best and worst teams. Here are the YPP calculations for four teams last year–one elite, one good, one average, and one poor:
USC – 5,911 yards / 488 points = 12.1 YPP
Georgia – 5,538 yards / 409 points = 13.5 YPP
UNC – 4,178 yards / 360 points = 11.6 YPP
Western Kentucky – 2,690 yards / 210 points = 12.8 YPP
In terms of offensive efficiency, these four teams seem pretty clearly sorted in descending order by yards gained. However, if we sort them according to YPP (where lower is supposedly better), we get North Carolina ranked higher than USC and transitional FBS Western Kentucky higher than #13 Georgia.
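If you’d like to check the arithmetic and the two orderings yourself, here’s a short Python sketch using only the yardage and point totals listed above. The variable and function names are just illustrative.

```python
# (total yards, total points) as listed above
teams = {
    "USC": (5911, 488),
    "Georgia": (5538, 409),
    "UNC": (4178, 360),
    "Western Kentucky": (2690, 210),
}

def yards_per_point(yards, points):
    """YPP: total yards gained divided by total points scored."""
    return yards / points

# Most yards first
by_yards = sorted(teams, key=lambda name: teams[name][0], reverse=True)

# Lowest YPP first, i.e. the supposedly "most efficient" offense on top
by_ypp = sorted(teams, key=lambda name: yards_per_point(*teams[name]))

print("By total yards:", by_yards)
for name in by_ypp:
    print(f"{name}: {yards_per_point(*teams[name]):.1f} YPP")
```

Sorting by yards gives USC, Georgia, UNC, Western Kentucky. Sorting by YPP gives UNC, USC, Western Kentucky, Georgia, which is the scrambled order described above.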
So if YPP is not an indicator of quality, how is it useful? To explain, let me digress into baseball.
In baseball, there’s a statistic called Batting Average on Balls in Play (BABIP), which measures how often a ball hit into play will fall for a hit. The fundamental premise is that neither a hitter nor a pitcher has much control over where a ball ends up once it’s hit. Whether or not the ball is hit to a place where a defender can make a play is a function of random variance, centering somewhere around a 27-33% likelihood of it being a hit. BABIP is actually more useful when applied to pitchers than to hitters, as hitters can do a few small things to influence the path of the ball, while pitchers face a wide enough variety of hitters to make the total BABIP of the batters they face almost completely random.
So why keep track of a stat that is essentially random? Because in any random sample, there are outliers–data points on the extreme sides of the expected result. In this case, those outliers represent unsustainable performances. So, if a pitcher allows a .375 BABIP, he’s likely to have given up more hits and runs than one would have expected. The good news is that his opponents’ BABIP is unsustainable, and therefore, extremely unlikely to remain that high. This means that his perceived performance is very likely to improve going forward, without any change in the pitcher’s actual ability or the quality of his pitches. So, it obviously helps to keep track of BABIP, and other similar indicators, to know how a pitcher’s performance might change in the future.
YPP is a similar statistic. Because it involves a lot of random variation, at least in terms of offensive performance (e.g., where a back fumbles the ball or the field position opponents offer), outlying YPP numbers can be viewed as mostly unsustainable.
In Wrubell’s post, he lists the YPP numbers for each MWC team so far this season, intending to make a comment about each team’s efficiency so far. To his credit, he does acknowledge that TCU’s giving up “cheap points” has heavily influenced their defensive YPP. He should have gone a step further and acknowledged that similar factors have influenced every team’s YPP numbers. Instead, he suggests that Utah’s the Vermillion Nasty’s relatively high offensive YPP means they “aren’t as good at turning yards into points,” using their turnovers and missed field goals as evidence that their high YPP indicates an inefficient offense. There are two problems here. First of all, why can’t we just look at turnovers and field goal percentages as evidence of something in themselves? If that’s what YPP reflects, why do we need YPP? Secondly, Wrubell isn’t acknowledging the luck factor in those events. Turnovers are an indicator of offensive efficiency, but where on the field those turnovers occur affects YPP almost as much as how often they occur, and that is essentially random.
If you want to use YPP correctly, look at it as variance that is likely to regress to the mean. The teams with the lowest YPPs (Air Force, TCU, and BYU) are likely to look at least slightly less explosive going forward, while the teams with the highest YPPs (New Mexico, Wyoming, and you know who) are not as inept as they’ve seemed so far.
3. Failure to consider sample size
College football is a tough game to predict, because there is so much turnover on rosters, and the combination of systems and personnel often leads to unpredictable results. It takes several weeks to really get an idea of which teams are strong and weak and in which ways. Similarly, it takes at least that long for any statistic to have a useful context. Because YPP is a measure of variance, having an adequate sample size is even more important to having useful results. Even if you’re trying to use YPP correctly, to use it with two data points per team is pretty much useless.
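To illustrate the sample-size point, here’s a rough simulation sketch in Python. Everything in it is an assumption made up for illustration, not real data: every simulated team has identical ability, gaining about 400 yards (give or take 60) and scoring about 28 points (give or take 10, floored at 3) per game. The only question is how much YPP bounces around among identical teams after two games versus twelve.

```python
import random
import statistics

def simulated_ypp(games, rng):
    """Season-to-date YPP for one hypothetical team.

    All per-game numbers are invented assumptions; every simulated
    team has exactly the same underlying ability.
    """
    yards = 0.0
    points = 0.0
    for _ in range(games):
        yards += rng.gauss(400, 60)             # assumed yards per game
        points += max(3.0, rng.gauss(28, 10))   # assumed points per game, floored at 3
    return yards / points

def ypp_spread(games, teams=2000, seed=0):
    """Standard deviation of YPP across many identical simulated teams."""
    rng = random.Random(seed)
    return statistics.pstdev([simulated_ypp(games, rng) for _ in range(teams)])

spread_2 = ypp_spread(2)
spread_12 = ypp_spread(12)
print(f"Spread in YPP after 2 games:  {spread_2:.2f}")
print(f"Spread in YPP after 12 games: {spread_12:.2f}")
```

Since the teams are identical by construction, any gap between them is pure noise, and that noise is much larger after two games than after a full season. That’s the gap you’d be reading into if you compared YPP numbers this early.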
Again, I want to stress that I like Wrubell and his blog. I just really key in on these statistical discussions, and when one is misleading, the pedant in me flares up. Maybe I should create a new stat: WPP (words per point made). I’m afraid this post (and most of mine) wouldn’t rank very well by this statistic.
A preview of Saturday’s game is coming up soon, so stay tuned. I’ll try and keep it brief.