The player value series is back again! I run this series of posts every year now. They are a mix of repeats from previous years, which are intended as a resource to help broaden the base of people that know how to run player values, and newly updated entries which provide info specific to 2018. Stay tuned throughout the preseason for the rest of the series! - HWB
Entering now into part six of my preseason player valuation series, we arrive at one of the more important decisions of the preseason: deciding which projection system(s) to use. As a testament to how important this is, people have been asking me about this piece for weeks - wait no longer!
Evaluating projection systems is well-trodden ground, although this year I have seen surprisingly few (i.e., zero) pieces written about the '17 projection system results. As a refresher, last year I found Steamer to generally be the best system, along with an ideal mix of Steamer, PECOTA, and ZiPS that achieved fairly impressive accuracy.
In this study, I'll focus on the most commonly used projections - the same ones that appear in the Big Board: Steamer, PECOTA, ZiPS, ATC, Fangraphs Depth Charts, Fangraphs Fans, and Clay Davenport projections. ATC is the newcomer this year, after their first appearance on FanGraphs in '17. The categories of interest are the typical 5x5 categories: HR/R/RBI/SB/AVG for hitters, and W/SO/SV/ERA/WHIP for pitchers. I'll look at each system's ability to project these categories in total as well as on a per-PA/per-IP basis. Since for fantasy purposes we only care about the relative projections made by each system (i.e., we only need to know Kershaw is the best SP in baseball, not exactly what his ERA will be), I'll primarily use R squared to evaluate how well the projections correlated to actual results, but I'll also be including RMSE to show the absolute error in each projection system. The most common fantasy leagues draft about 300 players, broken out into 180 hitters, 90 SP, and 30 RP, and so I've gone through each system to find the consensus top 300 players as projected in the 2017 preseason, and will only be evaluating the systems based on their projections of those 300 players. One final adjustment - hitters that didn't end up reaching 400 PA and pitchers that didn't reach 35 IP have been thrown out of the sample to reduce the effect of outliers.
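To make the two metrics concrete, here is a minimal Python sketch. It assumes "R squared" means the squared Pearson correlation between projected and actual values, which is my reading rather than a confirmed detail of the study. The function names and the five-player HR numbers are made up for illustration.

```python
import math

def r_squared(projected, actual):
    """Squared Pearson correlation: how well the projections track the
    relative ordering and spread of the actual results."""
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)

def rmse(projected, actual):
    """Root mean square error: the typical absolute miss per player."""
    n = len(projected)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / n)

# Hypothetical data: every hitter out-homers his projection by exactly 10.
projected_hr = [30, 25, 20, 15, 10]
actual_hr = [40, 35, 30, 25, 20]

r2 = r_squared(projected_hr, actual_hr)   # 1.0: the relative ranking is perfect
err = rmse(projected_hr, actual_hr)       # 10.0: but every projection is off by 10
```

The toy numbers show why both metrics are worth a look: a system can get the relative ordering exactly right (R squared of 1) while still missing every player by 10 homers (RMSE of 10).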
Finally - for both hitters and pitchers, I'll present the weighting factors used to create the best possible projection from a combination of the various 2017 projections. This year, it's called "StATz" for hitters (a combination of Steamer, ATC, and ZiPS) and "Steacotaps" for pitchers (a combination of Steamer, PECOTA, and ZiPS). I'm also including ideal weighted combinations of playing time projections - a first, since the Big Board now supports mixing different "Combos", one for playing time and another for your rate stats. This combined projection will be the default one I set for the Big Board as we head into draft season.
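For anyone curious how weighting factors like these might be found, here is one possible approach: a brute-force grid search over weight triples that sum to 1, keeping the mix with the highest R squared against actual results. This is only a sketch under my own assumptions and not necessarily the method used for this study. The function names, the 5% step size, and the toy data are all hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

def best_weights(sys_a, sys_b, sys_c, actual, steps=20):
    """Try every weight triple in 1/steps increments that sums to 1 and
    return the triple whose blended projection has the highest R squared."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            w = (i / steps, j / steps, k / steps)
            blended = [w[0] * a + w[1] * b + w[2] * c
                       for a, b, c in zip(sys_a, sys_b, sys_c)]
            score = r_squared(blended, actual)
            if best is None or score > best[0]:
                best = (score, w)
    return best[1]

# Toy data: the "actual" results are an even mix of systems A and B,
# so the search should ignore system C entirely.
sys_a = [1, 2, 3, 4, 5, 6]
sys_b = [2, 1, 4, 3, 6, 5]
sys_c = [3, 1, 2, 6, 5, 4]
actual = [(a + b) / 2 for a, b in zip(sys_a, sys_b)]

weights = best_weights(sys_a, sys_b, sys_c, actual)   # -> (0.5, 0.5, 0.0)
```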
First, a definition - 'StATz': 45% Steamer, 30% ATC, 25% ZiPS. This custom mix also uses weighted playing time, 25% FG Depth Charts, 30% ATC, 45% ZiPS. Note that rate stats like AVG were also evaluated as part of the 'total' projections by using a playing-time weighted value indicated by an 'n' (e.g. "nAVG").
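As a quick illustration of how a mix like this comes together for a single hitter, here is a small Python sketch using the StATz weights listed above. The player numbers are invented, and the `blend` helper is a hypothetical name, not anything from the Big Board itself.

```python
# StATz weights from the definition above.
RATE_WEIGHTS = {"Steamer": 0.45, "ATC": 0.30, "ZiPS": 0.25}
PT_WEIGHTS = {"FGDepth": 0.25, "ATC": 0.30, "ZiPS": 0.45}

def blend(values, weights):
    """Weighted average of each system's value for one player."""
    return sum(weights[system] * values[system] for system in weights)

# Hypothetical projections for one hitter (made-up numbers).
hr_per_pa = {"Steamer": 0.050, "ATC": 0.045, "ZiPS": 0.040}
pa = {"FGDepth": 640, "ATC": 600, "ZiPS": 560}

blended_rate = blend(hr_per_pa, RATE_WEIGHTS)   # about 0.046 HR per PA
blended_pa = blend(pa, PT_WEIGHTS)              # about 592 PA
projected_hr = blended_rate * blended_pa        # about 27.2 HR
```

Keeping the rate weights and the playing-time weights separate is what lets the two "Combos" be mixed independently: the rate stats come from one set of systems and the PA from another.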
Starting with playing time, nearly every projection system struggles here every year. Between injuries, lineup spots, and role changes, playing time is just plain difficult to peg. Many people swear by the hand-curated playing time over at Fangraphs, but here we see it failed to live up to that hype for the second straight year. ZiPS, on the other hand, did well for hitter playing time for the second year running! I have always heard that ZiPS makes no effort to project accurate playing time, but at this point I think we have to accept that the ZiPS approach is pretty good. ATC also did quite well projecting playing time! By combining the ZiPS, ATC, and FGDepth opinions of playing time, we get the best of each system, so StATz comes up with a significantly better projection of PAs.
Homers and steals were again the easiest offensive categories to project, with no system getting much of an edge on a per-PA basis. PECOTA and Clay did poorly here, and I suspect that these two systems have been slower to adjust to our new juiced-ball reality. Steals are obviously assisted by the low/no-speed guys who are accurately projected for nearly no steals. Steamer recovered from a bad year projecting SB last year. And hey, the Fans showed up and projected HR/PA and SB/PA very well - color me surprised! Runs were again very hard to project, but there was also quite a spread from FGDepth at the top to PECOTA/Clay at the bottom. RBI were projected relatively well by all systems, except for Clay. Average is understandably difficult to project given the year-to-year fluctuation in that category, and all systems besides PECOTA had an R sq. between .36 and .42. The net result is calculated in terms of the percentage above (or below) average across the five categories for each system, and is also plotted below.
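For the curious, here is one plausible way to compute a "percentage above or below average" score like the net result described above. I'm assuming each system's five category R sq. values are averaged and then compared to the mean of those averages across all systems; that is an interpretation, not a confirmed formula. The system names and values below are placeholders, not real results.

```python
def net_vs_average(r2_by_system):
    """Percent above (+) or below (-) the all-system average of the
    mean category R squared, for each system."""
    category_mean = {
        system: sum(values) / len(values)
        for system, values in r2_by_system.items()
    }
    overall = sum(category_mean.values()) / len(category_mean)
    return {
        system: 100 * (mean / overall - 1)
        for system, mean in category_mean.items()
    }

# Placeholder R squared values for five categories (HR, R, RBI, SB, AVG).
r2_by_system = {
    "System A": [0.5, 0.5, 0.3, 0.4, 0.3],   # mean 0.4
    "System B": [0.6, 0.6, 0.4, 0.5, 0.4],   # mean 0.5
    "System C": [0.4, 0.4, 0.2, 0.3, 0.2],   # mean 0.3
}

net = net_vs_average(r2_by_system)
# System A lands right at average, B about 25% above, C about 25% below.
```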
It's good to see that some of the common fantasy baseball adages are true - averaging projections together gives you better results. In the case of hitters, my combined StATz projection beats all the others noticeably, beating the average five-category R sq. by about 9%. On a per-PA basis, FGDepth comes close, and ATC/Steamer are not far behind that, but the improved playing time gives StATz the big advantage. I said in this piece last year that I was losing confidence in ZiPS after a rough '16 (9% below average last year). Consider me corrected, as ZiPS bounced back in a big way. Meanwhile, ATC made a nice debut on the hitting side, performing about as well as Steamer. PECOTA had a rough year (3% above average in '16, 6% below average in '17), leading me to drop them out of the Big Board combo mix in favor of ATC. I'm hoping to see improvements out of them next year!
RMSE: This gives you an idea of the typical error in each category for each system. Each counting stat is listed as error per 600 PA to normalize the values to approximately full-season scale.
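Here is a rough Python sketch of what a per-600-PA RMSE could look like. It assumes the error is taken on per-PA rates and then scaled up to 600 PA, which is my guess at the normalization rather than a confirmed detail. The function name and the two-player sample are hypothetical.

```python
import math

def rmse_per_600_pa(projected_rate, actual_total, actual_pa, scale=600):
    """RMSE of a counting stat, with each player's miss expressed
    per `scale` plate appearances."""
    errors = [
        (rate - total / pa) * scale
        for rate, total, pa in zip(projected_rate, actual_total, actual_pa)
    ]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Two made-up hitters: projected HR/PA, actual HR, actual PA.
projected_hr_rate = [0.050, 0.040]
actual_hr = [27, 18]
actual_pa = [600, 400]

# Player 1 was projected 3 HR too high per 600 PA, player 2 was 3 HR too
# low, so the typical miss is about 3 HR per 600 PA.
typical_miss = rmse_per_600_pa(projected_hr_rate, actual_hr, actual_pa)
```

Scaling to a common 600 PA keeps a part-time player's small raw miss from looking better than a full-timer's equally bad rate projection.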
As it turns out, RMSE shows that the systems are perhaps a bit closer to each other than R sq. would have you believe, but the conclusions are largely the same - StATz is the best or second-best in nearly every category. RMSE highlights PECOTA's poor performance in HR's, as well as ATC's performance in SB. Again, I'm drawing my primary conclusions from the R sq. data above, but this is nevertheless another interesting perspective on the data.
Verdict: StATz wins out, a 45-30-25 combo of Steamer-ATC-ZiPS with a 45-30-25 combo of ZiPS-ATC-FGDepth for playing time. But if you're in a hurry, the FGDepth projections with ATC playing time will be almost as good.
Another definition - 'Steacotaps': 50% Steamer, 30% ZiPS, 20% PECOTA. This custom mix also uses weighted playing time, 60% Steamer and 40% FG Fans. As with the hitter projections, weighted rate stats will be indicated by an 'n' (e.g. "nERA").
The Fans continue to be a reliable source for projecting IP. Last year, Fans and PECOTA were very obvious winners of IP projection with R sq. values approaching 0.70, but this time around nobody did quite as well, and PECOTA actually dropped towards the back of the pack. Here we can see better evidence of ZiPS's reputation as a poor source for playing time - apparently it's just pitchers where it struggles. A combination of the two best IP projections, Steamer and Fans, yields a slightly improved result for our weighted combination, Steacotaps.
Steamer had previously been the champion of pitcher projections, but after big improvements from ZiPS, the FGDepth projection (combined ZiPS/Steamer) rose to the top in '17. Still, Steacotaps takes the top spot from all of them by improving its predictions marginally in each category (except for W, which are a total crapshoot). ATC's debut on the pitching side was less impressive, but they did just fine. PECOTA's struggles continued here. I have to assume it's related to the bad HR projections seen on the hitting side as well, but they still have the best SO projections around, and that goes quite a ways towards making good pitcher projections. Again, the net results are plotted below:
FGDepth and Steamer performed about equally well, with ZiPS only being held back by bad IP projections. ATC sits in the middle. Clay and PECOTA managed to beat the Fans, but not by much. When the Fans are close to beating your system, for both hitters and pitchers, maybe it's time to re-evaluate. Get it together, guys!
RMSE: The root mean square error for each pitching category, in this case normalized to 200 IP.
Based on RMSE, the differences between the systems are not that vast, but you can also see that the error in ERA is huge. In your fantasy leagues, you should pay for strikeouts and WHIP, not for ERA. Wins are also a total shot in the dark, although they look less bad when combined with the IP projection (the number one predictor of Wins is IP and IP-per-game). All in all, FGDepth is quite good. But combining the systems in a more curated way in Steacotaps is better.
Verdict: Steacotaps kicks some butt, using a 50-30-20 combo of Steamer-ZiPS-PECOTA, with a 60-40 split of Steamer-Fans for playing time. Those in a hurry can just use straight FGDepth or Steamer.