Evaluating the 2018 Projection Systems

As we all hunker down with our spreadsheets in preparation for draft season, it’s time for my favorite yearly tradition - evaluating the projection systems! Forgive the somewhat abbreviated post compared to past years’ analyses, but rest assured that the same meticulous research is all here. The goals, as always, are to determine which projection systems perform best in each category for the purposes of fantasy baseball, and to find the best possible mix of projections (separated into rate stats and playing time) as we look ahead to ’19.

In this study, I'll focus on the most commonly used projections, the same ones that appear in the Big Board: Steamer, PECOTA, ZiPS, ATC, The Bat, Fangraphs Depth Charts, and Fangraphs Fans. The categories of interest are the standard 5x5 categories: HR/R/RBI/SB/AVG for hitters and W/SO/SV/ERA/WHIP for pitchers. Since for fantasy purposes we only care about the relative projections made by each system (i.e., we only need to know that Trout is the best hitter in baseball, not exactly what his AVG will be), I'll primarily use R squared to evaluate how well the projections correlated with actual results, but I'll also include RMSE to show the absolute error in each projection system. The most common fantasy leagues draft about 300 players, broken out into 180 hitters, 90 SP, and 30 RP, so I've taken the consensus top 300 players, as determined by an average of the projection systems, and will evaluate the systems only on their projections of those 300 players. One final adjustment: hitters who didn't reach 400 PA and pitchers who didn't reach 35 IP have been thrown out of the sample to reduce the effect of short playing-time outliers (typically from injury).
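For anyone who wants to try this kind of evaluation at home, here's a minimal Python sketch of the two metrics and the playing-time filter. It assumes "R squared" means the squared Pearson correlation between projected and actual values (my reading, not a formula given above). The function names, the field names (`actual_pa`, `proj_hr`, `actual_hr`), and the toy numbers are all hypothetical, not real 2018 data, and selecting the consensus top 300 is assumed to have already happened.

```python
from math import sqrt

def r_squared(projected, actual):
    """Squared Pearson correlation: how closely projections track results linearly."""
    n = len(projected)
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)

def rmse(projected, actual):
    """Root mean square error: the typical miss, in the stat's own units."""
    n = len(projected)
    return sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / n)

def meets_threshold(players, key, minimum):
    """Drop players whose actual playing time fell short (e.g. 400 PA or 35 IP)."""
    return [p for p in players if p[key] >= minimum]

# Toy hitters with made-up numbers; the dictionary keys are hypothetical field names.
hitters = [
    {"actual_pa": 650, "proj_hr": 35, "actual_hr": 39},
    {"actual_pa": 610, "proj_hr": 22, "actual_hr": 19},
    {"actual_pa": 540, "proj_hr": 15, "actual_hr": 17},
    {"actual_pa": 210, "proj_hr": 28, "actual_hr": 6},  # short season: excluded
]
sample = meets_threshold(hitters, "actual_pa", 400)
proj = [p["proj_hr"] for p in sample]
act = [p["actual_hr"] for p in sample]
print(len(sample), round(r_squared(proj, act), 3), round(rmse(proj, act), 2))  # prints 3 0.929 3.11
```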


First, a definition - the Big Board mix for hitters this year, which combines the systems to produce the best overall results, is:

  • Playing Time: 41% Fans, 32% ZiPS, 27% Steamer

  • Rate Stats: 58% ATC, 42% Steamer
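Here's a rough sketch of how a mix like this could be applied to a single hitter. The weights are the ones listed above, but the mechanics (a counting stat built as blended PA times a blended per-PA rate) are my assumption, and the function names and player values are made up for illustration.

```python
# Hitter weights from the Big Board mix listed above.
PT_WEIGHTS = {"Fans": 0.41, "ZiPS": 0.32, "Steamer": 0.27}
RATE_WEIGHTS = {"ATC": 0.58, "Steamer": 0.42}

def blend(values, weights):
    """Weighted average of one quantity across the systems named in `weights`."""
    total = sum(weights.values())
    return sum(values[system] * w for system, w in weights.items()) / total

def mixed_counting_stat(pa_by_system, rate_by_system):
    """Assumed mechanics: blended PA multiplied by a blended per-PA rate."""
    pa = blend(pa_by_system, PT_WEIGHTS)
    rate = blend(rate_by_system, RATE_WEIGHTS)
    return pa * rate

# Made-up inputs for a single hitter.
pa = {"Fans": 650, "ZiPS": 600, "Steamer": 620}
hr_per_pa = {"ATC": 0.055, "Steamer": 0.050}
print(round(mixed_counting_stat(pa, hr_per_pa), 1))  # prints 33.1
```

Keeping playing time and rate stats as separate blends is what lets each use a different set of systems.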

Note that rate stats like AVG were also evaluated as part of the 'total' projections, using a playing-time-weighted value indicated by an 'n' (e.g. "nAVG"). The SB (*) projections were evaluated on two separate populations and the results averaged: players who stole more than 5 bases, and those who stole 5 or fewer. Past analysis has shown that evaluating these as a single population gives undue credit for projecting the low-steal players, and not enough credit for accurately projecting the high-steal players.
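A sketch of the split-population idea, assuming "stole >5 bases" refers to actual steals and that each group is scored with the squared Pearson correlation. The function names are my own, and the toy numbers are invented to show how pooling the two groups can reward a system just for separating base-stealers from everyone else.

```python
def r_squared(projected, actual):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(projected)
    mp, ma = sum(projected) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(projected, actual))
    vp = sum((p - mp) ** 2 for p in projected)
    va = sum((a - ma) ** 2 for a in actual)
    return cov * cov / (vp * va)

def split_r_squared(projected, actual, in_first_group):
    """Score two sub-populations separately and average, instead of pooling."""
    scores = []
    for flag in (True, False):
        pairs = [(p, a) for p, a, g in zip(projected, actual, in_first_group) if g == flag]
        proj, act = zip(*pairs)
        scores.append(r_squared(proj, act))
    return sum(scores) / len(scores)

# Made-up SB projections and results for six players.
proj_sb = [1, 2, 3, 20, 30, 40]
act_sb = [2, 0, 2, 30, 20, 40]
stole_more_than_5 = [a > 5 for a in act_sb]
print(round(r_squared(proj_sb, act_sb), 2))                           # prints 0.86 (pooled)
print(round(split_r_squared(proj_sb, act_sb, stole_more_than_5), 3))  # prints 0.125 (split)
```

With these numbers, the pooled R squared looks excellent even though the within-group projections are poor; the split average exposes that. The same function takes any two-group flag, such as starters vs. relievers.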


Starting with playing time, nearly every projection system struggles here every year. Between injuries, lineup spots, and role changes, playing time is just plain difficult to peg. Many people swear by the hand-curated playing time over at Fangraphs, but here we see it failed to live up to that hype for the third straight year. By combining the playing time from Steamer, ZiPS, and the Fans, we get the best of each system, so the Big Board mix produces a significantly better overall projection of PA.

The rate stat performances were more similar than different, although you may note some poor performances in R and RBI for a few systems. As might be expected, the systems that average other projections (ATC, FGDepth, the Big Board mix) come out on top. The net result for each system is calculated as its percentage above (or below) average across the five categories, and is also plotted below.
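One possible reading of the net result calculation, sketched in Python: divide each system's R squared by the cross-system average for that category, average those ratios over the categories, and express the result as a percentage above or below 1. The exact formula isn't spelled out above, so treat this as an assumption; the function name is my own and the R squared values in the example are made up (and use two categories instead of five for brevity).

```python
def net_vs_average(r2_table):
    """Percent above/below average per system, averaged over categories.

    r2_table maps system -> {category: R^2}; every system needs the same categories.
    """
    systems = list(r2_table)
    categories = list(r2_table[systems[0]])
    # Average R^2 across all systems, separately for each category.
    cat_mean = {
        c: sum(r2_table[s][c] for s in systems) / len(systems) for c in categories
    }
    result = {}
    for s in systems:
        ratios = [r2_table[s][c] / cat_mean[c] for c in categories]
        result[s] = 100 * (sum(ratios) / len(ratios) - 1)
    return result

toy = {
    "System A": {"HR": 0.60, "R": 0.40},
    "System B": {"HR": 0.40, "R": 0.40},
}
print({s: round(v, 1) for s, v in net_vs_average(toy).items()})
# prints {'System A': 10.0, 'System B': -10.0}
```

Normalizing per category keeps a category with naturally high R squared (like HR) from dominating the net score.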


For hitters, the Big Board mix noticeably beats all the others, exceeding the average five-category R squared by about 13%. On a per-PA basis, ATC comes close, and FGDepth is not far behind that, but the improved playing time gives my mix the big advantage. I said in this piece last year that I hoped to see improved results from PECOTA, but they only got worse in ’18. I’m holding out hope that their new DRC+ improvements will help in 2019!

RMSE: This gives you an idea of the typical error in each category for each system. Each counting stat is listed as error per 600 PA to normalize the values to approximately a full-season scale.
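If you want to reproduce a per-600 PA RMSE, one way to do it (my assumption, not necessarily how these numbers were built) is to convert each player's projected and actual counting stat to a per-PA rate, scale both to 600 PA, and take the RMSE of the differences. The function name and sample values are hypothetical.

```python
from math import sqrt

def rmse_per_600(proj_stat, proj_pa, act_stat, act_pa, scale=600):
    """RMSE after putting projected and actual totals on a per-`scale` PA basis."""
    errors = [
        (ps / pp - a / ap) * scale
        for ps, pp, a, ap in zip(proj_stat, proj_pa, act_stat, act_pa)
    ]
    return sqrt(sum(e * e for e in errors) / len(errors))

# Made-up example: hitter 1 matched his projected HR rate exactly (in fewer PA),
# hitter 2 hit 10 more HR per 600 PA than projected.
print(round(rmse_per_600([30, 30], [600, 600], [20, 40], [400, 600]), 2))  # prints 7.07
```

For pitchers, the same function could take IP in place of PA, with `scale=200` (or `scale=65` for saves).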



Another definition - the Big Board mix for pitchers this year, which combines the systems to produce the best overall results, is:

  • Playing Time: 72% ATC, 28% Steamer

  • Rate Stats: 32% ATC, 30% Steamer, 27% PECOTA, 11% The Bat
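One plausible way to search for weights like these (not necessarily the method used for the Big Board) is a simple grid search: try every non-negative weighting that sums to 1 in steps of 5%, blend the projections, and keep the weighting with the highest R squared against actual results. The function names are my own, the test data is made up, and a real search would presumably score playing time and each rate stat separately, with a finer step for the final weights.

```python
def r_squared(projected, actual):
    """Squared Pearson correlation; returns 0.0 if either side has no variance."""
    n = len(projected)
    mp, ma = sum(projected) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(projected, actual))
    vp = sum((p - mp) ** 2 for p in projected)
    va = sum((a - ma) ** 2 for a in actual)
    if vp == 0 or va == 0:
        return 0.0
    return cov * cov / (vp * va)

def weight_grid(n_systems, steps):
    """Yield every tuple of n non-negative integers that sums to `steps`."""
    if n_systems == 1:
        yield (steps,)
        return
    for first in range(steps + 1):
        for rest in weight_grid(n_systems - 1, steps - first):
            yield (first,) + rest

def best_mix(projections, actual, steps=20):
    """Grid-search blend weights (multiples of 1/steps) that maximize R^2."""
    systems = list(projections)
    best_score, best_weights = float("-inf"), None
    for parts in weight_grid(len(systems), steps):
        weights = [part / steps for part in parts]
        blended = [
            sum(w * projections[s][i] for w, s in zip(weights, systems))
            for i in range(len(actual))
        ]
        score = r_squared(blended, actual)
        if score > best_score:
            best_score, best_weights = score, dict(zip(systems, weights))
    return best_score, best_weights
```

Usage would look like `best_mix({"ATC": atc_ip, "Steamer": steamer_ip}, actual_ip)`, where the three lists hold one value per pitcher in the same order. The number of candidate weightings grows quickly with the number of systems, so a coarse step keeps a four-system search fast.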

As with the hitter projections, weighted rate stats are indicated by an 'n' (e.g. "nERA"). The IP, W, and SO (*) projections were evaluated on two separate populations, starters and relievers, and the results averaged. Past analysis has shown that evaluating them as a single population gives undue credit for projecting the separation between the two groups. The SV (**) projections were evaluated for relievers only.


The new method for IP evaluation produces, in my opinion, a much more accurate picture: projecting pitcher playing time is hard. ATC rises above the rest, though, so kudos to Ariel. Incorporating a bit of Steamer makes the Big Board mix marginally better.

ZiPS managed to do something amazing this year: its W/IP projection had literally a 0.00 correlation (R squared) with actual 2018 results. I don’t think I’ve ever seen that before. Otherwise, we again see that many systems produced similar results, with a few poor performances in individual categories here and there. ATC and The Bat fared quite well overall, with Steamer just a bit behind them. Still, the Big Board mix takes the top spot from all of them by performing reasonably well in every category. ZiPS had a rough year on the pitching side; might the FG site projections improve by incorporating ATC into their depth chart projections instead of ZiPS? Again, the net results are plotted below:


One thing that sticks out here is that The Bat might have been one of the best if they’d had a better playing time projection! But beyond that, we also see just how impressive a year ATC had.

RMSE: The root mean square error for each pitching category, in this case normalized to 200 IP (or 65 IP for SV).