Can 1 Month of Statcast Data Be Used to Evaluate Hitters?

You’ve all done this. Or at least, if you know about Statcast, ‘xStats’, and Baseball Savant, you have: pull up the xStats list, sort by under- or over-performers, and use it to draw broad and sweeping conclusions about your fantasy teams. Which of your fantasy players are poised for a quick resurgence, and which of your opponents’ players are prime trade targets? Which guys should you be selling high on before the bottom drops out? But in the same way that you can’t just sort the FanGraphs leaderboards by ERA minus FIP and magically find pitching diamonds in the rough (homer rates complicate things…), this may not be the best way to apply our vast wealth of fancy Statcast-based metrics. Personally, I’ve found early-season Statcast data difficult to trust, so I decided to dive in and see what exactly we can learn from one month of xStats. It turns out there may be something useful here: the method I arrived at after this work would have advised you to buy in on Jose Ramirez after his rough start to 2019! But we’ll get to that.

I will dive into gritty details below, but first to quickly outline, here are the major questions I’m setting out to answer:

Do 1-month xStat under-/over-performers play better/worse rest-of-season? (Yes!)
Do 1-month xStats have any more predictive power than Steamer projections? (No…)
Do 1-month xStats help to evaluate players Steamer projected inaccurately? (Not really)

The ultimate takeaway is that players who underperform their xStats in the first month look like good targets to pick up or trade for.

To collect the data for this study, I made use of Alex Chamberlain’s version of xStats, available here. Alex’s version of these leaderboards focuses on xwOBAcon, which is xwOBA on contact, or in other words, it ignores K/BB rates. This is fine by me for the purpose of this study. I am trying to understand the batted-ball changes that players are showing, and at this point in the season, changes in K%/BB% are both prone to noise, and maybe more easily analyzed via ‘SABR 1.0’ approaches anyway. I pulled the set of players that posted at least 300 PA in 2019, and at least 75 PA before May 1st of that year, a total of 208 hitters.
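For anyone who wants to rebuild the sample, the playing-time filter described above can be sketched in a few lines of Python. This is only an illustration: the `HitterLine` record, its field names, and the three rows are made up for the example and are not the layout of Alex's leaderboard export.

```python
from dataclasses import dataclass


@dataclass
class HitterLine:
    name: str
    season_pa: int  # plate appearances over the full 2019 season
    early_pa: int   # plate appearances before May 1st


def in_sample(h: HitterLine, min_season_pa: int = 300, min_early_pa: int = 75) -> bool:
    """Keep hitters who clear both playing-time cutoffs used in the study."""
    return h.season_pa >= min_season_pa and h.early_pa >= min_early_pa


# Made-up rows, only to show the filter in action.
hitters = [
    HitterLine("Hitter A", season_pa=620, early_pa=110),  # clears both cutoffs
    HitterLine("Hitter B", season_pa=450, early_pa=60),   # too few early PA
    HitterLine("Hitter C", season_pa=250, early_pa=90),   # too few season PA
]

sample = [h.name for h in hitters if in_sample(h)]
print(sample)
```

Both cutoffs are keyword arguments, so the same filter can be reused with a different early-season threshold.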

As a brief aside, I made one more tweak to the data used in this work. One common complaint that you’ll see about xStats is that they don’t account for everything, and therefore can indicate xwOBA – wOBA differences that are inherent to the player. For example, foot speed, even when accounted for, appears to be a factor. Some of the patron saints of “look, he’s about to break out based on xStats” are guys with SPD scores below 2.0: Kendrys Morales, Justin Smoak, Miguel Cabrera. Conversely, check the other end of the list and you’ll find Mallex Smith, Victor Robles, and Adalberto Mondesi are always able to outrun their xStats. Lacking the time to develop my own framework for xStats to try to fully address these issues, the tweak I made is simple: Alex’s data goes back to 2017, so I downloaded the full dataset and used the career xwOBAcon – wOBAcon differences for each player as a ‘correction factor’ for their xwOBAcon. All xwOBAcon’s referenced from here on will actually be ‘corrected’ xwOBAcon’s.
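To make the correction concrete, here is a minimal sketch of one way to apply it. It assumes the career gap is simply subtracted from the one-month xwOBAcon, which is the natural reading of the description above but is not spelled out there. The function name and the example numbers are hypothetical.

```python
def corrected_xwobacon(xwobacon: float, career_xwobacon: float, career_wobacon: float) -> float:
    """Shift a one-month xwOBAcon by the player's career xwOBAcon - wOBAcon gap.

    Assumption: the correction is a plain subtraction of the career gap.
    """
    career_gap = career_xwobacon - career_wobacon
    return xwobacon - career_gap


# Hypothetical slow-footed slugger whose results have trailed his xStats
# by 25 points over his career (numbers invented for illustration).
slow_slugger = corrected_xwobacon(0.400, career_xwobacon=0.420, career_wobacon=0.395)

# Hypothetical speedster who has beaten his xStats by 20 points.
speedster = corrected_xwobacon(0.330, career_xwobacon=0.340, career_wobacon=0.360)

print(round(slow_slugger, 3), round(speedster, 3))
```

Under this assumption a player who habitually falls short of his xStats gets a lower corrected figure, and a player who habitually outruns them gets a higher one.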

#1: One-Month xStats vs. Rest-of-Season

37 players underperformed their xwOBAcon by at least .050 in April of ‘19 (for reference, the average wOBAcon for this group of players is about .390). Of those 37 players, 30 performed better (+.066 wOBAcon on average) rest-of-season, indicating that this group of players mostly overcame their bad-luck Aprils. In fact, on average, these 37 players still slightly beat their Steamer-projected wOBAcon’s for the season!

On the other side of the spectrum, 19 players overperformed their xwOBAcon by at least .050, and 18 out of those 19 performed worse (-.081 wOBAcon on average) rest-of-season! It’s not that these players did poorly either – on average they beat their projections, including some big breakouts like Omar Narvaez and Austin Meadows. But xStats correctly identified that these players were way hot in April and getting a little lucky.

This is a good baseline to work from – as we might hope, xStats is able to identify players that are getting unfair results from their batted balls during the first month of the season, and that behavior tends to regress over the rest of the season. But, the nuance is that ‘over/under-performing xStats’ does not actually equate to players being good or bad.
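The grouping used in this section can be sketched as follows. The .050 threshold comes from the text. The function name, the tuple layout, and the rounding to three decimals (to keep floating-point noise from flipping a borderline case) are assumptions for illustration. The three stat lines are the April and RoS figures quoted later in this article.

```python
THRESHOLD = 0.050  # gap, in points of wOBAcon, used in the text


def xstat_label(april_wobacon: float, april_xwobacon: float, threshold: float = THRESHOLD) -> str:
    """Label a hitter by how far April results sat from April xwOBAcon."""
    gap = round(april_wobacon - april_xwobacon, 3)
    if gap <= -threshold:
        return "under"
    if gap >= threshold:
        return "over"
    return "neutral"


# (name, April wOBAcon, April xwOBAcon, RoS wOBAcon), as quoted in this article
players = [
    ("Jose Ramirez", 0.226, 0.344, 0.385),
    ("Jarrod Dyson", 0.408, 0.322, 0.269),
    ("Nick Ahmed", 0.347, 0.298, 0.348),
]

labels = {name: xstat_label(apr, x_apr) for name, apr, x_apr, _ in players}
ros_change = {name: round(ros - apr, 3) for name, apr, _, ros in players}

for name in labels:
    print(name, labels[name], ros_change[name])
```

Counting how many "under" hitters have a positive RoS change, and how many "over" hitters have a negative one, gives the kind of tally reported above.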

#2: One-Month xStats vs. Steamer

Here’s the bummer for xStats: even if you just looked at the preseason Steamer projections, they would still tell you more about rest-of-season performance than one month of xStats data would. From May onward, the preseason Steamer projections averaged .037 points of wOBAcon error across all players. Meanwhile, if you pro-rated the April wOBAcon’s forward, you’d get .063 points of error, and if you did the same with the April xwOBAcon’s, you’d get .053 points of error. So xwOBAcon is closer than just pro-rating the raw April stats, but still far inferior to an actual projection system, as one might expect.
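The comparison above boils down to measuring how far each candidate predictor lands from the rest-of-season wOBAcon. A minimal sketch follows, assuming "error" means an unweighted mean absolute error across players (the text does not say whether it was weighted by PA). The six stat lines are the ones quoted later in this article. They are hand-picked extremes, so the resulting numbers will not match the full-sample figures, and a Steamer column would be scored with the same function.

```python
def mean_abs_error(predicted: list[float], actual: list[float]) -> float:
    """Unweighted mean absolute error between two equal-length lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# (April wOBAcon, April xwOBAcon, RoS wOBAcon) for six hitters from this article:
# Ramirez, Gonzalez, Aguilar, Dyson, Chirinos, Ahmed
lines = [
    (0.226, 0.344, 0.385),
    (0.242, 0.343, 0.393),
    (0.254, 0.350, 0.368),
    (0.408, 0.322, 0.269),
    (0.435, 0.337, 0.407),
    (0.347, 0.298, 0.348),
]

april = [row[0] for row in lines]
x_april = [row[1] for row in lines]
ros = [row[2] for row in lines]

april_mae = mean_abs_error(april, ros)
x_april_mae = mean_abs_error(x_april, ros)

print(f"April wOBAcon MAE:  {april_mae:.3f}")
print(f"April xwOBAcon MAE: {x_april_mae:.3f}")
```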

If you isolate just the xStat underperformers, the results are largely the same. However, something interesting jumps out of the xStat overperformers. For that particular group of players, pro-rating their xwOBAcon’s led to just .041 points of error, not far from Steamer’s .036 points. It’s not entirely clear to me why. Perhaps being an overperformer just increases the likelihood of having stats which regress toward your xStats? This would be something driven simply by sample size, as opposed to underperformers, where the reasons might be more complicated and hard to overcome, like loss of foot speed or increases in defensive shifting.

One could reasonably assume from these findings that the next best option would be to have a projection system which incorporates expected outcomes. Maybe pro-rating 1 month of stats is not a good idea (it’s not), but incorporating that new info into an existing weighted average of past performances could yield some useful tweaks! I’m never clear on which in-season data are used to create the ‘rest-of-season’ projections for the publicly available systems, but my best guess here would be that Derek Carty’s “The Bat X” is already doing this. I’m very curious to see how that new system performs this year.

#3: One-Month xStats vs. Inaccurately-Projected Players

The final thing we can do is to turn this around and look at the groups of players that the projection system ‘missed’ in 2019. It turns out that the big breakouts and busts actually show some pretty clear signs in April of how the rest of their season is going to go, as many of the Steamer overperformers in April are the Steamer overperformers RoS (and same for underperformers). However, there are also a ton of players that have hot/cold Aprils and end up finishing out the season either as-projected or in the opposite cold/hot direction. I’ve sliced and diced this every which way, and it’s pretty much noise. The hot April players for xStats are pretty much the hot April players for regular stats, and it can be difficult to tease out who is having a ‘breakout’ vs. who was hot for a month, without diving into deeper analysis.

Buys/Sells Based on April 2019 xStats

Even though your best bet is to rely on the projections based on the above analysis, it could still be worth pairing that with xStats to understand which players are having legitimate breakouts or busts vs. which players are running into some large wOBA vs. xwOBA disparities. You can find the full list of players used for this study in the spreadsheet here if you want to follow along at home. First, the underperformers, where we want to find players with very bad current performance (wOBAcon) but high expected performance. Limiting our list down to the 20 largest April – xApril differences, combined with the worst wOBAcon values, here are the top three guys you might have targeted in trades:

Buy #1) Jose Ramirez (.226 April wOBAcon, .344 April xwOBAcon, .385 RoS wOBAcon)
Would have been a great buy. The RoS wOBAcon is a little above the initial Steamer projection. He went .276/.340/.536 the rest of the way, with 21 HR and 15 SB.

Buy #2) Marwin Gonzalez (.242 April wOBAcon, .343 April xwOBAcon, .393 RoS wOBAcon)
Another pretty good one. I seem to recall him being on waiver wires after hitting .167 through the first month. Again, the RoS wOBAcon is a little above the initial Steamer projection. Good for .285/.340/.450 from May onward with 13 HR.

Buy #3) Jesus Aguilar (.254 April wOBAcon, .350 April xwOBAcon, .368 RoS wOBAcon)
Steamer had him down for a .409 wOBAcon, so this was a rough one if you bought in on Aguilar. Sure, at least he rebounded to something much better than the dreadful April, but the 9 HR and .261 AVG from May onward just did not get it done for you.

Honorable ‘Buy’ Mentions: Jurickson Profar, Jackie Bradley Jr., Niko Goodrum, Ramon Laureano
Dishonorable ‘Buy’ Mentions: Yonder Alonso, Ryan O’Hearn

For the overperformers, this is all about the guys you would have tried to sell-high on. Ideally, we wouldn’t sell high on players with high xwOBAcon’s, so this is more about guys that have low xwOBAcon’s but fraudulently high wOBAcon. Otherwise we would ‘sell’ Austin Meadows because his .499 April xwOBAcon was 60 points lower than his actual performance (yikes!). Limiting our list down to the 20 largest April – xApril differences, combined with the lowest xwOBAcon values, here are the top three guys you might have sold in trades:

Sell #1) Jarrod Dyson (.408 April wOBAcon, .322 April xwOBAcon, .269 RoS wOBAcon)
By the end of April he had hit over .300 with 3 HR and 3 SB, and you might have been daring to dream on a 30+ SB season paired with suddenly improved AVG and HR’s from Dyson. He did end up hitting those 30 SB, but everything else was pretty putrid. If someone was buying, it was a good sell, assuming you had other SB options.

Sell #2) Robinson Chirinos (.435 April wOBAcon, .337 April xwOBAcon, .407 RoS wOBAcon)
This would have been a bad sell. By the end of the season, his xStats and regular stats actually converged, and he wOBA’d .047 points higher than Steamer projected when all was said and done. 17 HR from your C2 is nothing to sneeze at.

Sell #3) Nick Ahmed (.347 April wOBAcon, .298 April xwOBAcon, .348 RoS wOBAcon)
Another bad sell, he was actually very consistent between April and RoS at about .020 points higher than what Steamer originally projected. He was probably only owned in deeper leagues, but finished pretty decently with 19 HR / 8 SB.

Honorable ‘Sell’ Mentions: Dan Vogelbach, Rhys Hoskins, Joey Votto
Dishonorable ‘Sell’ Mentions: Jorge Soler, Max Muncy, Wil Myers
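The two screens used above can be sketched roughly as follows. The text does not spell out how the gap and the wOBAcon/xwOBAcon values were "combined", so this takes one plausible reading: keep a pool of the largest gaps, then rank that pool by the worst April wOBAcon (buys) or the lowest April xwOBAcon (sells). The function names and the small pool size are assumptions for the toy data, and the ordering it produces will not necessarily match the lists above. Meadows' April wOBAcon is taken as .559, i.e. 60 points above the .499 xwOBAcon quoted earlier.

```python
# (name, April wOBAcon, April xwOBAcon), as quoted in this article
PLAYERS = [
    ("Jose Ramirez", 0.226, 0.344),
    ("Marwin Gonzalez", 0.242, 0.343),
    ("Jesus Aguilar", 0.254, 0.350),
    ("Jarrod Dyson", 0.408, 0.322),
    ("Robinson Chirinos", 0.435, 0.337),
    ("Nick Ahmed", 0.347, 0.298),
    ("Austin Meadows", 0.559, 0.499),
]


def buy_candidates(players, pool_size=20, top_n=3):
    """Biggest underperformers first, then the worst April wOBAcon within that pool."""
    pool = sorted(players, key=lambda p: p[1] - p[2])[:pool_size]
    return [p[0] for p in sorted(pool, key=lambda p: p[1])[:top_n]]


def sell_candidates(players, pool_size=20, top_n=3):
    """Biggest overperformers first, then the lowest April xwOBAcon within that pool."""
    pool = sorted(players, key=lambda p: p[1] - p[2], reverse=True)[:pool_size]
    return [p[0] for p in sorted(pool, key=lambda p: p[2])[:top_n]]


# pool_size is shrunk to 4 because the toy list only has seven hitters.
buys = buy_candidates(PLAYERS, pool_size=4)
sells = sell_candidates(PLAYERS, pool_size=4)

print("Buys: ", buys)
print("Sells:", sells)
```

Ranking the sell pool by xwOBAcon rather than by raw gap is what keeps a legitimately hot hitter like Meadows off the sell list.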

All in all, the ‘buy’ recommendations look a lot more credible than the ‘sells’. I might guess that this is because MLB teams are using these data intelligently, too. Players that perform poorly, both in real stats and in xStats, are not given as much rope as they might have been given in the pre-data-driven era of baseball.

So how about 2020?

We’ve landed on the conclusion that we should target the players who are the biggest xStat underperformers through the first month, primarily the ones with bad current wOBAcon values. I’ve limited the list to players with at least 55 PA, but bump that up to whatever level seems reasonable to you for whatever point in the season you’re at. Again, the 2020 tab in the spreadsheet includes the list of players used to generate this.

Target #1) Khris Davis (.228 wOBAcon, .361 xwOBAcon)
As long as he still has a job (stay back, Robbie Grossman, everybody knows you’re overperforming!). It’s hard to believe the power is just gone, and he has been on waiver wires in some of my leagues.

Target #2) Justin Upton (.181 wOBAcon, .298 xwOBAcon)
The xwOBAcon is below .300 and he’s fighting for playing time, too. This one might actually be a pass for me, but throw him on your watch list just in case. He’s not this bad.

Target #3) Eduardo Escobar (.267 wOBAcon, .381 xwOBAcon)
Another guy I have seen on some waiver wires, the window might be closing as he’s hit a couple homers this week. His xwOBAcon is essentially the same as what he put up in 2019. Buy!

Target #4) Pablo Sandoval (.250 wOBAcon, .362 xwOBAcon)
He’s probably free in every league out there, right? He’s totally off my radar, but he’s DH’ing. Did anyone else realize he was actually pretty decent last year (.239 ISO!)? I’d put him on the watch list at least, but in deep leagues this could be a life raft for teams facing injuries and COVID, as he’s DH’ing against RHP for the surprisingly decent Giants offense.

Target #5) Luis Arraez (.292 wOBAcon, .390 xwOBAcon)
I was hoping he would get close to hitting .400 in this short season, but he was in the low .200’s after 15 games and is on the wire in quite a few leagues. He’s actually up to .270ish now but the wOBAcon is lacking because of the zero power. But xStats say he’s been even better than last year so far! He’s got a balky knee right now, so buyer beware, but if he gets back into the lineup, this is a nice target.

Honorable Mentions: Tommy Pham (IL’d), Elvis Andrus (IL’d), Eugenio Suarez, Carlos Santana, Cody Bellinger, Jeff McNeil, Domingo Santana
