Archive for July, 2018

I created a bit of controversy on Twitter a few days ago (imagine that) when I tweeted my top 10 to-date 2018 projections for the total value of position players: batting, base running, and defense, with positional adjustments included. Four of my top 10 were catchers: Posey, Flowers (WTF?), Grandal, and Barnes. How can that be? Framing, my son, framing. All of those catchers, in addition to being good hitters, are excellent framers, according to Baseball Prospectus catcher framing numbers. I use their season numbers to craft a framing projection for each catcher, using a basic Marcel methodology – 4 years of data, weighted and regressed toward a population mean, zero in this case.
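For the curious, here’s a rough sketch of what that kind of Marcel-style framing projection looks like in Python. The 5/4/3/2 season weights and the 200-game regression ballast are illustrative assumptions, not necessarily the values behind the projections above.

```python
# Hypothetical sketch of a Marcel-style framing projection.
# Season weights and the regression amount are illustrative, not the actual values used.

def project_framing(seasons):
    """
    seasons: list of (framing_runs_per_130, games) tuples for the last 4 seasons,
             most recent first. Returns projected framing runs per 130 games.
    """
    weights = [5, 4, 3, 2]       # more weight on recent seasons (assumed weights)
    regression_games = 200       # "phantom" games of average (zero) framing (assumed)

    weighted_runs = 0.0
    weighted_games = 0.0
    for (runs_per_130, games), w in zip(seasons, weights):
        weighted_runs += w * runs_per_130 * (games / 130.0)
        weighted_games += w * games

    # Regressing toward the population mean of zero = adding average games with zero runs.
    return weighted_runs / ((weighted_games + regression_games) / 130.0)


# Example: a catcher with four seasons of strong framing numbers
print(project_framing([(15, 120), (18, 110), (12, 100), (20, 90)]))
```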

When doing this, the spread of purported framing talent is quite large. Among the 30 catchers going into 2018 with the most playing time (minors and majors), the standard deviation of talent (my projection) is 7.6 runs. That’s a lot. Among the leaders in projected runs per 130 games are Barnes at +18 runs, and Grandal and Flowers at +21. Some of the poor framers include such luminaries as Anthony Recker, Ramon Cabrera, and Tomas Telis (who are these guys?) at -18, -15, and -18, respectively. Most of your everyday catchers these days are either decent (or a little on the bad side, like Kurt Suzuki) or very good framers. Gone are the days when Ryan Doumit (terrible framer) was a full-timer and Jose Molina (great framer) a backup.

Anyway, the beef on Twitter was that surely framing can’t be worth so much that 4 of the top 10 all-around players in baseball are catchers. To be honest, that makes little sense to me either. If that were true, then catchers would be underrepresented in baseball. In other words, there must be catchers in the minor leagues who should be in the majors, presumably because they are good framers, though not necessarily good hitters or good in other areas like throwing, blocking pitches, and calling games. If this beef is valid, then either my projection methodology for framing is too aggressive, i.e., not enough regression, or BP’s numbers lack some integrity.

As any good sabermetrician is wont to do, I set out to find the truth. Or at least to find evidence supporting the truth. Here’s what I did:

I did a WOWY (with or without you – invented by the illustrious Tom Tango) to compare every catcher’s walk and strikeout rates with each pitcher he worked with to those of the same pitchers working with other catchers – the “without.” I did not adjust for the framing value of the other catchers. Presumably the “other” catchers for a good-framing catcher are slightly below average, framing-wise, and vice versa for bad-framing catchers, so there will be a slight double counting. I did this for each projected season from 2014 to 2017, or 4 seasons.
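For readers who want to see the mechanics, here’s a rough sketch of the WOWY calculation for a single season, assuming a PA-level table with hypothetical column names (‘pitcher’, ‘catcher’, ‘bb’, ‘so’); it’s not the exact code I used.

```python
import pandas as pd

# Hypothetical PA-level data: one row per plate appearance, with columns
# 'pitcher', 'catcher', 'bb' (1 if a walk), 'so' (1 if a strikeout).
pa = pd.read_csv("pa_2017.csv")  # assumed file and columns, for illustration

def wowy_rates(pa, catcher):
    """BB and SO rates for one catcher's pitchers, with and without him."""
    with_c = pa[pa["catcher"] == catcher]
    pitchers = with_c["pitcher"].unique()
    # The same pitchers throwing to any other catcher = the "without" sample.
    without_c = pa[pa["pitcher"].isin(pitchers) & (pa["catcher"] != catcher)]

    def rates(df):
        return df["bb"].mean(), df["so"].mean()

    return rates(with_c), rates(without_c)

(w_bb, w_so), (wo_bb, wo_so) = wowy_rates(pa, "Tyler Flowers")
print(f"BB diff: {w_bb - wo_bb:+.3f}, SO diff: {w_so - wo_so:+.3f}")
```

In a more careful version, the “without” rates would be matched and weighted pitcher by pitcher (by each pitcher’s PA with the catcher in question) rather than simply pooled, so the comparison isn’t skewed by which pitchers threw the most to other catchers.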

I split the projected catchers into 3 groups: Group I was projected at greater than +10 runs per 150 games (+8.67 per 130), Group II at less than -10 runs, and Group III, all the rest. Here is the data for 2014-2017 combined. Remember, I am using, for example, the 2017 pre-season projections and then comparing them to a WOWY for that same year.

| Total PA | Mean Proj per 130 g | W/ BB rate | WO/ BB rate | Diff | W/ SO rate | WO/ SO rate | Diff |
|---|---|---|---|---|---|---|---|
| 74,221 | -12.6 | .082 | .077 | .005 | .197 | .206 | -.009 |
| 107,535 | +13.3 | .073 | .078 | -.005 | .215 | .212 | .003 |
| 227,842 | -.2 | .078 | .078 | 0 | .213 | .212 | .001 |

 

We can clearly see that we’re on the right track. The catchers projected to be bad framers had more BB and fewer SO than average, and the good framers had more SO and fewer BB. That shouldn’t be surprising. The question is how accurate our projections are in terms of runs. To answer that, we need to convert those BB and SO rates into runs. There are around 38 PA per game, so for 130 games we have 4,940 PA. Let’s turn those rate differences into runs per 130 games by multiplying them by 4,940 and then by .57 runs, which is the value of a walk plus an out. This assumes that every other component stays the same, other than walks and outs; my presumption is that an out is turned into a walk or a walk is turned into an out. A walk as compared to a neutral PA is worth around .31 runs and an out around .26 runs.
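To make the arithmetic explicit, here’s that conversion applied to the bad framers’ .005 BB-rate difference from the table above; it reproduces the +14 runs that shows up in the next table.

```python
# Rate difference -> runs per 130 games, per the arithmetic described above.
pa_per_130 = 38 * 130          # ~4,940 PA
bb_plus_out = 0.31 + 0.26      # run value of a walk plus an out (~.57)

bb_rate_diff = 0.005           # the bad framers' BB-rate difference from the table
print(bb_rate_diff * pa_per_130 * bb_plus_out)   # ~14 runs per 130 games
```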

| Total PA | Mean Proj per 130 g | W/ BB rate | WO/ BB rate | Diff in runs/130 | W/ SO rate | WO/ SO rate | Diff |
|---|---|---|---|---|---|---|---|
| 74,221 | -12.6 | .082 | .077 | +14.0 | .197 | .206 | -.009 |
| 107,535 | +13.3 | .073 | .078 | -14.0 | .215 | .212 | .003 |
| 227,842 | -.2 | .078 | .078 | 0 | .213 | .212 | .001 |

 

Let’s make sure that my presumption is correct before we get too excited about those numbers – namely, that an out really is turning into a walk and vice versa due to framing. Changes in strikeout rate are mostly irrelevant in terms of translating into runs, assuming that the only other changes are in outs and walks (a strikeout is worth about the same as a batted-ball out).

| Total PA | Mean Proj | W/ HR | WO/ HR | Diff | W/ Hits | WO/ Hits | Diff | W/ Outs | WO/ Outs | Diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 74,221 | -12.6 | .028 | .028 | 0 | .204 | .203 | .001 | .675 | .681 | -.006 |
| 107,535 | +13.3 | .029 | .029 | 0 | .200 | .198 | .002 | .689 | .685 | .004 |
| 227,842 | -.2 | .029 | .029 | 0 | .199 | .200 | -.001 | .685 | .683 | .002 |

 

So, HR is not affected at all. Interestingly, both good and bad framers give up slightly more non-HR hits. This is likely just noise. As I presumed, the bad framers are not only allowing more walks and fewer strikeouts, they’re also recording fewer outs, and the good framers are recording more outs. So this does in fact suggest that walks are being converted into outs (strikeouts and/or batted-ball outs) and vice versa.

If we chalk up the difference in hits between the “with” and the “without” to noise (if you want to include it, that’s fine – both the good and bad framers lose a little, the good framers losing more), we’re left with outs and walks. Let’s translate each one into runs separately, using .31 runs for the walks and .26 runs for the outs. Those are the run values compared to a neutral PA.
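Again making the arithmetic explicit for the bad-framer group (the .005 walk difference and the .006 out difference), this reproduces the two +7.7 figures in the next table and the 15.4-run total discussed below it.

```python
# Separate run conversions for the bad-framer group, per 130 games (4,940 PA).
pa_per_130 = 38 * 130

bb_diff  = 0.082 - 0.077    # +.005 more walks with the bad framers
out_diff = 0.675 - 0.681    # -.006 fewer outs with the bad framers

bb_runs  = bb_diff * pa_per_130 * 0.31     # ~ +7.7 runs from the extra walks
out_runs = -out_diff * pa_per_130 * 0.26   # ~ +7.7 runs from the lost outs

print(round(bb_runs, 1), round(out_runs, 1), round(bb_runs + out_runs, 1))  # 7.7 7.7 15.4
```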

| Total PA | Mean Proj per 130 g | W/ BB rate | WO/ BB rate | Diff in runs/130 | W/ Outs | WO/ Outs | Diff in runs/130 |
|---|---|---|---|---|---|---|---|
| 74,221 | -12.6 | .082 | .077 | +7.7 | .675 | .681 | +7.7 |
| 107,535 | +13.3 | .073 | .078 | -7.7 | .689 | .685 | -5.1 |
| 227,842 | -.2 | .078 | .078 | 0 | .685 | .683 | -2.6 |

 

So our bad framers are allowing 15.4 more runs per 130 games than the average catcher (or at least than the other catchers who caught the same pitchers), in terms of fewer outs and more BB. The good framers are allowing 12.8 fewer runs per 130 games. Compare that to our projections, and I think we’re in the same ballpark.

It appears from this data that we have pretty strong evidence that framing is worth a lot and our four catchers should be in the top 10 players in all of baseball.


Let’s face it. Most of you just can’t process the notion that a pitcher who’s had 10 or 15 starts at mid-season can have an ERA of 5+ and still be expected to pitch well for the remainder of the season. Maybe if he’s a Kershaw or a Verlander or some other known ace, but not some run-of-the-mill hurler. Similarly, if a previously unheralded and perhaps terrible starter were sporting a 2.50 ERA in July after 12 solid starts, the notion that he’s still a bad pitcher, although not quite as bad as we previously estimated, is antithetical to one of the strongest biases that human beings have when it comes to sports, gambling, and, in fact, many other aspects of life in general – recency bias. According to the online Skeptic’s Dictionary, recency bias is “the tendency to think that trends and patterns we observe in the recent past will continue in the future.”

I looked at all starting pitchers in the last 3 years who either:

  1. In the first week of July, had an RA9 (runs allowed per 9 innings), adjusted for park, weather, and opponent, that was at least 1 run higher than their mid-season (as of June 30) projection. In addition, these pitchers had to have a projected context-neutral RA9 of less than 4.00 (good pitchers).
  2. In the first week of July, had an adjusted RA9 at least 1 run lower than their mid-season projection. They also had to have a projection greater than 4.50 (bad pitchers).

Basically, group I pitchers above were projected to be good pitchers but had very poor results for around 3 months. Group II pitchers were projected to be bad pitchers despite having very good results in the first half of the season.
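Here’s a rough sketch of that selection step, assuming one row per pitcher as of the first week of July and hypothetical column names (‘adj_ra9’ for the adjusted season-to-date mark, ‘proj_ra9’ for the June 30 projection).

```python
import pandas as pd

# Hypothetical input: one row per starting pitcher as of the first week of July.
pitchers = pd.read_csv("starters_july.csv")  # assumed file and columns, for illustration

# Group I: projected good (proj < 4.00) but at least 1 run worse than projected so far.
group1 = pitchers[(pitchers["adj_ra9"] >= pitchers["proj_ra9"] + 1.0) &
                  (pitchers["proj_ra9"] < 4.00)]

# Group II: projected bad (proj > 4.50) but at least 1 run better than projected so far.
group2 = pitchers[(pitchers["adj_ra9"] <= pitchers["proj_ra9"] - 1.0) &
                  (pitchers["proj_ra9"] > 4.50)]

print(len(group1), len(group2))
```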

A projection is equivalent to estimating a player’s most likely performance for the next game or for the remainder of the season (not accounting for aging). So in order to test a projection, we usually look at that player’s, or a group of players’, performance in the future. In order to mimic the real-time question, “How do we expect this pitcher to pitch today?”, I looked at the next 3 starts’ performance, in RA9.
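And here’s a minimal sketch of that evaluation step, assuming a game-log table with hypothetical columns (‘date’, ‘pitcher’, ‘runs_allowed’, ‘outs_recorded’) and a July 7 cutoff.

```python
import pandas as pd

logs = pd.read_csv("starts.csv", parse_dates=["date"])  # assumed game logs, for illustration

def next3_ra9(logs, pitcher, cutoff="2017-07-07"):
    """RA9 over a pitcher's next 3 starts after the cutoff date."""
    nxt = (logs[(logs["pitcher"] == pitcher) & (logs["date"] > cutoff)]
           .sort_values("date")
           .head(3))
    innings = nxt["outs_recorded"].sum() / 3.0
    return 9.0 * nxt["runs_allowed"].sum() / innings if innings else None

# e.g. next3_ra9(logs, "some Group I starter") for each pitcher selected above
```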

Here are the aggregate results:

The average RA9 from 2015-2017 was around 4.39.

Group I pitchers (cold first half) N=36 starts after first week in July

| Season-to-date RA9 | Projected RA9 | Next 3 starts RA9 |
|---|---|---|
| 5.45 | 3.76 | 3.71 |

Group II Pitchers (hot first half) N=84 starts after first week in July

| Season-to-date RA9 | Projected RA9 | Next 3 starts RA9 |
|---|---|---|
| 3.33 | 4.95 | 4.81 |

 

As you can see, the season-to-date, context-neutral (adjusted for park, weather, and opponent) RA9 tells us almost nothing about how these pitchers are expected to pitch, independent of our projection. Keep in mind that the projection has the current season’s performance baked into the model, so it’s not that the projection is ignoring the “anomalous” performance and the pitcher somehow magically reverts to somewhere around his prior performance.

Actually, two things are happening here to create these dissonant (within the context of recency bias) results. One, these projections use 3 or 4 years of prior performance (including the minor leagues), if available, such that another 3 months, even the most recent 3 months (which get more weight in our projection model), often doesn’t have much effect on the projection (depending on how much prior data there is). As well, even if there isn’t much prior data, the very bad or good 3-month performance is going to get regressed toward league average anyway.

Two, how much integrity is there in a very bad RA9 for a pitcher who was and is considered a very good pitcher, and vice versa? By that I mean, does it really reflect how well the pitcher has pitched in terms of the components he allowed, or was he just lucky or unlucky in the timing of those events? We can attempt to answer that question by looking at our same pitchers above and seeing how their season-to-date RA9 compares to a component RA9 – an RA9-like number constructed from a pitcher’s component stats (using a BaseRuns formula). Let’s add that to the charts above.
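For illustration, here’s a sketch of a BaseRuns-based component RA9 using the basic, textbook BaseRuns coefficients; the exact constants are assumptions and not necessarily the ones behind the numbers below.

```python
# A simple BaseRuns-based component "RA9" built from a pitcher's allowed components.
# Coefficients are the standard basic BaseRuns values (assumed); the B multiplier is
# often tuned to the league, so treat this as illustrative only.

def component_ra9(h, bb, hr, tb, outs):
    """h, bb, hr = hits, walks, homers allowed; tb = total bases allowed; outs = outs recorded."""
    a = h + bb - hr                                           # baserunners
    b = (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb) * 1.02     # advancement factor
    c = outs                                                  # outs
    d = hr                                                    # automatic runs
    runs = a * b / (b + c) + d
    return 9.0 * runs / (c / 3.0)                             # scale to per 9 innings

# Example: a half-season line of roughly 100 innings
print(component_ra9(h=95, bb=30, hr=12, tb=160, outs=300))
```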

Group I

| Season-to-date RA9 | To-date component RA9 | Projected RA9 | Next 3 starts RA9 |
|---|---|---|---|
| 5.45 | 4.40 | 3.76 | 3.71 |

Group II

| Season-to-date RA9 | To-date component RA9 | Projected RA9 | Next 3 starts RA9 |
|---|---|---|---|
| 3.33 | 4.25 | 4.84 | 4.81 |

 

These pitchers’ component results were not nearly as bad or as good as their RA9s suggest.

So, if a pitcher is still projected to be a good pitcher even after a terrible first half (or vice versa), RA9-wise (and presumably ERA-wise), two things are going on to justify that projection. One, the first half may be a relatively small sample compared to 3 or 4 years of prior performance – remember, everything counts (albeit recent performance is given more weight)! Two, and more importantly, that RA9 is mostly timing-driven luck. The to-date components suggest that both the hot and the cold pitchers have not pitched nearly as well or as badly as their RA9 suggests. The to-date component RA9s are around league average for both groups.

The takeaway here is that your recency bias will cause you to reject these projections in favor of to-date performance as reflected in RA9 or ERA, when in fact the projections are still the best predictor of future performance.