Do managers know things that we don’t (part 3)?

Posted: October 22, 2014 in Bullpen Management, In-game strategy, Managers, Pitching

In response to my two articles on whether a pitcher’s performance over the first 6 innings is predictive of his 7th inning performance (it is not), a common reply from saber and non-saber leaning critics and commenters goes something like this:

No argument with the results or general method, but there’s a bit of a problem in selling these findings. MGL is right to say that you can’t use the stat line to predict inning number 7, but I would imagine that a lot of managers aren’t using the stat line as much as they are using their impression of the pitcher’s stuff and the swings the batters are taking.

You hear those kinds of comments pretty often even when a pitcher’s results aren’t good, “they threw the ball pretty well,” and “they didn’t have a lot of good swings.”

There’s no real way to test this and I don’t really think managers are particularly good at this either, but it’s worth pointing out that we probably aren’t able to do a great job capturing the crucial independent variable.

That is actually a comment on The Book Blog by Neil Weinberg, one of the editors of Beyond the Box Score and a sabermetric blog writer (I hope I got that somewhat right).

My (edited) response on The Book Blog was this:

Neil, I hear that refrain all the time and, with all due respect, I’ve never seen any evidence to back it up. There is plenty of evidence, however, that for the most part it isn’t true.

If we are to believe that managers are any good whatsoever at figuring out which pitchers should stay and which should not, one of two things must be true:

1) The ones who stay must pitch well, especially in close games. That simply isn’t true.

2) The ones who do not stay would have pitched terribly. In order for that to be the case, we must be greatly under-estimating the TTO penalty. That strains credulity.

Let me explain the logic/math in # 2:

We have 100 pitchers pitching thru 6 innings. Their true talent is 4.0 RA9. 50 of them stay and 50 of them go, or some other proportion – it doesn’t matter.

We know that those who stay pitch to the tune of around 4.3. We know that. That’s what the data say. They pitch at the true talent plus the 3rd TTOP, after adjusting for the hitters faced in the 7th inning.

If we are to believe that managers can tell, to any extent whatsoever, whether a pitcher is likely to be good or bad in the next inning or so, then it must be true that the ones who stay will pitch better on the average than the ones who do not, assuming that the latter were allowed to stay in the game of course.

So let’s assume that those who were not permitted to continue would have pitched at a 4.8 level, .5 worse than the pitchers who were deemed fit to remain.

That tells us that if everyone were allowed to continue, they would pitch collectively at a 4.55 level, which implies a .55 rather than a .33 TTOP.

Are we to believe that the real TTOP is a lot higher than we think, but is depressed because managers know when to take pitchers out such that the ones they leave in actually pitch better than all pitchers would if they were all allowed to stay?

Again, to me that seems unlikely.
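The arithmetic in that argument can be sketched in a few lines of Python. This is only an illustration: the 4.0, 4.3, and 4.8 figures and the 50/50 split are the hypothetical values from the example above, and the function name is my own.

```python
def implied_ttop(true_talent, stay_ra9, pulled_ra9, share_staying):
    """Return (RA9 if everyone stayed, implied TTOP) as a weighted average."""
    everyone_ra9 = share_staying * stay_ra9 + (1 - share_staying) * pulled_ra9
    return everyone_ra9, everyone_ra9 - true_talent


# 100 starters with 4.0 true talent: half stay and pitch at 4.3,
# half are pulled and (by assumption) would have pitched at 4.8.
everyone_ra9, ttop = implied_ttop(4.0, 4.3, 4.8, 0.5)
print(f"{everyone_ra9:.2f} {ttop:.2f}")  # prints 4.55 0.55
```

Changing `share_staying` moves the size of the implied penalty, but as long as the pulled starters are assumed to be worse than the ones who stay, the implied TTOP always comes out above what the stayers alone show.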

Anyway, here is some new data which I think strongly suggests that managers and pitching coaches have no better clue than you or I as to whether a pitcher should remain in a game or not. In fact, I think that the data suggest that whatever criteria they are using, be it runs allowed, more granular performance like K, BB, and HR, or keen, professional observation and insight, it is simply not working at all.

After 6 innings, if a game is close, a manager should make a very calculated decision as far as whether or not he should remove his starter. That decision ought to be based primarily on whether the manager thinks that his starter will pitch well in the 7th and possibly beyond, as opposed to one of his back-end relievers. Keep in mind that we are talking about general tendencies which should apply in close games going into the 7th inning. Obviously every game may be a little different in terms of who is on the mound, who is available in the pen, etc. However, in general, when the game is close in the 7th inning and the starter has already thrown 6 full, the decision to yank him or allow him to continue pitching is more important than when the game is not close.

If the game is already a blowout, it doesn’t matter much whether you leave in your starter or not. It has little effect on the win expectancy of the game. That is the whole concept of leverage. In cases where the game is not close, the tendency of the manager should be to do whatever is best for the team in the next few games and in the long run. That may be removing the starter because he is tired and the manager doesn’t want to risk injury or long-term fatigue. Or it may be letting his starter continue (the so-called “take one for the team” approach) in order to rest his bullpen. Or it may be to give some needed work to a reliever or two.

Let’s see what managers actually do in close and not-so-close games when their starter has pitched 6 full innings and we are heading into the 7th, and then how those starters actually perform in the 7th if they are allowed to continue.

In close games, which I defined as a tied or one-run game, the starter was allowed to begin the 7th inning 3,280 times and he was removed 1,138 times. So the starter was allowed to pitch to at least 1 batter in the 7th inning of a close game 74% of the time. That’s a pretty high percentage, although the average pitch count for those 3,280 pitcher-games was only 86 pitches, so it is not a complete shock that managers would let their starters continue, especially when close games tend to be low scoring games. If a pitcher is winning or losing 2-1 or 3-2 or 1-0 or the game is tied 0-0, 1-1, or 2-2, and the starter’s pitch count is not high, managers are typically loath to remove their starter. In fact, in those 3,280 instances, the average runs allowed for the starter through 6 innings was only 1.73 runs (a RA9 of 2.6) and the average number of innings pitched beyond 6 innings was 1.15.
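For anyone who wants to check the two derived figures in that paragraph, here is a quick sketch using only the counts quoted above. The variable names are mine.

```python
stayed = 3280   # starters who began the 7th inning of a close game
pulled = 1138   # starters removed after 6 full innings

share_stayed = stayed / (stayed + pulled)

runs_through_6 = 1.73                     # average runs allowed through 6
ra9_through_6 = runs_through_6 * 9 / 6    # scale 6 innings up to 9

print(f"{share_stayed:.0%} {ra9_through_6:.1f}")  # prints 74% 2.6
```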

So these are presumably the starters that managers should have the most confidence in. These are the guys who, regardless of their runs allowed, or even their component results, like BB, K, and HR, are expected to pitch well into the 7th, right? Let’s see how they did.

These were average pitchers, on the average. Their seasonal RA9 was 4.39 which is almost exactly league average for our sample, 2003-2013 AL. They were facing the order for the 3rd time on the average, so we expect them to pitch .33 runs worse than they normally do if we know nothing about them.

These games are in slight pitcher’s parks, average PF of .994, and the batters they faced in the 7th were worse than average, including a platoon adjustment (it is almost always the case that batters faced by a starter in the 7th are worse than league average, adjusted for handedness). That reduces their expected RA9 by around .28 runs. Combine that with the .33 run “nick” that we expect from the TTOP and we expect these pitchers to pitch at a 4.45 level, again knowing nothing about them other than their seasonal levels and attaching a generic TTOP penalty and then adjusting for batter and park.
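As a rough check, the expected level can be rebuilt from the pieces quoted above. This sketch assumes that the .28 figure is the combined park and batter adjustment. Adding the rounded inputs lands within a hundredth of a run of the 4.45 quoted, and the small difference is presumably rounding in the inputs.

```python
season_ra9 = 4.39     # seasonal RA9 of the starters who were left in
ttop = 0.33           # generic third-time-through-the-order penalty
context_adj = -0.28   # assumed: park and batter adjustment combined

expected_ra9 = season_ra9 + ttop + context_adj
print(f"{expected_ra9:.2f}")  # prints 4.44
```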

Surely their managers, in allowing them to pitch in a very close game in the 7th know something about their fitness to continue – their body language, talking to their catcher, their mechanics, location, past experience, etc. All of this will help them to weed out the ones who are not likely to pitch well if they continue, such that the ones who are called on to remain in the game, the 74% of pitchers who face this crossroad and move on, will surely pitch better than 4.45, which is about the level of a near-replacement reliever.

In other words, if a manager thought that these starters were going to pitch at a 4.45 level in such a close game in the 7th inning, they would surely bring in one of their better relievers – the kind of pitchers who typically have a 3.20 to 4.00 true talent.

So how did these hand-picked starters do in the 7th inning? They pitched at a 4.70 level. The worst reliever in any team’s pen could best that by ½ run. Apparently managers are not making very good decisions in these important close and late game situations, to say the least.

What about in non-close game situations, which I defined as a 4 or more run differential?

73% of pitchers who pitch through 6 were allowed to continue even in games that were not close. No different from the close games. The other numbers are similar too. The ones who are allowed to continue averaged 1.29 runs over the first 6 innings with a pitch count of 84, and pitched an average of 1.27 innings more.

These guys had a true talent of 4.39, the same as the ones in the close games – league average pitchers, collectively. They were expected to pitch at a 4.50 level after adjusting for TTOP, park and batters faced. They pitched at a 4.78 level, slightly worse than our starters in a close game.
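Putting the two situations side by side makes the comparison easier to see. The sketch below uses only the expected and observed RA9 figures quoted above. The dictionary layout and labels are mine.

```python
results = {
    "close (tied or one-run)": {"expected": 4.45, "observed": 4.70},
    "not close (4+ run differential)": {"expected": 4.50, "observed": 4.78},
}

# Positive gap = starters pitched worse than the generic expectation.
gaps = {
    label: round(r["observed"] - r["expected"], 2)
    for label, r in results.items()
}

for label, gap in gaps.items():
    print(f"{label}: {gap:+.2f}")
```

In both kinds of games the starters who were left in pitched about a quarter of a run worse than the generic expectation, and the gap in close games is almost the same as in games that were not close.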

So here we have two very different situations that call for very different decisions, on the average. In close games, managers should be (and presumably think they are) making very careful decisions about whom to pitch in the 7th, trying to make sure that they use the best pitcher possible. In not-so-close games, especially blowouts, it doesn’t really matter who they pitch, in terms of the WE of the game, and the decision-making goal should be oriented toward the long-term.

Yet we see nothing in the data that suggests that managers are making good decisions in those close games. If we did, we would see much better performance from our starters than in not-so-close games and good performance in general. Instead we see rather poor performance, replacement level reliever numbers in the 7th inning of both close and not-so-close games. Surely that belies the, “Managers are able to see things that we don’t and thus can make better decisions about whether to leave starters in or not,” meme.

Let’s look at a couple more things to further examine this point.

In the first installment of these articles I showed that good or bad run prevention over the first 6 innings has no predictive value whatsoever for the 7th inning. In my second installment, there was some evidence that poor component performance, as measured by in-game, 6-inning FIP had some predictive value, but not good or great component performance.

Let’s see if we can glean what kind of things managers look at when deciding to yank starters in the 7th or not.

In all games in which a starter allowed 1 or 0 runs through 6 even though his FIP was high (greater than 4), suggesting that he really wasn’t pitching such a great game, his manager let him continue 78% of the time, which was more than the 74% overall that starters pitched into the 7th.

In games where the starter allowed 3 or more runs through 6 but had a low FIP (less than 3), suggesting that he pitched better than his RA suggests, managers let him continue to pitch just 55% of the time.

Those numbers suggest that managers pay more attention to runs allowed than component results when deciding whether to pull their starter in the 7th. We know that that is not a good decision-making process as the data indicate that runs allowed have no predictive value while component results do, at least when those results reflect poor performance.
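To make the runs-versus-components distinction concrete, here is a minimal sketch of a six-inning FIP using the standard public formula. It is not necessarily the exact in-game FIP calculation behind the numbers above. The 3.10 constant is a placeholder (the real constant varies by season), and both pitching lines are made up for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Standard FIP formula; the constant here is an assumed placeholder."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical six-inning lines, not taken from the data set.
weak_components = fip(hr=1, bb=3, hbp=0, k=2, ip=6)    # could still allow 0 or 1 runs
strong_components = fip(hr=0, bb=1, hbp=0, k=7, ip=6)  # could still allow 3+ runs

print(f"{weak_components:.2f} {strong_components:.2f}")
```

With these made-up lines the first starter has a FIP above 4 and the second has a FIP below 3, which are the two groups compared above.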

In addition, there is no evidence that managers can correctly determine who should stay and who to pull in close games – when that decision matters the most. Can we put to rest, for now at least, this notion that managers have some magical ability to figure out which of their starters have gas left in the tank and which do not? They don’t. They really, really, really don’t.

Comments
  1. Guy says:

    MGL, you need to adjust your expectations for all these pitchers to account for the 6 innings in the game being analyzed. For a “dealing” starter, your expectation for inning 7 will be something like .12 RA9 worse than his season total, simply because you have removed 6 good innings from that same seasonal total. Those pitching badly in innings 1-6 will tend to overperform their season rate for the same reason.

    *

    While I generally agree with your conclusion here, I think you are making one unwarranted assumption, which is that managers have a sophisticated understanding of the TTO effect, and what that means for the relative talent of starters vs relievers in inning 7. I’m not sure that understanding is there, and therefore, it doesn’t follow that a manager’s decision to leave his starter in necessarily means he believes that starter will exceed (your) expectations. In fact, if you told most MLB managers that it had been scientifically proven that these starters who pitched well in innings 1-6 would pitch “exactly as they normally do” in inning 7, I bet that wouldn’t change their decision in many cases! And that’s because they don’t fully accept that their 3rd-best reliever has a better projection at that point than their starter.

    • MGL says:

      Right about the adjustment for removing dealing innings from the season totals. I saw that on The Book blog. Of course that doesn’t apply if the first 6 innings are not better than their season averages, although if a starter does pitch 6 innings, that means that he has been better than his normal self simply because we are removing those times that he got shelled and didn’t make it through 6. Anyway, good point which I failed to account for.

      I never said that managers have a good understanding of the TTOP. In fact, I have said that I think they don’t. Of course they don’t. If I implied that with the sentence you quoted, that was simply poor wording in that sentence on my part. I am the last person in the world who would think that managers have a good understanding of any sabermetric principle!

  2. jss says:

    Does any of this change for removal of pitchers in earlier innings?

    • MGL says:

      I don’t know. Remember we are not so much looking at removal as we are at non-removal. In the early innings, almost no starters are removed. So what are we going to look at? We don’t know anything about what starters would have done had they not been removed.

  3. Eric M. Van says:

    It amuses me that anyone thinks that managers are actually good at this, after watching managers manage. While managers routinely (and correctly) lift pitchers after giving up a cheap hit in the 7th, they almost never lift them after giving up a hard-hit out, if the pitcher has the platoon advantage with the next hitter. So we can actively see them paying no attention to the quality of opposing PAs.

    One thing you could do that would be very interesting is to use the PBP data and UZR engine to identify pitchers who gave up one or more cheap hits in the 6th, but nothing hard-hit, versus pitchers who gave up one or more line drive outs, but no hits. I’m sure you’ll find that more of the latter were allowed to pitch the 7th than the former, which is backwards. And it would be interesting to know if either was predictive, for the pitchers who were allowed to continue.

  4. Interesting article! I mostly agree, because this aligns well with the saber rule that a team’s record in 1-run games will regress to zero, i.e. .500 winning percentage.

    However, my research has found that Bruce Bochy has a very good W/L record in one-run games, roughly 4 games above .500, and one would think that being able to do this would entail knowing when to remove his SP and insert his RP, as you note in this article. So I wonder whether there are outliers like, perhaps, Bochy. (I don’t know if this is the right thing to do, but I ran a hypothesis test on his record in one-run games against a .500 null and found that it was statistically significant, barely, at the 95% level at that time.)

    Unfortunately, you analyzed the AL and not the NL (and I can see why, with the DH, pitching decisions are aligned with manager’s decision on the pitcher only, not about replacing the pitcher with a PH), so you can’t just pull out Bochy’s results. Any thoughts on this?

