Media | MGL on Baseball

Archive for the ‘Media’ Category

Why WAR is a terrible metric for an MVP discussion

Posted: August 15, 2015 in Awards, Batting, Media

Recently there has been some discussion about the use of WAR in determining or at least discussing an MVP candidate for position players (pitchers are eligible too for MVP, obviously, and WAR includes defense and base running, but I am restricting my argument to position players and offensive WAR). Judging from the comments and questions coming my way, many people don’t understand exactly what WAR measures, how it is constructed, and what it can or should be used for.

In a nutshell, offensive WAR takes each of a player’s offensive events in a vacuum, without regard to the timing and context of the event or whether that event actually produced or contributed to any runs or wins, and assigns a run value to it, based on the theoretical run value of that event (linear weights), adds up all the run values, converts them to theoretical “wins” by dividing by some number around 10, and then subtracts the approximate runs/wins that a replacement player would have in that many PA. A replacement player produces around 20 runs less than average for every 650 PA, by definition. This can vary a little by defensive position and by era. And of course a replacement player is defined as the talent/value of a player who can be signed for the league minimum even if he is not protected (a so-called “freely available player”).

For example, let’s say that a player had 20 singles, 5 doubles, 1 triple, 4 HR, 10 non-intentional BB+HP, and 60 outs in 100 PA. The approximate run values for these events are .47, .78, 1.04, 1.40, .31, and -.30. These values are marginal run values and by definition are above or below a league average position player. So, for example, if a player steps up to the plate and gets a single, on average he will generate .47 more runs than 1 generic PA of a league average player. These run values and the zero run value of a PA for a league average player assume the player bats in a random slot in the lineup, on a league average team, in a league average park, against a league-average opponent, etc.

If you were to add up all those run values for our hypothetical player, you would get +5 runs. That means that theoretically this player would produce 5 more runs than a league-average player on a league average team, etc. A replacement player would generate around 3 fewer runs than a league average player in 100 PA (remember I said that replacement level was around -20 runs per 650 PA), so our hypothetical player is 8 runs above replacement in those 100 PA.
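The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under the assumptions already stated: the event counts and approximate run values come from the example, replacement level is taken as roughly -20 runs per 650 PA, and 10 runs per win is a round number. The function and variable names are hypothetical, chosen for this sketch. The out value is left as a parameter because the total is quite sensitive to it: a value near -.30 gives the roughly +5 runs used in this example, whereas -.25 would give about +8.

```python
# Event counts from the 100-PA example in the text.
EVENTS = {"1B": 20, "2B": 5, "3B": 1, "HR": 4, "BB_HBP": 10, "OUT": 60}

REPLACEMENT_RUNS_PER_650_PA = -20.0  # rough figure from the text
RUNS_PER_WIN = 10.0                  # rough conversion ("some number around 10")


def linear_weights(out_value):
    """Approximate run values (runs above average per event)."""
    return {"1B": 0.47, "2B": 0.78, "3B": 1.04, "HR": 1.40,
            "BB_HBP": 0.31, "OUT": out_value}


def runs_above_average(events, weights):
    """Sum of count * run value over all event types."""
    return sum(count * weights[name] for name, count in events.items())


def runs_above_replacement(raa, pa):
    """Subtract what a replacement player would produce in the same PA."""
    replacement_runs = REPLACEMENT_RUNS_PER_650_PA * pa / 650.0
    return raa - replacement_runs


pa = sum(EVENTS.values())
for out_value in (-0.30, -0.25):
    raa = runs_above_average(EVENTS, linear_weights(out_value))
    rar = runs_above_replacement(raa, pa)
    war = rar / RUNS_PER_WIN
    print(f"out={out_value:+.2f}: RAA={raa:+.1f}, RAR={rar:+.1f}, WAR={war:+.2f}")
```

Nothing in this calculation knows the inning, the score, or who was on base. Each event is counted once, with the same weight, wherever it happened.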

The key here is that these are hypothetical runs. If that player produced those offensive events in a league average context an infinite number of times, he would produce, on average, 5 runs more than an average player would produce in 100 PA, and his team would win around .5 more games (per 100 PA) than it would with an average player and .8 more games (and 8 runs) than it would with a replacement player.

In reality, for those 100 PA, we have no idea how many runs or wins our player contributed to. On the average, or after an infinite number of 100 PA trials, his results would have produced an extra 5 runs and 1/2 win, but in one 100 PA trial, that exact result is unlikely, just like in 100 flips of a coin, exactly 50 heads and 50 tails is an unlikely though “mean” or “average” event. Perhaps 15 of those 20 singles didn’t result in a single run being produced. Perhaps all 4 of his HR were hit after his team was down by 5 or 10 runs and they were meaningless. On the other hand, maybe 10 of those hits were game winning hits in the 9th inning. Similarly, of those 60 outs, what if 10 times there was a runner on third and 0 or 1 out, and our player struck out every single time? Alternatively, what if he drove in the runner 8 out of 10 times with an out, and half the time that run amounted to the game winning run? WAR would value those 10 outs exactly the same in either case.
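To put a number on the coin-flip analogy, here is a small sketch using only the Python standard library. It computes the binomial probability of exactly 50 heads in 100 fair flips; the function name is hypothetical.

```python
from math import comb


def prob_exact_heads(flips, heads, p_heads=0.5):
    """Binomial probability of exactly `heads` heads in `flips` flips."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads)**(flips - heads)


p = prob_exact_heads(100, 50)
print(f"P(exactly 50 heads in 100 flips) = {p:.4f}")
```

The result is about 8%, so the single most likely outcome still fails to occur more than nine times out of ten. The same logic applies to one 100 PA sample: the “expected” 5 runs is an average over many trials, not a description of what actually happened.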

You see where I’m going here? Context is ignored in WAR (for a good reason, which I’ll get to in a minute), yet context is everything in an MVP discussion. Let me repeat that: Context is everything in an MVP discussion. An MVP is about the “hero” nature of a player’s seasonal performance. How much did he contribute to his team’s wins and to a lesser extent, what did those wins mean or produce (hence, the “must be on a contending team” argument). Few rational people are going to consider a player MVP-quality if little of his performance contributed to runs and wins no matter how “good” that performance was in a vacuum. No one is going to remember a 4 walk game when a team loses in a 10-1 blowout. 25 HR with most of them occurring in losing games, likely through no fault of the player? Ho-hum. 20 HR, where 10 of them were in the latter stages of a close game and directly led to 8 wins? Now we’re talking possible MVP! .250 wOBA in clutch situations but .350 overall? Choker and bum, hardly an MVP.

I hope you are getting the picture. While there are probably several reasonable ways to define an MVP and reasonable and smart people can legitimately debate about whether it is Trout, Miggy, Kershaw or Goldy, I think that most reasonable people will agree that an MVP has to have had some – no a lot – of articulable performance contributing to actual, real-life runs and wins, otherwise that “empty WAR” is merely a tree falling in the forest with no one to hear it.

So what is WAR good for and why was it “invented?” Mostly it was invented as a way to combine all aspects of a player’s performance – offense, defense, base running, etc. – on a common scale. It was also invented to be able to estimate player talent and to project future performance. For that it is nearly perfect. The reason it ignores context is because we know that context is not part of a player’s skill set to any significant degree. Which also means that context-non-neutral performance is not predictive – if we want to project future performance, we need a metric that strips out context – hence WAR.

But, for MVP discussions? It is a terrible metric for the aforementioned reasons. Again, regardless of how you define MVP caliber performance, almost everyone is in agreement that it includes and needs context, precisely that which WAR disdains and ignores. Now, obviously WAR will correlate very highly with non-context-neutral performance. That goes without saying. It would be unlikely that a player who is a legitimate MVP candidate does not have a high WAR. It would be equally unlikely that a player with a high WAR did not specifically contribute to lots of runs and wins and to his team’s success in general. But that doesn’t mean that WAR is a good metric to use for MVP considerations. Batting average correlates well with overall offensive performance and pitcher wins correlate well with good pitching performance, but we would hardly use those two stats to determine who was the better overall batter or pitcher. And to say, for example, that Trout is the proper MVP and not Cabrera because Trout was 1 or 2 WAR better than Miggy, without looking at context, is an absurd and disingenuous argument.

So, is there a good or at least a better metric than WAR for MVP discussions? I don’t know. WPA perhaps. WPA in winning games only? WPA with more weight for winning games? RE27? RE27, again, adjusted for whether the team won or lost or scored a run or not? It is not really important what you use for these discussions but why you use them. It is not so much that WAR is a poor metric for determining an MVP. It is using WAR without understanding what it means and why it is a poor choice for an MVP discussion in and of itself, that is the mistake. As long as you understand what each metric means (including traditional mundane ones like RBI, runs, etc.), how it relates to the player in question and the team’s success, feel free to use whatever you like (hopefully a combination of metrics and statistics) – just make sure you can justify your position in a rational, logical, and accurate fashion.

HOW JETER CAN TURN A VERY GOOD WRITER INTO A HACK *

Posted: February 16, 2014 in Media, Uncategorized

* And why I am getting tired of writers and analysts picking and choosing one or more of a bushel of statistics to make their (often weak) point.

Let’s first get something out of the way:

Let’s say that you know of this very good baseball player. He is well-respected and beloved on and off the field, he played for only one, dynastic, team, he has several World Series rings, double digit All-Star appearances, dozens of awards, including 5 Gold Gloves, 5 Silver Sluggers, and a host of other commendations and accolades. Oh, and he dates super models and doesn’t use PEDs (we think).

Does it matter whether he is a 40, 50, 60, 80, or 120 win (WAR) player in terms of his HOF qualifications? I submit that the answer is an easy, “No, it doesn’t.” He is a slam dunk HOF’er whether he is indeed a very good, great, or all-time, inner-circle, great player. If you want to debate his goodness or greatness, fine. But it would be disingenuous to debate that in terms of his HOF qualifications. There are no serious groups of persons, including “stat-nerds,” whose consensus is that this player does not belong in the HOF.

Speaking of strawmen, before I lambaste Mr. Posnanski, which is the crux of this post, let me start by giving him some major props for pointing out that this article, by the “esteemed” and “venerable” writer Allen Barra, is tripe. That is Pos’ word – not mine. Indeed, the article is garbage, and Barra, at least when writing about anything remotely related to sabermetrics, is a hack. Unfortunately, Posnanski’s article is not far behind in tripeness.

Pos’ thesis, I suppose, can be summarized by this, at the beginning of the article:

[Jeter] was a fantastic baseball player. But you know what? Alan Trammell was just about as good.

Here are Alan Trammell’s and Derek Jeter’s neutralized offensive numbers.

Trammell: .289/.357/.420
Jeter: .307/.375/.439

Jeter was a better hitter. But it was closer than you might think.

He points out several times in the article that, “Trammell was almost as good as Jeter, offensively.”

Let’s examine that proposition.

First though, let me comment on the awful argument, “Closer than you think.” Pos should be ashamed of himself for using that in an assertion or argument. It is a terrible way to couch an argument. First of all, how does he know, “What I think?” And who is he referring to when he says, “You?” The problem with that “argument,” if you want to even call it that, is that it is entirely predicated on what the purveyor decides “You are thinking.” Let’s say a player has a career OPS of .850. I can say, “I will prove that he is better than you think, assuming of course that you think that he is worse than .850, and it is up to me to determine what you think.” Or I can say the opposite. “This player is worse than you think, assuming of course, that you think that he is better than an .850 player. And I am telling you that you are thinking that (or at least implying that)!”

Sometimes it is obvious what, “You think.” Oftentimes it is not. And that’s even assuming that we know who, “You” is. In this case, is it obvious what, “You think of Jeter’s offense compared to Trammell’s?” I certainly don’t think so, and I know a thing or two about baseball. I am pretty sure that most knowledgeable baseball people think that both players were pretty good hitters overall and very good hitters for a SS. So, really, what is the point of, “It was closer than you think.” That is a throwaway comment and serves no purpose other than to make a strawman argument.

But that is only the beginning of what’s wrong with this premise and this article in general. He goes on to state or imply two things. One, that their “neutralized” career OPS’s are closer than their raw ones. I guess that is what he means by “closer than you think,” although he should have simply said, “Their neutralized offensive stats are closer than their non-neutralized ones,” rather than assuming what, “I think.”

Anyway, it is true that in non-neutralized OPS, they were 60 points apart, whereas once “neutralized,” at least according to the article, the gap is only 37 points, but:

Yeah, it is closer once “neutralized” (I don’t know where he gets his neutralized numbers from or how they were computed), but 37 points is a lot, man! I don’t think too many people would say that a 37 point difference, especially over 20-year careers, is “close.”
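For reference, the 37-point figure follows directly from the neutralized slash lines quoted above, since OPS is simply OBP plus SLG. A minimal sketch, using only the numbers quoted from the article; the helper name is hypothetical.

```python
def ops(obp, slg):
    """On-base percentage plus slugging percentage."""
    return obp + slg


# Neutralized AVG/OBP/SLG lines as quoted from Posnanski's article.
trammell = {"avg": 0.289, "obp": 0.357, "slg": 0.420}
jeter = {"avg": 0.307, "obp": 0.375, "slg": 0.439}

trammell_ops = ops(trammell["obp"], trammell["slg"])
jeter_ops = ops(jeter["obp"], jeter["slg"])

# Express the gap in "points" (thousandths of OPS).
gap_points = round((jeter_ops - trammell_ops) * 1000)

print(f"Trammell {trammell_ops:.3f}, Jeter {jeter_ops:.3f}, gap {gap_points} points")
```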

More importantly, a big part of that “neutralization” is due to the different offensive environments. Trammell played in a lower run scoring environment than did Jeter, presumably, at least partially, because of rampant PED use in the 90’s and aughts. Well, if that’s true, and Jeter did not use PED’s, then why should we adjust his offensive accomplishments downward just because many other players, the ones who were putting up artificially inflated and gaudy numbers, were using? Not to mention the fact that he had to face juiced-up pitchers and Trammell did not! In other words, you could easily make the argument, and probably should, that if (you were pretty sure that) a player was not using during the steroid era, that his offensive stats should not be neutralized to account for the inflated offense during that era, assuming that that inflation was due to rampant PED use of course.

Finally, with regard to this, somewhat outlandish, proposition that Jeter and Trammell were similar in offensive value (of course, it depends on your definition of “similar” and “close” which is why using words like that creates “weaselly” arguments), let’s look at the (supposedly) context-neutral offensive runs or wins above replacement (or above average – it doesn’t matter what the baseline is when comparing players’ offensive value) from Fangraphs.

Jeter

369 runs batting, 43 runs base running

Trammell

124 runs batting, 23 runs base running

Whether you want to include base running in “offense” doesn’t matter. Look at the career batting runs. 369 runs to 124. Seriously, what was Posnanski drinking (aha, that’s it – Russian vodka! – he is in Sochi in case you didn’t know) when he wrote an entire article mostly about how similar Trammell and Jeter were, offensively, throughout their careers? And remember, these are linear weights batting runs, which are presented as “runs above or below average” compared to a league-average player. In other words, they are neutralized with respect to the run-scoring environment of the league. Again, with respect to PED use during Jeter’s era, we can make an argument that the gap between them is even larger than that.

So, Posnanski tries to make the argument that, “They are not so far apart offensively as some people might think (yeah, the people who look at their stats on Fangraphs!),” by presenting some “neutralized” OPS stats. (And again, he is claiming that a 37-point difference is “close,” which is eminently debatable.)

Before he even finishes, I can make the exact opposite claim – that they are worlds apart offensively, by presenting their career (similar length careers, by the way, although Jeter did play in 300 more games), league and park adjusted batting runs. They are 245 runs, or 24 wins, apart!
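As a quick check on that gap, here is the same subtraction in Python, with the runs converted to wins at roughly 10 runs per win. That conversion rate is an assumption (the round number used earlier for WAR), and the variable names are hypothetical.

```python
RUNS_PER_WIN = 10.0  # rough conversion; an assumption that varies with run environment

# Career batting runs above average, as cited from Fangraphs in the text.
batting_runs = {"Jeter": 369, "Trammell": 124}

gap_runs = batting_runs["Jeter"] - batting_runs["Trammell"]
gap_wins = gap_runs / RUNS_PER_WIN

print(f"Gap: {gap_runs} runs, roughly {gap_wins:.1f} wins")
```

At 10 runs per win the gap comes to between 24 and 25 wins, and a slightly different conversion rate would move that by only a win or so either way.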

That, my friends, is why I am sick and tired of credible writers and even some analysts making their point by cherry picking one (or more than one) of scores of legitimate and semi-legitimate sabermetric and not-so-sabermetric statistics.

But, that’s not all! I did say that Posnanski’s article was hacktastic, and I didn’t just mean his sketchy use of one (not-so-great) statistic (“neutralized” OPS) to make an even sketchier point.

This:

By Baseball Reference’s defensive WAR Trammell was 22 wins better than a replacement shortstop. Jeter was nine runs worse.

By Fangraphs, Trammell was 76 runs better than a replacement shortstop. Jeter was 139 runs worse.

Is an abomination. First of all, when talking about defense, you should not use the term “replacement” (and you really shouldn’t use it for offense either). Replacement refers to the total package, not to one component of player value. Replacement shortstops could be average or above-average defenders and awful hitters, decent hitters and terrible defenders, or anything in between. In fact, for various reasons, most replacement players are average or so defenders and poor hitters.

And then he conflates wins and runs (don’t use both in the same paragraph – that is sure to confuse some readers), although I know that he knows the difference. In fact, I think he means “nine wins” worse in the first sentence, and not, “nine runs worse.” But, that mistake is on him for trying to use both wins and runs when talking about the same thing (Jeter and Trammell’s defense), for no good reason.

Pos then says:

You can buy those numbers or you can partially agree with them or you can throw them out entirely, but there’s no doubt in my mind that Trammell was a better defensive shortstop.

Yeah, yada, yada, yada. Yeah we know. No credible baseball person doesn’t think that Trammell was much the better defender. Unfortunately we are not very certain of how much better he was in terms of career runs/wins. Again, not that it matters in terms of Jeter’s qualifications for, or his eventually being voted into, the HOF. He will obviously be a first-ballot, near-unanimous selection, and rightfully so.

Yes, it is true that Trammell has not gotten his fair due from the HOF voters, for whatever reasons. But, comparing him to Jeter doesn’t help make his case, in my opinion. Jeter is not going into the HOF because he has X number of career WAR. He is going in because he was clearly a very good or great player, and because of the other dozen or more things he has going for him that the voters (and the fans) include, consciously or not, in terms of their consideration. Even if it could be proven that Jeter and Trammell had the exact same context-neutral statistical value over the course of their careers, Jeter could still be reasonably considered a slam dunk HOF’er and Trammell not worthy of induction (I am not saying that he isn’t worthy). It is still the Hall of Fame (which means many different things to many different people) and not the Hall of WAR or the Hall of Your Context-Neutral Statistical Value.

For the record, I love Posnanski’s work in general, but no one is perfect.
