Category: Glossary

Advanced Stats

9/22/2015

The last post on basic stats ended up longer than I had planned, but it was fun to say some things about them anyway. I'm going to go lighter on advanced stats because some of them are harder to explain, and in most cases the people who created them have already published more detail. But a few of them are particularly useful, and I might throw them around when comparing players, so it's worth quickly explaining what they are, why to use them, and what some good numbers are. As mentioned previously, most information is likely to come from Fangraphs because they tend to go deep on a lot of things.

WAR - Wins Above Replacement. This stat has caused some confusion over the years since it came into existence, and is sometimes mocked by "old-school" types. At its core, I think it's a really good stat. What it attempts to quantify is how many Wins a player is worth to his team vs a replacement level player - probably an average bench player or starter from the AAA minor league level. For position players, they try to quantify their offensive, defensive and baserunning value, with appropriate adjustments based on position (for example, a first baseman who produces identical numbers offensively to a shortstop will be less valuable because more first basemen are good at hitting). Catchers have their defense quantified differently (and more on that later). Pitchers are mostly evaluated just on their pitching.
There is a lot of math that goes into this calculation, and it is calculated differently by the three major statistical websites: Fangraphs, Baseball Reference, and Baseball Prospectus. The differences show up most often among pitchers, and feel free to read up on them. The shortest version is that Fangraphs bases pitcher value on what the numbers say should have happened (more on that in FIP), while Baseball Reference looks more at what did happen, with some weight given to what "should have". Sound confusing enough?
There are umpteen articles written on this subject and whether it is good, bad, or otherwise. Let's go short though: higher WAR values are better, BUT there is enough adjustment needed in the data that it is not advised to make much of decimal differences. For example, a player seen to have 5.6 WAR might have been a bit better than one with 5.4, but they're realistically too close to accurately tell the difference. The 5.6 guy has probably performed at a considerably higher level than somebody with a 3.4 WAR, however.
Also, it is a cumulative stat (some discussed later will be rate stats). That means that playing all year gives more value than missing a lot of time. A good starting pitcher who pitches 200 innings is probably going to be a good 2-3 times more valuable than a reliever who performs the same over 70-80 innings. There is also discussion each year over how much "one win" is worth on the free agent market. Fangraphs breaks this down slightly more, but look at it this way:
Less than 0: probably going to get replaced (or should)
0-2: Relief pitcher, possibly 5th starter, bench guy or role player
2-4: Solid starting player.
4-6: All-star type player
6 and above: MVP contender (or Cy Young).
For further reference, a team at 0 WAR by Fangraphs calculations should win about 48 games out of 162 (.294 winning percentage). Which is terrible. So you need approximately 33 WAR among your players to get to .500, about 42 to get to 90 wins (and a probable playoff spot), and 52 would make you a 100-win team (which are few and far between).
For more reference, the Blue Jays have about 30 wins from their position players, and 14 from their pitchers. The WAR calculation says they should be about 85-65. The standings say 86-64. All in all, not bad.
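Here's a rough sketch of the replacement-level arithmetic above. The function name and the rounding are my own choices, not anything Fangraphs publishes; the only inputs are the .294 baseline and the 162-game season already mentioned.

```python
# Sketch: turn a team's total WAR into an expected win total.
REPLACEMENT_WIN_PCT = 0.294  # replacement-level baseline quoted above

def expected_wins(team_war, games=162):
    """Replacement-level wins over `games`, plus the team's total WAR."""
    return REPLACEMENT_WIN_PCT * games + team_war

for war in (0, 33, 42, 52):
    print(f"{war:>2} WAR -> about {round(expected_wins(war))} wins")
```

Running it gives about 48, 81, 90, and 100 wins, which lines up with the thresholds listed above.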

OK, that was a lot on the first thing. The rest will be quick, promise.

Offense
OBP - On Base Percentage - fixes a lot of what's wrong with batting average. In simplest terms, it is how many plate appearances end up with the batter on base (not giving them credit for getting on base when it caused somebody else to be forced out on the bases though, for example). An OBP of .400 means you reached base in 40 percent of your plate appearances. Below .300 is very bad, .300-.333 is just playable if you have power and/or defense, .333-.367 is adequate, .367-.400 is good to very good. Above .400 is excellent, and there are rare birds who clear .500 (and one man, one year, who cleared .600. That's not a typo).
Slugging Percentage - Same problems as batting average in terms of "hits" and "at bats", but gives increased credit for power. That is, it's "total bases" divided by "at bats". Values of up to 4.000 are possible (though never achieved outside of a single game, obviously). Below about .350 is not good, .350-.400 is ok depending on other factors, .400-.450 is pretty good, .450-.500 very good, above .500 pretty excellent. The same man who cleared .600 in OBP cleared .800 in this stat twice. He is one of only two players to ever do so.
OPS* - On Base Plus Slugging. An early attempt at an advanced stat added the last two together. There are two major problems with this: 1) they don't have the same denominator, and 2) it pretty heavily underrates on base percentage, which is generally more valuable than slugging. But people like it because it is a bit helpful I guess. Again, below .600 is bad, .600-.700 is probably a role player, .700-.800 ranges from average to above average, .800-.900 is pretty excellent, and above .900 could get you some MVP votes.
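To make the three stats above concrete, here is a minimal sketch using the standard formulas. The season line at the bottom is made up purely for illustration.

```python
def obp(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by the plate appearances that count for OBP."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

def slg(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def ops(on_base, slugging):
    """Just the sum of the two, mismatched denominators and all."""
    return on_base + slugging

# Hypothetical season: 500 AB, 150 hits (100 1B, 30 2B, 5 3B, 15 HR),
# 60 walks, 5 hit by pitch, 5 sacrifice flies.
season_obp = obp(150, 60, 5, 500, 5)
season_slg = slg(100, 30, 5, 15, 500)
print(f"OBP {season_obp:.3f}  SLG {season_slg:.3f}  OPS {ops(season_obp, season_slg):.3f}")
```

Note that OBP divides by something close to plate appearances while SLG divides by at bats, which is the denominator problem mentioned above.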
wOBA - Weighted On Base Average - A long equation which attempts to properly do what the above one does by giving some weight to extra base hits, and these weights change slightly depending on the offensive environment of the season. The values you want are pretty much like the OBP ones (though you won't generally see any .600s).
wRC and wRC+ - Weighted Runs Created and adjusted wRC. Again some gory math, but this one basically uses wOBA from above to estimate how many runs a player's offense created compared to the league and replacement level, also taking into account the stadium they play most of their games in (park factors, see below). From WAR, 10 runs is worth roughly 1 win. The wRC+ formula (and other similar ones like OPS+) basically compares the wRC to the league average: a value of 100 means exactly average, 150 means 50% better than average, and 200 (rare) means twice the average.
ISO - Isolated Power. Simple stat subtracts batting average from slugging percentage to look just at the extra bases a batter got. Higher means more power. Below .100 is bad, .100-.150 is below average, .150-.200 is good to very good, .200-.250 is great, and above is pretty exceptional.
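A quick sketch of the subtraction, with a made-up stat line (235 total bases and 150 hits in 500 at bats):

```python
def isolated_power(total_bases, hits, at_bats):
    """Slugging percentage minus batting average."""
    slugging = total_bases / at_bats
    average = hits / at_bats
    return slugging - average

# Hypothetical hitter: .470 slugging, .300 average.
print(f"ISO {isolated_power(235, 150, 500):.3f}")
```

Since both pieces share the same denominator, this is the same as extra bases per at bat: (235 - 150) / 500.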
BABIP - Batting Average on Balls in Play. If you remember from last section, we expect balls put in play to turn into hits about 30% of the time. "In play" in this case excludes home runs, strikeouts, walks, and hit-by-pitches. A player with a number well above .300 may be due for regression down, and one well below .300 may be due for regression up. There are hitters though who show consistent ability to have high numbers here due to speed, hitting a lot of line drives, and so on. There is a lot more that can go into that, but that's a good high level view of it. Note that a lot of this is true for pitchers too, though they tend to be less able to be way off the average than some hitters.
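For the curious, a minimal sketch of the usual BABIP formula. Home runs and strikeouts come out, and sacrifice flies go back into the denominator because they were balls in play. The stat line is hypothetical.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Hits on balls in play divided by balls in play."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play

# Hypothetical hitter: 150 hits, 15 HR, 500 AB, 100 strikeouts, 5 sac flies.
print(f"BABIP {babip(150, 15, 500, 100, 5):.3f}")
```

This made-up hitter comes out well above .300, which is the kind of number that would make me look at whether he is fast, hits a lot of line drives, or has just been lucky.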
RC - Runs Created - Joe Posnanski again takes a deep dive on this one, but basically Bill James came up with this about 40 years ago, and it still works pretty well. The basic formula is
RC = [(H + W) * TB] / (AB + W)
where H is hits, W is walks, TB is total bases, and AB is at bats. It will be a similar number to wRC, but without the weighting for the run environment and all of that. On a macro level (and stats like these are great), Joe notes that "if you go back to 1950, the basic runs created formula has estimated that 1,126,591 runs would be scored. And, over those 65 years, teams have actually scored just 3,695 more runs than that" which means the estimate missed by only about 0.3%, an accuracy of about 99.7%.
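The basic formula is simple enough to sketch in a few lines. The inputs below are a made-up season, not a real player.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    return (hits + walks) * total_bases / (at_bats + walks)

# Hypothetical season: 150 hits, 60 walks, 235 total bases, 500 at bats.
print(f"RC {runs_created(150, 60, 235, 500):.1f}")
```

The first factor is roughly how often you get on base and the second is how far you advance, so a hitter needs both to score well.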

Defense
UZR - Ultimate Zone Rating. Combines a bunch of factors to show how many runs a player saves or gives up compared to an average player at their position. Most values range between -15 and +15, and the loosest breakdown suggests that -15 to -5 is very bad to below average, -5 to +5 is around average, +5 to +15 is above average to very good, and above +15 is excellent.
DRS - Defensive Runs Saved. Uses some different measures and weights to measure basically the same thing as UZR, with basically the same breakdown of tiers.
Catcher Framing - A fun thing to measure, but a good argument for the need for automated umpiring. Heavy research in the age of high definition video, PITCHf/x, and the like has shown that some catchers are able to get strike calls on pitches that would be called balls for the average catcher, and some go the other way and actually cost their pitchers strikes. The value of additional strikes seems to be huge, and is an argument for catchers having pretty immense value, but maybe a false one. The king of this is Jose Molina.

Pitching
FIP - Fielding Independent Pitching. As mentioned a couple of times, it's now believed and mostly accepted that pitchers don't have much control over balls in play. FIP weights their home runs, walks, and strikeouts, then adds a factor to make the number pretty close to ERA. Good and bad numbers are basically the same as for ERA. Something to look at is how it compares to their ERA. If FIP is a lot higher, then their "peripherals" (you may hear that a lot) suggest that their ERA is going to get worse. If it is a lot lower, then it suggests the opposite.
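Here is a sketch of the commonly published FIP formula, which also lumps hit batters in with walks. The constant changes a little every season to line FIP up with league ERA, so the 3.10 below is an assumed placeholder, and the pitcher is made up.

```python
def fip(home_runs, walks, hit_batters, strikeouts, innings, constant=3.10):
    """(13*HR + 3*(BB + HBP) - 2*K) / IP, plus a league constant.

    `constant` is an assumed typical value; the real one varies by season.
    """
    weighted = 13 * home_runs + 3 * (walks + hit_batters) - 2 * strikeouts
    return weighted / innings + constant

# Hypothetical pitcher: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
print(f"FIP {fip(20, 50, 5, 200, 200):.2f}")
```

If this pitcher's actual ERA were something like 4.20, the gap would be the "peripherals say he's been better than his results" case described above.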
xFIP - Expected FIP. Similar to above, but looks at fly ball rate. This basically expects that giving up more fly balls will mean more home runs, so it changes the home run value based on how many fly balls the pitcher gives up. This should be close to their FIP, but if not, it may be because the pitcher is giving up very few home runs per fly ball, or more than expected.
LOB% - Percentage of runners left on base. Pretty much what it sounds like. If a guy gets on base, how well did the pitcher keep him from scoring. League average is usually around 70-72%, and significant deviation from this often suggests that they will regress. Caveats are that pitchers that get a lot of strikeouts may be naturally above this level, and pitchers that are "not major league caliber" may just be naturally below this level.
HR/FB - Home runs per fly ball. Another that's pretty much what it sounds like. This can be affected by the park the pitcher pitches in, and there are a very (very) few pitchers that seem to be better or worse than expected at this. Average tends to be in the range of 10%, but there are some guys who outperform that.
ERA- and FIP- - Basically what wRC+ and OPS+ do, but the opposite because for these stats, less is more. An ERA- of 75 is 25 percent better than league average, 100 is league average, etc.
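A simplified sketch of how a "minus" stat works. The real versions also adjust for park, which I'm leaving out here, so treat this as the idea rather than the published calculation.

```python
def era_minus(era, league_era):
    """Simplified ERA-: pitcher ERA as a percentage of league ERA.

    Assumption: no park adjustment, unlike the published stat.
    """
    return 100 * era / league_era

# Hypothetical: a 3.00 ERA in a league where the average ERA is 4.00.
print(era_minus(3.00, 4.00))
```

That comes out to 75, or 25 percent better than league average.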
SIERA - Skill Interactive ERA. A fairly new one, developed at Baseball Prospectus and now published by Fangraphs, that attempts to improve on FIP and xFIP by integrating things like ground ball rate, fly ball rate, and so on. The scale is pretty similar again to FIP and ERA: below 3 is good, above 5 is bad, below 2 is excellent. The math is long and gory, but if you see it, just look at the overall number and compare it to ERA.

The bottom line is, a lot of these statistics are trying to show how well a player has performed "skill-wise", focusing on process vs results. And by looking at the process and comparing to the results, we might get an idea of what future results will be, better or worse or roughly the same.

Multi-Area
WPA - Win Probability Added. Over the many thousands of games that have been played, many many individual situations have occurred in terms of scores, baserunners and outs. In any situation, a team has a certain likelihood of winning the game. Any change in the situation changes this probability, and the pitchers and hitters who contribute are credited accordingly. Here's an example situation (all numbers made up). Score is tied 3-3 with nobody out in the top of the 5th inning. Both teams currently have a 50% chance of winning. The first batter hits a home run to give his team a 4-3 lead. Now with a 4-3 lead, nobody out, nobody on base, top of the 5th, his team has a 60% chance of winning. He gets +0.100 WPA, and the pitcher gets -0.100 WPA. Watching the graphs on Fangraphs shows the swings that can happen in games. The changes tend to be much more extreme the later the game gets if the scores are close. This stat is not predictive, but it does give a good idea of what happened.
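The bookkeeping in that example can be sketched like this, using the same made-up 50% and 60% win probabilities. The function name is mine.

```python
def wpa_credit(win_prob_before, win_prob_after):
    """Return (batter_wpa, pitcher_wpa) for one play.

    Probabilities are from the batting team's point of view; whatever the
    batter gains, the pitcher loses.
    """
    change = win_prob_after - win_prob_before
    return change, -change

batter, pitcher = wpa_credit(0.50, 0.60)
print(f"batter {batter:+.3f}, pitcher {pitcher:+.3f}")
```

Add up every play a player was involved in and you get his WPA for the game or the season.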
Park Factors - Mentioned a bit in the stadium post. Basically, different stadiums for different types of hitters and different types of hits. Heavy sea air may make fly balls not fly as far, short right field walls may make more home runs for left-handed hitters, etc. The park factors are usually weighted on a 100 scale, so that any park factor of 100 is exactly average, 110 is 10% above average, 90 is 10% below average, and so on. Taking a look at team statistics with this in mind can give a better idea of what their players are actually doing.
RE24 - Run Expectancy with 24 base-out states. Basically there are 24 different states that the bases and outs can be in (ranging from 0 out, 0 on to 2 out, 3 on and every possibility in between). Based on the league environment, each of these states has a different number of expected runs going forward (based on the thousands of games that have come before, mostly). There are various stats that are based on performance with these states in mind. Quick example: 0 on, 0 out expects about 0.46 runs. 0 on, 1 out expects about 0.24 runs, dropping by 0.22. Runner on 1st, 0 out expects about 0.83 runs, going up about 0.37 runs. Overall, the swing between getting the leadoff runner to first base or him being out changes that inning by about 0.6 runs.
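Here's the quick example as a sketch. The table only holds the three states quoted above (a full table would have all 24), and the state labels are my own.

```python
# (bases, outs) -> expected runs for the rest of the inning.
# Only the three states from the example; a real table has 24 entries.
RUN_EXPECTANCY = {
    ("empty", 0): 0.46,
    ("empty", 1): 0.24,
    ("first", 0): 0.83,
}

def re24_change(before, after, runs_scored=0):
    """Change in run expectancy on a play, plus any runs that scored."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored

leadoff_single = re24_change(("empty", 0), ("first", 0))
leadoff_out = re24_change(("empty", 0), ("empty", 1))
print(f"single {leadoff_single:+.2f}, out {leadoff_out:+.2f}, "
      f"swing {leadoff_single - leadoff_out:.2f}")
```

A hitter's RE24 for a season is just the sum of these changes over all of his plate appearances.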
Leverage Index - From Fangraphs, and using a couple of the recent stats: "You take the current base-out state, inning, and score and you find the possible changes in Win Expectancy that could occur during this particular plate appearance. Then you multiply those potential changes by the odds of that potential change occurring, add them up, and divide by the average potential swing in WE to get the Leverage Index."
Relief pitchers tend to be used in higher leverage situations if they are of high quality, because managers tend to want them in tough situations.
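A loose sketch of that recipe, as I read the quote. Every number here is invented, including the average swing, and I'm assuming the size of each change is what matters (so I take absolute values). Treat it as an illustration of the idea, not the official calculation.

```python
def leverage_index(outcomes, average_swing):
    """outcomes: list of (probability, change_in_win_expectancy) pairs.

    Weights the size of each possible change by its probability, then
    divides by the average swing across all situations.
    """
    expected_swing = sum(prob * abs(delta) for prob, delta in outcomes)
    return expected_swing / average_swing

# Hypothetical plate appearance: an out, a single, or a home run.
outcomes = [(0.68, -0.03), (0.22, 0.05), (0.10, 0.15)]
li = leverage_index(outcomes, average_swing=0.035)
print(f"LI {li:.2f}")
```

By construction an average situation comes out at 1.0, so anything well above 1 is a spot where you'd want your best reliever.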

Well that's all for now. Probably a lot to read, but if you like it, it could be useful later.


Basic Stats - A quick primer

9/22/2015


Because I am the way I am and I enjoy stats and analysis, I feel like it's worth giving a bit of a primer on some of the stats that I'm likely to refer to. I do this mostly because my very limited audience of friends and family may not know or care nearly as much about most of these as I do, but if I'm throwing them into things I write, it may be helpful to know what the heck I'm talking about.

For more full detail about advanced player stats, if you want to go deep, check out Fangraphs especially, though Baseball Reference, Baseball Prospectus, and others are good for information. I put that there because if you want to dive deep, that's where you should go instead of me re-hashing it. I just want to quickly explain some of the more useful or at least more frequently used ones.

Also note, I'll reference Joe Posnanski a lot. He writes exceptionally well about many sports and a lot that isn't about sports, and a lot of my information and perspective has been influenced by his writing. If you want to dig deeper on something, read him. The man can really write.
Also read or follow Grant Brisbee, Keith Law, Joe Sheehan, Rany Jazayerli, and Jonah Keri.

First - Baseball Card Statistics
For a long time, the stats that most fans and even sportswriters used were basic ones that showed up in newspaper box scores or on the back of baseball cards. A couple of these are useful, many are really not. I'll give a quick rundown of what and why. These are also commonly used in simple scoring system fantasy sports leagues.

Hitters
HR - Home runs. Nothing wrong with this as a statistic. Not always anything predictive, but with a few exceptions, it's counting balls that are hit over the fence. For most of the period between 1920 and 2015, more than 20 is pretty good, more than 30 is very good, more than 40 is great, and more than 50 is exceptional. The period of about 1996-2006 skewed these results a bit, as there were 18 players to reach 50 home runs vs 18 in the 75 seasons before and 4 in the almost 9 seasons since. With modern players in the last 10 years, there are statistics available including percentage of fly balls that become home runs and breakdowns into categories of No Doubt, Just Enough, or Lucky, even looks at how many stadiums under non-windy conditions a ball hit at a certain speed and angle would have actually been a home run in. This, for example, would normally have been a home run in 0. This one might have left Yellowstone.
R - Runs scored. For an individual player, this is borderline useless. Don't believe what old-school announcers and writers might say. This is so much a function of who bats behind the hitter and under so little control for the hitter. On balance, a player who gets on base more (more on that later), may score more runs. A player who hits home runs is at least guaranteed to score 1 run per home run. Runs scored means a lot as a team statistic, and next to nothing as an individual one.
Quick illustration by way of digression #1 - Player A hits a single. Player B grounds into a forceout. Player C hits a home run. Player B gets a run scored. What did he do besides erase player A?
RBI - Runs Batted In. Kind of the same but opposite as above. The only guaranteed way to get an RBI is to hit a home run. But due to an arcane rule, there is actually a way to knock in a run that does not get you an RBI. If a run scores when you ground into a double play, they do not award the RBI. On a fly ball out or a single-out groundout, yes.
Basically, players who hit a lot of home runs tend to get more RBI, because they're at least guaranteeing one per home run. But it's so much a function of how many runners are on base when they bat. Clutch hitting has been largely disproven with a very very few players who are probably outliers. Most players are as good in one situation as they are in another given enough opportunities. After all, they're all professionals.
In both runs and RBI, the magic number seems to be 100.
SB - Another counting stat. This one is fun but not all that meaningful. When power was lower in the 1970s and 1980s, stolen bases were very popular as a way to get into scoring position. As home runs went up, they became less popular, because you don't want to risk getting thrown out when anybody after you could hit a home run. Go look at Rickey Henderson's career stats for some fun times. Nowadays probably stealing more than 30 bases in a season will get you noticed as a pretty fast guy.
In any case, pitchers and catchers pay more attention to this now than they used to, but players are also more careful about when they run, so overall opportunities are down in general. There is something of a break-even rate at around 70% success where a player should or should not try to run. If he is successful more than that, he is generally helping his team. If he is not, he is generally hurting his team. This relates to run expectancy, more on that later.
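Here's a sketch of where a break-even rate like that comes from. The run expectancy values are illustrative assumptions for a steal of second with nobody out (0.83 with a runner on first, 1.07 with a runner on second, 0.24 with the bases empty and one out); real values shift with the run environment.

```python
def break_even_rate(re_start, re_success, re_caught):
    """Success rate at which a steal attempt neither gains nor loses runs."""
    gain = re_success - re_start   # what a successful steal adds
    loss = re_start - re_caught    # what getting thrown out costs
    return loss / (gain + loss)

# Assumed run expectancies: runner on 1st/0 out, runner on 2nd/0 out,
# bases empty/1 out.
rate = break_even_rate(re_start=0.83, re_success=1.07, re_caught=0.24)
print(f"break-even success rate: {rate:.0%}")
```

Getting caught costs a lot more than a successful steal gains, which is why the break-even point sits so far above 50%.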
AVG - Still popular, definitely overrated. The chief problem with this is what it does and doesn't count. It is a simple measure of the percentage of at bats that end with the player getting a hit. But to paraphrase Bill Clinton, a lot of that depends on what your definition of the word "is" is.
A hit is: a single, double, triple or home run. A hit is also: a single where you get thrown out running to second, a double where you get thrown out running to third. A hit is not: a walk, getting hit by a pitch, a sacrifice fly, a sacrifice bunt, or reaching base when a fielder misplayed the ball in the eyes of the official scorer (errors - yes, they're subjective, but like Justice Potter Stewart, we often know them when we see them).
An at bat is: something ending in a single, double, triple, home run, strikeout, ground out, or fly out unless that fly ball scores a run, an unsuccessful sacrifice bunt, or a time when you reached base when the fielder misplayed the ball (see just above). An at bat is not: something ending in a walk, being hit by a pitch, a successful sacrifice bunt, a sacrifice fly.
For more analysis, Joe Posnanski writes about this (and most things) much better than I can right now, but another quick digression may help with the point.
Player A - comes to the plate 20 times. He hits 2 home runs (2-run and 3-run), 1 single, 6 walks, 1 sacrifice fly, 4 strikeouts, 3 groundouts, 2 flyouts, and 1 time reached on an error.
Player B - comes to the plate 20 times. He hits 6 singles, has 0 walks, 4 strikeouts, 6 groundouts, 4 flyouts.
Batting average says Player A hit .231, and Player B hit .300. Sounds simple, right?
On base percentage (coming later) will suggest that Player A made 11 outs (it's also not sure how to deal with errors, but there are further advances on that), while player B made 14.
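To check the arithmetic on those two players, here is a small sketch using the standard formulas. Player A's 13 at bats are his 20 trips minus the 6 walks and the sacrifice fly.

```python
def batting_average(hits, at_bats):
    return hits / at_bats

def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

# Player A: 3 hits (2 HR + 1 single), 6 walks, 1 sac fly, and 10 other at bats
# (4 strikeouts, 3 groundouts, 2 flyouts, 1 reached on error).
player_a = dict(hits=3, walks=6, hit_by_pitch=0, at_bats=13, sac_flies=1)
# Player B: 6 singles and 14 outs, all of them at bats.
player_b = dict(hits=6, walks=0, hit_by_pitch=0, at_bats=20, sac_flies=0)

for name, p in (("A", player_a), ("B", player_b)):
    avg = batting_average(p["hits"], p["at_bats"])
    obp = on_base_percentage(**p)
    print(f"Player {name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

Player A "hit" .231 but reached base 45% of the time. Player B hit .300 and reached base 30% of the time.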
If you're hearing debate about it, the most basic levels are this: less than .200: unplayable. .200-.250: you'd better be a very good fielder or have lots of power. .250-.300: solid enough at most fielding, speed, or power profiles. .300-.350: very good (though as above, can be empty). .350 and above: probably having a pretty great season.

Pitchers
W - Wins. Again, read Joe Posnanski for further analysis (he also is probably where I picked up the A vs B comparison). But the problem with wins and losses attributed to pitchers especially in the modern game is that they're very incomplete. To win a game as a starting pitcher, you have to complete at least 5 innings, and finish pitching or be replaced with your team holding the lead, and they have to keep that lead. To lose a game, you have to have given up the run that gave the other team the lead that they did not relinquish. Samples below:
Player A - 9 innings, 2 hits, 1 run, 10 strikeouts. Team scores 0 runs. Player A gets the L.
Player B - 5 innings, 12 hits, 7 runs, 5 walks, 1 strikeout. Team scores 12 runs. Player B gets the W.
Player C - 8 innings, 2 hits, 0 runs, 12 strikeouts. Leaves with his team holding a 3-0 lead. His replacement allows 3 runs to blow the save (oh we'll get there), but holds the tie. His team scores a run to win in the bottom of the 9th. His replacement gets the win. Player C gets nothing.
Player D - enters the game in the 6th inning with the score tied and with 2 outs, throws 1 pitch to get the out. Team scores 5 times in the next half inning, and a new pitcher comes in, and his team wins. Player D's one pitch earns him the win.
There are dozens of examples like this, but just take some time to consider it I guess.
One last point is that attributing wins and losses to a pitcher made sense in the early days of baseball when most pitchers completed most of their games. Now most pitchers go around 6 innings for many (increasingly intelligent, but still occasionally flawed) reasons.
People who like these stats count 20 in a season as being an accomplishment, and 300 in a career as being hall-of-fame worthy.
Sv - Saves. This was a stat created by sportswriter Jerome Holtzman in 1959. It's not bad in and of itself, but it doesn't mean a lot. And it has some weird quirks to it. Per Wikipedia, a save is earned when:
"He satisfies one of the following conditions: He enters the game with a lead of no more than three runs and pitches for at least one inning. He enters the game, regardless of the count, with the potential tying run either on base, at bat or on deck. He pitches for at least three innings."
A note, some fun can be had with the last condition in there. The Littleton was coined when Wes Littleton earned a save for "preserving" the lead for 3 innings in a game with a final score of 30-3. To be fair to him, when he entered, the score was "only" 14-3.
Closers and specialized bullpens started to ramp up in the mid-1980s with Tony La Russa, and now many teams carry at least 7 relief pitchers for reasons which may be ranted about at a later date. Again from Mr. Posnanski though, we find that this has made little to no difference. From his story, these are winning percentages of teams that have led going into the 9th inning in different decades:
1950s - .948
1960s - .946
1970s - .948
1980s - .951
1990s - .949
2000s - .954
2010s - .952
Read that last link to really go in depth. The big problem with saves is that it pigeonholes pitchers and managers. They decide that certain pitchers, their "closers", can only pitch in the 9th inning with a lead. If a more important situation comes up in the 7th or 8th, or if the game is tied in the 9th, somebody else will have to do it, never mind if your "closer" is your best pitcher in the bullpen.
Most pitchers who can last the whole season in a "closer" role will probably pick up at least 30 saves.
K - Strikeouts - These are fine, though kind of cumulative. If you pitch 160 innings you'll probably have fewer strikeouts than if you pitch 200 innings or if you get to 240 innings. Strikeouts have also been going way up in recent years as pitchers pitch with maximum effort for shorter periods of time, and batters have realized that strikeouts are ok if swinging hard may net them more hard contact when they do get it. Percentage of batters struck out or strikeouts per 9 innings gives a bit more of a complete picture, but over a full season, looking at two pitchers who threw the same number of innings with very different numbers of strikeouts will at least tell you something.
Most relievers now can get over 1 strikeout per inning, and some starters get that high as well. Over 200 in a season is pretty good, and it's been a while since anybody hit over 300.
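The two rate versions mentioned above are easy to sketch. The pitcher here is hypothetical.

```python
def k_per_nine(strikeouts, innings):
    """Strikeouts per 9 innings pitched."""
    return 9 * strikeouts / innings

def strikeout_rate(strikeouts, batters_faced):
    """Share of batters faced who struck out."""
    return strikeouts / batters_faced

# Hypothetical starter: 200 strikeouts in 200 innings against 800 batters.
print(f"K/9 {k_per_nine(200, 200):.1f}, K% {strikeout_rate(200, 800):.1%}")
```

One strikeout per inning works out to 9.0 per nine innings, which is the reliever-level mark mentioned above.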
ERA - Earned Run Average. The first one involving some more math, the basic formula is 9 times number of earned runs divided by innings pitched, which simplifies to number of earned runs per 9 innings pitched. This stat is ok, but has a couple of flaws. The first is the "earned" portion. The link in the Wins section goes into some detail about the foolishness of errors, but the gist is that if a runner reaches base or scores because of what the scorer deems an error, it "doesn't count" to the pitcher. The second is that it doesn't account for bullpen usage in particular. If a pitcher loads the bases, gets two outs, and leaves for a relief pitcher, various things can happen.
1 - The replacement gets the next batter out. No earned runs for anybody.
2 - The replacement allows a triple, then gets the next batter out. 3 earned runs for the first pitcher, none for the replacement.
And so on. A slightly simpler version would be RA or RA9 as it's sometimes called, which just charges the runs, earned or not, to the pitcher, but again this doesn't figure out that last situation. There are statistics that do, but not for this section.
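A minimal sketch of ERA next to RA9, with a made-up pitcher. One assumption to flag: innings go in as a real number, so a box-score "200.1" (200 and a third) would need to be entered as 200 + 1/3.

```python
def era(earned_runs, innings):
    """Earned runs allowed per 9 innings."""
    return 9 * earned_runs / innings

def ra9(earned_runs, unearned_runs, innings):
    """All runs allowed per 9 innings, earned or not."""
    return 9 * (earned_runs + unearned_runs) / innings

# Hypothetical pitcher: 70 earned runs and 5 unearned runs in 200 innings.
print(f"ERA {era(70, 200):.2f}, RA9 {ra9(70, 5, 200):.2f}")
```

The gap between the two is entirely down to what the official scorer called an error.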
Also in a future section, I'll talk more about FIP, or fielding-independent pitching. The gist is that Voros McCracken, a baseball researcher, discovered that by and large, a pitcher can only really control 3 things: strikeouts, walks, and home runs. There turns out to be a little bit more to it, but not a lot. By and large, a ball put in play (that is not one of those situations) will on average be a hit around 30% of the time. Knuckleballers seem to get around that a bit, some guys do a bit worse, and some guys do a bit better for various reasons (hitters seem to have some more variance here over their careers, and guys who allow more ground balls or fly balls will have some different results). There's a lot more to explore there. So I won't for this particular section.
Simple levels again: below 2.00 is excellent; 2.00-3.00 is very good. 3.00-4.00 is fine for a starter, getting to be less fine for an important reliever. 4.00-5.00 is tolerable for a 4th or 5th starter or a mopup guy in the bullpen. Above 5.00 is probably going to be an issue going forward.
WHIP - Walks + Hits per Inning Pitched. An early "advanced" stat that was created a bit more for fantasy leagues, the formula is entirely in the name. For the reasons just mentioned, it's not always greatly predictive (the hits part anyway), but it will probably tell you a good amount about the results that have happened, including ERA.
Below 1.00 is pretty exceptional. 1.00-1.20 is very good, 1.20-1.40 is in the average range. 1.40-1.60 is pretty below average. Higher than 1.60 will probably get you in a lot of trouble.
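Since the formula is entirely in the name, the sketch is short. The numbers are hypothetical.

```python
def whip(walks, hits, innings):
    """Walks plus hits allowed, per inning pitched."""
    return (walks + hits) / innings

# Hypothetical pitcher: 50 walks and 170 hits allowed in 200 innings.
print(f"WHIP {whip(50, 170, 200):.2f}")
```

That lands at 1.10, in the "very good" range above.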

Next time, I'll try some advanced stats out. For now know that these are still talked about pretty commonly, but fortunately a lot of TV broadcasts and most websites go further than this, so I will next time as well.


Greg Jackson

A baseball fan in general. Interested in statistics and analytics. Usually follow the Giants and Blue Jays, fan of all MLB in general.
On Twitter @sittinontop
