Reading these two posts on hitters and pitchers got me thinking. Both are pretty number-heavy, so if you're into that, read them. If not, I'm going to summarize the basics here anyway.
The main thing analytics and sabermetrics have been trying to do for some time is predict player performance. That really is their aim, in a nutshell. Analyzing what has happened is fine, but it can't be changed now. The question is what it means going forward. That's the point of looking at a guy who has an ERA of 3.00 now but seems to have been getting a little too lucky, and guessing that over the rest of the season he's more likely to pitch to his true skill. He might not, but that's the safe bet. It's about avoiding the hot hand or cold hand fallacy, and all of that good stuff.

A site like Fangraphs includes projections for all players, using a number of different systems, including Steamer, ZiPS, and fan consensus (where readers of the site input their expectations). These often end up pretty close to reality, though that could be because Fangraphs self-selects for analytically minded fans, or it could be (as we'll see) that this isn't necessarily as hard as it seems, on average.

The posts I linked to up top compared the projections of 3 fairly complex tools and one very simple one with what really happened. They focus on what they deem "skill stats" rather than counting stats, as counting stats can be skewed by playing time. For hitters they have BB%, K%, OBP, SLG, and wOBA. For pitchers they have K/9, BB/9, HR/9, and ERA.

Before going further, let's digress and give as simple a breakdown as I can of the 3 complex systems, and then of the simple one.

First, Steamer, which is a proprietary system that looks mostly at the player's last 3 years of stats, weighting each one differently, regressing them by different amounts, and mostly ignoring aging. The focus is mostly on rate statistics.

Second, PECOTA, a system developed by Nate Silver for Baseball Prospectus. It compares players to other players in history who had similar profiles to that point in their careers, including production metrics, usage metrics, "phenotypic attributes" (things like handedness, height, weight, career length, etc.), and fielding position.

Third, ZiPS, which again uses about 3 years of player data (4 for hitters who aren't very old or very young), with specific aging curves for different player types and, again, different weights for each year.

And fourth, but not at all least, Marcel, one of the original systems, on which a lot of Steamer and ZiPS and others are kind of based. It is named for Marcel the Monkey, from Friends, and is meant to be "so easy, a monkey could do it". The principle behind it is that it "uses 3 years of MLB data, with the most recent year weighted the most heavily. It regresses to the mean and has an age factor." The amount of regression to the mean depends on how much a player has played over the last 3 years (for example, a hitter who has over 600 plate appearances in each of the last 3 years should be more predictable from his own statistics than one who has around 200 in each of the last 3 years). Rookies are projected to be exactly league average, period. It also does not consider comparable players, park factors, or anything else. Explaining the exact math would probably get a little bit tedious, but you could do most of it in a pretty simple Excel spreadsheet using no complicated equations.

So the complicated ones should be the best, right? Turns out... not necessarily. The articles linked at the top looked at the projections for 2015 and at the average error for the stats I mentioned earlier. Steamer was, by a little bit, the most accurate in every category for both pitchers and hitters. But not by that much. Second best at 8 of the 9 statistics above? Marcel the Monkey.
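To make the Marcel recipe described above a bit more concrete, here is a rough Python sketch for a single hitter rate stat such as OBP. The post only says "3 years, most recent weighted most, regress to the mean, age factor", so every number below is an assumption on my part: the 5/4/3 year weights, the 1200 plate appearances of league-average "padding", and the age-29 pivot are the values usually quoted for Marcel, and the function name and input layout are made up for illustration.

```python
# A minimal Marcel-style sketch for one hitter rate stat (e.g. OBP).
# All constants are ASSUMED (commonly quoted Marcel values), not taken
# from the post; names and input layout are hypothetical.

YEAR_WEIGHTS = (5, 4, 3)   # most recent season first
REGRESSION_PA = 1200       # league-average plate appearances mixed in
PIVOT_AGE = 29             # age at which the adjustment is zero


def marcel_rate(seasons, league_rate, age=PIVOT_AGE):
    """Project a rate stat from up to 3 seasons.

    seasons: list of (plate_appearances, rate) tuples, most recent first.
    league_rate: league-average value of the same stat.
    """
    if not seasons:
        # Rookies with no MLB data are projected as exactly league average.
        return league_rate

    # Weight each season by recency and by how much the player played.
    weighted_pa = sum(w * pa for w, (pa, _) in zip(YEAR_WEIGHTS, seasons))
    weighted_stat = sum(w * pa * rate
                        for w, (pa, rate) in zip(YEAR_WEIGHTS, seasons))

    # Regress to the mean: the less a player has played, the more the
    # fixed block of league-average plate appearances dominates.
    regressed = ((weighted_stat + REGRESSION_PA * league_rate)
                 / (weighted_pa + REGRESSION_PA))

    # Simple age factor: small boost for young players, small penalty for
    # older ones. This direction only makes sense for "higher is better"
    # stats; something like K% for hitters would need the sign flipped.
    if age > PIVOT_AGE:
        factor = 1 + (PIVOT_AGE - age) * 0.003
    else:
        factor = 1 + (PIVOT_AGE - age) * 0.006
    return regressed * factor


# Same .350 OBP history, different playing time, league average .320:
regular = marcel_rate([(600, 0.350)] * 3, league_rate=0.320)
part_timer = marcel_rate([(200, 0.350)] * 3, league_rate=0.320)
rookie = marcel_rate([], league_rate=0.320)
print(round(regular, 4), round(part_timer, 4), rookie)
```

The regular keeps more of his own .350 than the part-timer does, which is the "more plate appearances, more predictable" idea from above, and the rookie comes out at exactly the league rate.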
So it turns out that if you look at how a player has played over the last 3 years, and assume that next year will be pretty close to that, especially to last year, you'll probably do pretty well, on average.

An important note, and one that is brought up in most articles about projections: most projections will not do very well for any individual player. The process is too random, injuries can happen, etc. But if you tried out something like projecting all of the players on a team and looking at the team's totals? You'd probably come out with something pretty reasonable.

So if you like to guess at how players will do, check out the monkey system. It might be a fun way to look at some teams...
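For anyone curious why team totals behave better than individual projections, here is a small toy simulation. None of it is real data: the roster size, the "true" value per player, and the size of the random noise are invented, and it assumes each player's miss is independent of his teammates' (in real life, things like a league-wide change in offense would make the misses move together).

```python
import random


def simulate(n_players=25, true_value=20.0, noise_sd=8.0,
             n_seasons=2000, seed=0):
    """Toy model: every player is projected at true_value, and his actual
    season is true_value plus independent random noise (ASSUMED numbers).

    Returns (average miss per individual player,
             average miss on the team total, expressed per player).
    """
    rng = random.Random(seed)
    individual_miss = 0.0
    team_miss = 0.0
    for _ in range(n_seasons):
        actuals = [true_value + rng.gauss(0, noise_sd)
                   for _ in range(n_players)]
        # How far off is the projection for a typical single player?
        individual_miss += sum(abs(a - true_value)
                               for a in actuals) / n_players
        # How far off is the projected team total, spread over the roster?
        team_miss += abs(sum(actuals) - n_players * true_value) / n_players
    return individual_miss / n_seasons, team_miss / n_seasons


individual, team = simulate()
print(f"typical miss per player: {individual:.2f}")
print(f"team-total miss per player: {team:.2f}")
```

Because the overs and unders partly cancel when you add a roster together, the team-level miss per player comes out much smaller than the typical individual miss.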
November 2015
Greg Jackson
A baseball fan in general. Interested in statistics and analytics. Usually follow the Giants and Blue Jays, fan of all MLB in general.