Monday, January 3, 2011

The Logit Model and Baseball: Projecting Chance of Making the Hall of Fame

It is the favorite time of year for many a sports nerd like myself: the time when the Baseball Writers Association of America will make their picks for the Hall of Fame, and when the blogosphere is best equipped to mock and ridicule the inconsistent logic of many esteemed writers.

It is also a favorite time for anyone who has ever said "I cannot believe they voted for..." or complained that someone was robbed (see Whitaker, Lou).

Last year's epic battle, in my mind, was the one over Edgar Martinez. Supporters cited batting numbers comparable to those of legends, compiled in a mediocre hitting park, and a career that shows no signs of PED use. His detractors countered that he was a designated hitter and that his career was short.

When all was said and done, Martinez received a mere 36.2% of the vote, less than half of the 75% one needs to reach the Hall of Fame. So, what would Edgar's chance of reaching Cooperstown be, knowing this?

Would you believe a 69.09% chance?

It seems counter-intuitive: six years after his career ended, Martinez has yet to convince almost two-thirds of the voting base of his greatness, so why would anything change so rapidly? However, it happens constantly. Only 2 men from 1976-97 received a higher share of the vote on their first ballot and still missed out on election by the writers. One of them, Jim Bunning, eventually gained entry through the Veterans Committee.

As we saw from yesterday's post, the logit model can provide a powerful probability estimator given a dummy dependent variable. In this case, we test whether someone reached the Hall of Fame (y=1) or not (y=0).

To perform this analysis, I looked at all Hall of Fame votes from 1976-1997 and took the share of the vote each player received on his first ballot. I excluded those who received less than 5% of the vote, since that result indicates zero probability of being elected by the writers and only a small chance of being elected by the Veterans' Committee. Through this process, I obtained a data sheet of 59 players, as can be seen here.

Right away, one can make some general observations. Thirty-three of the fifty-nine players listed were eventually elected by the writers, a 55.93% success rate. Additionally, 3 more were elected by the Veterans' Committee, bringing the total success rate of the group to 61.02%. Simply clearing the first-ballot hurdle seems to bode well for a candidate's eventual success.
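
The two success rates quoted here are simple proportions of the 59 players. As a quick check in R (the variable names are mine and do not come from the data sheet):

```r
# Counts taken from the paragraph above; variable names are illustrative.
n_players  <- 59   # players with at least 5% on their first ballot
n_bbwaa    <- 33   # eventually elected by the writers
n_veterans <- 3    # elected later by the Veterans' Committee

rate_bbwaa   <- 100 * n_bbwaa / n_players
rate_overall <- 100 * (n_bbwaa + n_veterans) / n_players

print(round(c(bbwaa = rate_bbwaa, overall = rate_overall), 2))
```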

However, this analysis is imperfect. The success rate includes players who were elected on the first ballot, and had no resistance in making the Hall. Once again, though, we can easily run a logit model regression on the data. Yesterday I showed a sample script that can run all the needed commands for you, but today I'll simply enter these in, command by command, and show you another useful feature: creating a function.

Yesterday I showed you the format for writing a logit model regression, so let's skip right to a summary. This models the probability of eventually making the Hall of Fame as a function of the share of the first-ballot vote:

> summary(probabilityHOF)

Formula: hof ~ 1/(1 + exp(-1 * (A + B * initial)))

Parameters:
  Estimate Std. Error t value Pr(>|t|)
A  -3.0108     0.7995  -3.766 0.000395 ***
B  10.5387     3.0206   3.489 0.000942 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2891 on 57 degrees of freedom

Number of iterations to convergence: 6
Achieved convergence tolerance: 8.11e-06

And this, in turn, summarizes the probability of being elected by the BBWAA:

> summary(probHOFbbwaa)

Formula: hof_bbwaa ~ 1/(1 + exp(-1 * (A + B * initial)))

Parameters:
  Estimate Std. Error t value Pr(>|t|)
A  -3.5384     0.8171  -4.330  6.1e-05 ***
B  10.4765     2.5601   4.092 0.000136 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2479 on 57 degrees of freedom

Number of iterations to convergence: 19
Achieved convergence tolerance: 5.858e-06
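
For readers who want to try this kind of fit without yesterday's script, here is a minimal sketch under several assumptions. The data frame `toy` is made up purely for illustration and is not the 59-player sheet. The sketch also swaps in R's built-in glm() with a binomial family, which fits the logit curve by maximum likelihood. The summaries above are in the format that nls() prints, so a glm() fit on the real data would likely give somewhat different estimates.

```r
# Made-up first-ballot shares and outcomes, for illustration only.
toy <- data.frame(
  initial = c(0.06, 0.10, 0.15, 0.20, 0.25, 0.30,
              0.35, 0.40, 0.45, 0.55, 0.65, 0.80),
  hof     = c(0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)
)

# Logit regression of the 0/1 outcome on the first-ballot share.
fit <- glm(hof ~ initial, data = toy, family = binomial)

# The intercept and slope play the roles of A and B in the formulas above.
print(coef(fit))

# Fitted probability for a hypothetical 36.2% first-ballot share.
p <- predict(fit, newdata = data.frame(initial = 0.362), type = "response")
print(p)
```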

The first thing to stand out is that the two equations are very similar. Both coefficients are statistically significant at the 0.001 level in each regression, with somewhat larger t values in the second one, which measures whether a player is elected by the sports writers.
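
The t values in the two summaries are simply each estimate divided by its standard error, which can be checked directly from the printed numbers (the vector names here are my own):

```r
# Estimates and standard errors copied from the two summaries above.
est_overall <- c(A = -3.0108, B = 10.5387)
se_overall  <- c(A =  0.7995, B =  3.0206)
est_bbwaa   <- c(A = -3.5384, B = 10.4765)
se_bbwaa    <- c(A =  0.8171, B =  2.5601)

# t value = estimate / standard error
t_overall <- est_overall / se_overall
t_bbwaa   <- est_bbwaa / se_bbwaa

print(round(t_overall, 3))
print(round(t_bbwaa, 3))
```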

We could plug a vote share into these equations by hand each time, but we would rather simply enter the value in question and have R return a probability for us. This can be done easily using a function.

As shown at the DePauw website, a function is written in this form:
name=function(argument 1, argument 2, etc...) { [expression] }

Since we have the coefficients available at our disposal from our previous equations, let's define functions. First, a function to determine the probability of being elected into the Hall of Fame by the sports writers:

> xChance=function(x) {
+ 1/(1+exp(-1*(-3.5384+10.4765*x))) }

Then, a second one, to determine the probability of reaching the Hall via the BBWAA or the Veterans' Committee:

> xHOF=function(x) {
+ 1/(1+exp(-1*(-3.0108+10.5387*x))) }

So of course, the last question is, how do you use a function? Well, that's the simplest part of all, and will help us obtain our answer. All you need to do is write in this format:
FnName(value)

So, let's try both of these functions with x=.362, Edgar Martinez's share of the vote in 2010:

> xChance(0.362)
[1] 0.5631837
> xHOF(0.362)
[1] 0.6908742
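
If the coefficients change, for example after refitting with more years of data, the numbers inside xChance and xHOF have to be retyped. One possible alternative, sketched here with a hypothetical helper named logitProb, is to pass A and B as arguments and let R's built-in plogis() compute 1/(1+exp(-z)):

```r
# Hypothetical helper: logistic probability for share x, intercept A, slope B.
logitProb <- function(x, A, B) {
  plogis(A + B * x)   # plogis(z) is 1 / (1 + exp(-z))
}

# Coefficients copied from the two summaries earlier in the post.
p_bbwaa   <- logitProb(0.362, A = -3.5384, B = 10.4765)
p_overall <- logitProb(0.362, A = -3.0108, B = 10.5387)

print(c(bbwaa = p_bbwaa, overall = p_overall))
```

These two values should agree with the xChance(0.362) and xHOF(0.362) results above.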

Well, these probabilities are moderately encouraging for my fellow Edgar fans. Not even including the Veterans Committee option, Edgar currently stands at better than a coin flip's chance of reaching the Hall, at 56.32%. With the Veterans Committee, this probability rises to 69.09%.

So where are the break-even (50-50) points for both equations? For just the BBWAA vote, it is at around 33.8% of the initial vote. For overall Hall of Fame chances, it is at around 28.6%.
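
These break-even points follow from the equation itself: the probability equals 0.5 exactly when A + B*x = 0, so x = -A/B. A quick check with the coefficients from the two summaries (variable names are mine):

```r
# Break-even share: solve A + B * x = 0 for x.
breakeven_bbwaa   <- -(-3.5384) / 10.4765   # BBWAA-only model
breakeven_overall <- -(-3.0108) / 10.5387   # BBWAA or Veterans' Committee

print(round(100 * c(bbwaa = breakeven_bbwaa, overall = breakeven_overall), 1))
```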

After the fifth, we can re-visit this post and project the chances of ballot newcomers like Jeff Bagwell and Larry Walker. For now, rest assured that Barry Larkin has a 91.9% chance of reaching Cooperstown, and Robbie Alomar is almost a stone cold lock. Heck, even Fred McGriff has a 32.2% chance of reaching the Hall someday given his first ballot performance.
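
Because exp() and the arithmetic operators in R work element-wise, the same formula can score several candidates at once. The first-ballot shares in the sketch below are not listed in this post; they are assumed from the 2010 results (Alomar 73.7%, Larkin 51.6%, McGriff 21.5%), which are consistent with the probabilities quoted above.

```r
# Same formula and coefficients as xHOF above, redefined so this runs on its own.
xHOF <- function(x) {
  1 / (1 + exp(-1 * (-3.0108 + 10.5387 * x)))
}

# Assumed 2010 first-ballot shares (not listed in this post).
shares <- c(Alomar = 0.737, Larkin = 0.516, McGriff = 0.215)

# One call returns a named vector of probabilities.
probs <- xHOF(shares)
print(round(100 * probs, 1))
```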
