Sunday, October 19, 2008

Probably Fun - Baseball Stats

The other day a coworker who is very interested/involved in baseball statistics posed an interesting question to me just before I left work:

What is the probability that a team playing in a 5-game series wins the series on a streak of 3 wins?

(e.g. team A sweeps team B by winning the first 3, A loses 1 and then wins 3, or A loses 2 and then wins the final 3)

I quickly had flashbacks to my college statistics, discrete mathematics, number theory and finite automata courses (all favorites other than statistics). We quickly worked through some of the simpler aspects of it and came up with an answer that seemed well reasoned but not quite right. Of course I couldn't stop thinking about why it was wrong, so I went home and did further calculations, searched for my old statistics book, and even consulted Big Red. I eventually found the right answer but couldn't give myself a proof of why it was right.

The next day, after consulting with several other coworkers (at one point at least 15 people were busy discussing the problem and getting zero work done) and getting a well-timed email from Big Red with his thoughts, we settled on the right answer.

I still can't put together a very clever or clear proof of why this solution is true, and I feel that there is probably a much more succinct formula, but I am confident that it is correct ... a Monte Carlo simulation produces the same probability. :-)

Our answer, abstracted a bit:

The probability of a particular team winning a standard playoff series (where once a team wins more than half the games, the series is over) on a closing streak of X straight wins, where the probability of the team winning each game is p and the number of wins needed to take the series is X, is:

p^X * ( (1-p)^0 + (1-p)^1 + (1-p)^2 + ... + (1-p)^(X-1) )

so for a 5 game series where the odds are even:

0.5^3 * ( (1-0.5)^0 + (1-0.5)^1 + (1-0.5)^2 )

0.125 * ( 0.5^0 + 0.5^1 + 0.5^2 )

0.125 * ( 1 + 0.5 + 0.25 )

0.125 * 1.75

0.21875
The probability of a particular team winning a 5-game series on a streak of 3 straight wins is 21.875%, assuming each game is a 50/50 shot.
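For anyone who wants to check the arithmetic, here is a minimal Python sketch that puts the formula above next to a quick Monte Carlo run. The function names (`streak_win_probability`, `simulate`), the trial count and the seed are my own choices for illustration, not the simulation mentioned earlier, and games are assumed to be independent of each other.

```python
import random


def streak_win_probability(p, wins_needed):
    """Formula from the post: p^X * sum of (1-p)^k for k = 0 .. X-1."""
    q = 1.0 - p
    return p ** wins_needed * sum(q ** k for k in range(wins_needed))


def simulate(p, wins_needed, trials=200_000, seed=1):
    """Estimate how often team A clinches on a run of X straight wins."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a_wins = b_wins = streak = 0
        # Play games until either team reaches the required number of wins.
        while a_wins < wins_needed and b_wins < wins_needed:
            if rng.random() < p:
                a_wins += 1
                streak += 1
            else:
                b_wins += 1
                streak = 0  # a loss breaks A's current run
        # Count the series only if every one of A's wins came in the closing run.
        if a_wins == wins_needed and streak == wins_needed:
            hits += 1
    return hits / trials


print(streak_win_probability(0.5, 3))  # 0.21875
print(simulate(0.5, 3))                # an estimate that should land near 0.21875
```

The simulated value will wobble a little from run to run if the seed is changed; it is only an estimate, while the formula gives the exact figure.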

Without going into even more detail, the confusion was the result of trying to simplify the problem into "There are X possible outcomes and Y of them are the ones we want, and since each game is an even proposition the total probability is Y/X." However, since the various possible outcomes involve different numbers of games, they are actually weighted differently, and thus it's not 30% but noticeably less, at ~22%.
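The weighting point can be seen by listing every way the series can finish. The Python sketch below uses a hypothetical helper, `series_outcomes`, to enumerate each finished 5-game series as a string of W (team A wins) and L (team A loses), and attaches each one's probability at even odds. It does not try to reproduce how the 30% figure was counted; it only shows that finished series of different lengths carry different weights.

```python
from fractions import Fraction


def series_outcomes(wins_needed, prefix=""):
    """Yield every finished series as a string of W (A wins) and L (A loses)."""
    if prefix.count("W") == wins_needed or prefix.count("L") == wins_needed:
        yield prefix  # one team has clinched, so the series stops here
        return
    yield from series_outcomes(wins_needed, prefix + "W")
    yield from series_outcomes(wins_needed, prefix + "L")


p = Fraction(1, 2)  # even odds, as in the example above
outcomes = list(series_outcomes(3))

# Each finished series has probability p^(A wins) * (1-p)^(A losses).
weights = {s: p ** s.count("W") * (1 - p) ** s.count("L") for s in outcomes}

# Series that A clinches with three straight wins at the end.
streak_wins = [s for s in outcomes if s.endswith("WWW")]

print(len(outcomes))                         # 20 distinct finished series
for s in streak_wins:
    print(s, weights[s])                     # WWW 1/8, LWWW 1/16, LLWWW 1/32
print(sum(weights[s] for s in streak_wins))  # 7/32, i.e. 0.21875
print(sum(weights.values()))                 # 1, so the weights cover every case
```

A 3-game sweep is four times as likely as any single 5-game sequence, which is why simply counting outcomes gives the wrong answer.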

I have to thank Pip for the shout-out (see comment) on Fungoes.net (the STL SABR chapter blog).
Pip is a coworker who posed the original question.
