Thursday, May 20, 2010

The (Out) Law of Averages

A couple of years ago, as part of my statistics course, we were given an article on statistical fallacies in sports. We were then asked to come up with a different statistical fallacy we had found in a sport. It wasn't too difficult to find one. Here's what I wrote and the statistical evidence to back it up.

One of the biggest clichés in the sporting world is the Law of Averages. While there is a real theorem stating that a random variable will reflect its underlying probability over a large sample (the Law of Large Numbers), the law of averages typically assumes that an unnatural short-term “balance” will occur. Television commentators, including former cricketers, often claim a batsman is “due” because he hasn’t scored a “big one” for a while.
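
To make the distinction concrete, here is a minimal simulation sketch. It assumes something the post does not claim about real batsmen: that every innings is an independent trial with the same fixed chance `p` of producing a hundred. The function name, the value of `p`, the drought length `n`, and the seed are all illustrative choices. Under that assumption, the rate of hundreds right after a long drought should come out close to the overall rate, which is what the Law of Large Numbers predicts and the "law of averages" denies.

```python
import random


def simulate_due_rate(p=0.1, n=10, innings=200_000, seed=2010):
    """Simulate independent innings and compare two rates of scoring a hundred.

    Returns (overall_rate, due_rate), where due_rate only looks at innings
    that were preceded by at least n-1 consecutive scores without a hundred.
    All parameters are hypothetical; nothing here is real cricket data.
    """
    rng = random.Random(seed)
    drought = 0          # consecutive innings without a hundred so far
    hundreds = 0         # hundreds over the whole simulated career
    due_instances = 0    # innings played while the batsman was "due"
    due_hundreds = 0     # hundreds scored while "due"
    for _ in range(innings):
        scored = rng.random() < p
        if drought >= n - 1:
            due_instances += 1
            due_hundreds += scored
        hundreds += scored
        drought = 0 if scored else drought + 1
    return hundreds / innings, due_hundreds / due_instances


overall_rate, due_rate = simulate_due_rate()
print(f"overall: {overall_rate:.3f}, when 'due': {due_rate:.3f}")
```

With independent innings, being "due" gives no boost: the two printed rates should differ only by sampling noise.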

To test this “Law of Averages”, I studied the stats of the 15 batsmen with the most innings (consider an innings roughly equivalent to a plate appearance in baseball). I chose these 15 because they provide the largest samples. For each batsman, I calculated the average number of innings he took to score a hundred and rounded it up to a whole number N, which served as the benchmark for testing the law of averages. For example, Allan Border scored 27 100’s in 267 innings, or roughly a hundred every 9.9 innings, so N = 10. According to the law of averages, if Border hasn’t scored a 100 in 9 consecutive innings, his tenth (Nth) innings should be a 100. I then calculated the probability of this event using the formula:

P' = N2 / (N1 + N2), where
P' = probability of scoring a 100 in the Nth innings after N-1 consecutive scores of less than 100
N1 = number of instances when the batsman didn’t score a 100 in N consecutive innings
N2 = number of instances when the batsman scored a 100 after N-1 consecutive scores of less than 100
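
The counting behind N, N1 and N2 can be sketched in a few lines of Python. This is one possible reading of the definitions above, not necessarily the exact procedure used for the table: it assumes an "instance" is any innings that starts after at least N-1 consecutive scores below 100. The function names and the short score list are hypothetical, made up purely for illustration.

```python
from math import ceil


def benchmark_n(innings, hundreds):
    """Innings per hundred, rounded up to a whole number (the N in the text)."""
    return ceil(innings / hundreds)


def count_due_instances(scores, n):
    """Return (n1, n2) for a career given as a list of per-innings scores.

    Assumption: the batsman counts as "due" in any innings that follows at
    least n-1 consecutive scores below 100. Such an innings adds to n2 if it
    is a hundred and to n1 if it is not.
    """
    n1 = n2 = 0
    drought = 0  # consecutive sub-100 scores immediately before this innings
    for score in scores:
        if drought >= n - 1:
            if score >= 100:
                n2 += 1
            else:
                n1 += 1
        drought = 0 if score >= 100 else drought + 1
    return n1, n2


# Border's figures from the text: 27 hundreds in 267 innings gives N = 10.
border_n = benchmark_n(267, 27)

# Hypothetical scores, not a real career; n = 3 keeps the example short.
example_scores = [12, 45, 3, 101, 7, 0, 56, 23, 150, 8, 19, 30]
n1, n2 = count_due_instances(example_scores, 3)
p_prime = n2 / (n1 + n2)
print(border_n, n1, n2, round(p_prime, 3))
```

A different counting rule, for example non-overlapping blocks of N innings, would give different N1 and N2, so treat the function as a starting point rather than a reproduction of the original numbers.
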
Here P is the batsman’s career rate of scoring a 100 (100’s divided by innings), and the P-Factor is the ratio P'/P, a measure of how frequently a batsman scored a 100 when he was “due”. A P-Factor of more than 1 indicates that a batsman delivered more frequently when he was “due” than his career record would suggest. Table 1 shows the analysis of the sample. Of the 15 batsmen sampled, 12 have a P-Factor of less than 1.1, and 7 of those have a P-Factor of less than 1! It seems to me that even some of the best batsmen don’t fare any better when they are “due”.


Table 1

| Player | Innings | 100's | Innings/100's | N | N1 | N2 | P | P' | P-Factor |
|---|---|---|---|---|---|---|---|---|---|
| Allan Border | 267 | 27 | 9.889 | 10 | 87 | 9 | 0.101 | 0.094 | 0.931 |
| Steve Waugh | 260 | 32 | 8.125 | 9 | 92 | 8 | 0.123 | 0.08 | 0.65 |
| Sachin Tendulkar | 237 | 39 | 6.077 | 7 | 65 | 13 | 0.165 | 0.167 | 1.012 |
| Alec Stewart | 235 | 15 | 15.667 | 16 | 86 | 6 | 0.064 | 0.065 | 1.016 |
| Brian Lara | 232 | 34 | 6.824 | 7 | 69 | 14 | 0.147 | 0.169 | 1.15 |
| Graham Gooch | 219 | 20 | 10.95 | 11 | 63 | 7 | 0.091 | 0.1 | 1.099 |
| Sunil Gavaskar | 214 | 34 | 6.294 | 7 | 58 | 11 | 0.159 | 0.159 | 1 |
| Michael Atherton | 212 | 16 | 13.25 | 14 | 62 | 4 | 0.075 | 0.061 | 0.813 |
| Mark Waugh | 209 | 20 | 10.45 | 11 | 46 | 8 | 0.096 | 0.148 | 1.542 |
| Rahul Dravid | 205 | 24 | 8.542 | 9 | 64 | 9 | 0.117 | 0.123 | 1.051 |
| David Gower | 204 | 18 | 11.333 | 12 | 69 | 6 | 0.088 | 0.08 | 0.909 |
| Desmond Haynes | 202 | 18 | 11.222 | 12 | 62 | 6 | 0.089 | 0.088 | 0.989 |
| Inzamam-ul-Haq | 200 | 24 | 8.333 | 9 | 49 | 6 | 0.12 | 0.109 | 0.908 |
| Jacques Kallis | 194 | 29 | 6.69 | 7 | 68 | 9 | 0.149 | 0.117 | 0.785 |
| Geoffrey Boycott | 193 | 22 | 8.773 | 9 | 55 | 8 | 0.114 | 0.127 | 1.114 |
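
As a cross-check, the P-Factor column can be recomputed from the Innings, 100's, N1 and N2 columns. The sketch below assumes P is 100's divided by innings and the P-Factor is P' divided by P, which is what the table's figures are consistent with. The names `TABLE_1` and `p_factor` are illustrative. Because it divides unrounded values, individual results can differ slightly from the printed column, which looks as though it was computed from the rounded P and P'.

```python
# (player, innings, hundreds, N1, N2) copied from Table 1
TABLE_1 = [
    ("Allan Border", 267, 27, 87, 9),
    ("Steve Waugh", 260, 32, 92, 8),
    ("Sachin Tendulkar", 237, 39, 65, 13),
    ("Alec Stewart", 235, 15, 86, 6),
    ("Brian Lara", 232, 34, 69, 14),
    ("Graham Gooch", 219, 20, 63, 7),
    ("Sunil Gavaskar", 214, 34, 58, 11),
    ("Michael Atherton", 212, 16, 62, 4),
    ("Mark Waugh", 209, 20, 46, 8),
    ("Rahul Dravid", 205, 24, 64, 9),
    ("David Gower", 204, 18, 69, 6),
    ("Desmond Haynes", 202, 18, 62, 6),
    ("Inzamam-ul-Haq", 200, 24, 49, 6),
    ("Jacques Kallis", 194, 29, 68, 9),
    ("Geoffrey Boycott", 193, 22, 55, 8),
]


def p_factor(innings, hundreds, n1, n2):
    """Ratio of the 'due' rate P' = N2/(N1+N2) to the career rate P."""
    p = hundreds / innings
    p_prime = n2 / (n1 + n2)
    return p_prime / p


factors = {name: p_factor(i, h, n1, n2) for name, i, h, n1, n2 in TABLE_1}

# Tally how many batsmen fall under the two thresholds used in the text.
below_1_1 = sum(f < 1.1 for f in factors.values())
below_1 = sum(f < 1 for f in factors.values())

for name, f in factors.items():
    print(f"{name:<18}{f:.3f}")
print(f"P-Factor < 1.1: {below_1_1}, P-Factor < 1: {below_1}")
```

Even with unrounded values, the tallies agree with the text: 12 of the 15 batsmen come in under 1.1, and 7 of them under 1.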
