Tuesday, May 08, 2007

Cricket averages

Statistics is an important part of cricket. Heck, it is an important part of life!

The most common statistic in cricket is the average or, to be more precise, the arithmetic mean. Averages apply to both bowlers and batsmen, but they are more of a concern to batsmen, who are not as numptied in the head as bowlers. “My average” is the most important number in a batsman's small world; he is obsessed by it. You can tell how well someone is doing this season by how they talk about averages in general. If they emphasise the usefulness of averages, they're scoring well, whereas if they’re having a stinker they seem indifferent, even aloof, to blind statistical practices.

Being rubbish, I have always had doubts over statistics, especially since there are so many ways of calculating measures of central tendency. Let’s put stats to the test.

Below is a histogram of the Test career of Michael Atherton, one of my favourite cricketers.

First off, I’m afraid the complex mathematics involved in sorting out the not outs is far beyond me, so I shall assume all innings are complete. This gives us an arithmetic mean of 36 (which is not far off his true average of 37). The standard deviation is absolutely hopeless, given N, but I’m not sure that really applies to cricket.
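For anyone braver than me, the not-out business is less frightening than it sounds: a batting average is total runs divided by the number of times out, so a not out adds runs without adding to the divisor. Here is a rough sketch of the difference between that and the plain "treat every innings as complete" mean used in this post. The innings in `sample` are made up for illustration and are not Atherton's real scores, and the function names are my own.

```python
def batting_average(innings):
    """Runs divided by dismissals. `innings` is a list of (runs, was_out) pairs."""
    total_runs = sum(runs for runs, _ in innings)
    dismissals = sum(1 for _, was_out in innings if was_out)
    if dismissals == 0:
        return None  # never dismissed: the average is undefined
    return total_runs / dismissals


def mean_per_innings(innings):
    """Plain arithmetic mean, treating every innings as complete."""
    return sum(runs for runs, _ in innings) / len(innings)


# Made-up innings (NOT real data): one of the five is a not out.
sample = [(0, True), (12, True), (45, False), (7, True), (96, True)]

print(batting_average(sample))   # 160 runs / 4 dismissals
print(mean_per_innings(sample))  # 160 runs / 5 innings
```

Ignoring not outs can only pull the figure down, which fits the plain mean coming out slightly below the official average.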

However, you will notice that the line graph is nothing like a normal distribution: it is heavily skewed, with most of the scores bunched up on the left and a long tail stretching off to the right. In such instances, working out the mean doesn’t always give an accurate middle value, because it gives undue influence to large outliers.

So! What are the alternatives? Well, there are a number of incredibly complicated methods of working out means (generalised mean, harmonic mean, etc.) but I don’t begin to understand them. I can work out, however, three other GCSE mathematical measurements: the mode (the most frequent value), the mid-point (the value halfway between the lowest and highest scores) and the median (the middle value). They are:

Mode: 0
Median: 23
Mid-point: 92.5

The mode is clearly useless. Yes, Athers got a lot of ducks (20), but we didn’t expect him to score nought every time. The mid-point is a very dodgy way of working out central tendency and should be ignored. (Although, it is nice to dwell on a possible world where my hero averaged over 90.)
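If you don't fancy doing this by hand, the four measures can be sketched in a few lines of Python using the standard `statistics` module. The `scores` list below is invented purely for illustration (it is not Atherton's record), but it has the same sort of shape: a couple of ducks, a clump of low scores and one big hundred.

```python
import statistics

# Invented scores for illustration only, NOT real career data.
scores = [0, 0, 3, 8, 15, 22, 40, 51, 120]

mean = statistics.mean(scores)                # dragged upwards by the 120
median = statistics.median(scores)            # the middle value once sorted
mode = statistics.mode(scores)                # the most frequent value
mid_point = (min(scores) + max(scores)) / 2   # halfway between lowest and highest

print(f"Mean: {mean:.1f}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Mid-point: {mid_point}")
```

Even on this toy list the mean sits well above the median, which is the skew doing its work.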

The median shows an interesting phenomenon. Although Atherton was regarded as one of the best batsmen of his generation, in more than half of his innings he failed to contribute meaningfully. If, like me, you have ranked all his scores in Excel and divided them into quartiles, then you will see that it is only the upper quartile that has anything over fifty.

In essence, it is a quarter of Atherton’s total innings that does the work for his average. If he returned to Test cricket, we should expect three quarters of all his innings to be failures. And yet his average is nearly forty. This doesn’t seem right.
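For those without Excel to hand, the quartile exercise might look something like the sketch below. Again, `scores` is an invented list rather than Atherton's real innings, and "top quarter" here simply means the best `len(scores) // 4` innings, which is my own rough-and-ready definition.

```python
import statistics

# Invented scores for illustration only, NOT real career data.
scores = [0, 0, 3, 8, 15, 22, 40, 51, 120]

# The three cut points that split the ranked scores into four groups.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# How much of the run total comes from the best quarter of innings?
ranked = sorted(scores, reverse=True)
top_quarter = ranked[: len(ranked) // 4]
share = sum(top_quarter) / sum(scores)

print(f"Quartile cut points: {q1}, {q2}, {q3}")
print(f"Top quarter of innings: {top_quarter}")
print(f"Share of all runs: {share:.0%}")
```

On this made-up list the best two innings supply roughly two thirds of the runs, which is the same lopsidedness in miniature.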

In hindsight, I should have analysed an Ozzie’s career, and said how he was really over-rated and averaged seven, or something. Maybe at another Ashes whitewash.

There was going to be another graph saying something brilliant. But I’m simply too exhausted by all the stats. Sorry. Just know that I cast doubt on the general averages-in-cricket direction. But not Athers. He’s a god. No. The God.

1 comment:

Unknown said...

G'Day left-arm Chinaman, great article here; I really enjoyed it.

I've even written a response with that statistical analysis of an Australian batsman you asked for.