People—and businesses—use averages all the time to measure performance. Consciously or not, we use averages to make inferences about some underlying property of a population. For example, if Pine Crest School has higher averages than River Bank School, we infer that the “typical” student from Pine Crest is better than the typical student from River Bank, or that the teaching is better. How reasonable those inferences are depends on the circumstances.
The trouble is that there are virtually limitless ways to calculate an average. Consider stock market averages. Here are the returns for several broad US equity indexes over the ten years from April 2002 to April 2012:
| Index | 10-year return, April 2002 to April 2012 |
|---|---|
| Dow Jones Industrial Average | 33% |
| Standard & Poor's 500 | 30% |
| Value Line Arithmetic Index | 129% |
| NASDAQ | 84% |
| Russell 3000 | 5% |
| Wilshire 5000 | 5% |
Each of them is an indicator of the same market. What inference about the stock market can you make with each of these six very different averages? Value Line says “we believe that the Value Line (Arithmetic) Index is the best single estimate of ‘the’ U.S. stock market,” while Standard & Poor’s claims that its index is “widely regarded as the best single gage of the large cap U.S. equities market.” Wilshire says “no other index comes close to offering the comprehensiveness” of the Wilshire 5000.
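One reason the figures in the table diverge so widely is that the indexes combine their stocks in different ways (they also track different sets of stocks). Roughly speaking, the Dow Jones Industrial Average is price-weighted, the S&P 500 is weighted by market capitalization, and the Value Line Arithmetic Index gives each stock equal weight. The sketch below applies simplified versions of those three schemes to three made-up stocks. The company names, prices and share counts are all hypothetical, and real index methodologies add divisors, float adjustments and rebalancing rules that are left out here.

```python
# Three hypothetical stocks: (start price, end price, shares outstanding).
# All numbers are invented for illustration.
stocks = {
    "BigCo":   (100.0, 110.0, 1_000_000_000),
    "MidCo":   (50.0,  40.0,  100_000_000),
    "SmallCo": (10.0,  20.0,  10_000_000),
}

def pct(start, end):
    """Percentage change from start to end."""
    return (end / start - 1) * 100

def price_weighted(stocks):
    """Change in the sum of prices (Dow-style; a constant divisor cancels out)."""
    start = sum(s for s, _, _ in stocks.values())
    end = sum(e for _, e, _ in stocks.values())
    return pct(start, end)

def cap_weighted(stocks):
    """Change in total market capitalization (price x shares outstanding)."""
    start = sum(s * n for s, _, n in stocks.values())
    end = sum(e * n for _, e, n in stocks.values())
    return pct(start, end)

def equal_weighted(stocks):
    """Arithmetic mean of the individual stock returns (one-period, simplified)."""
    returns = [pct(s, e) for s, e, _ in stocks.values()]
    return sum(returns) / len(returns)

for name, fn in [("price-weighted", price_weighted),
                 ("cap-weighted", cap_weighted),
                 ("equal-weighted", equal_weighted)]:
    print(f"{name:>15}: {fn(stocks):6.2f}%")
```

The same three price changes produce "market" returns of roughly 6%, 9% and 30%, depending only on how the stocks are weighted. Equal weighting lets the small stock that doubled pull the average up; capitalization weighting lets the largest company dominate.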
At this time of year, as high school final exams approach, many parents and their kids will be worried about final grade averages. Grade averages are like stock market averages: teachers can calculate them just about any way they want. Suppose two science teachers use different weighting schemes for the year's average, based on how much class work, homework, quizzes, tests, lab reports and the final exam each contribute to the final grade. It's quite possible for a student to get a failing average under one weighting scheme and a passing average under the other. Like the science teachers, the different stock market indices apply different weighting schemes to the stocks they contain.
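To make the two-teacher scenario concrete, here is a minimal sketch with one student's component scores run through two weighting schemes. The scores, the weights and the passing mark of 60 are all hypothetical; the only point is that the same marks can land on either side of the line.

```python
import math

# One hypothetical student's component scores (0-100).
scores = {
    "class_work": 85, "homework": 90, "quizzes": 60,
    "tests": 50, "lab_reports": 80, "final_exam": 45,
}

# Two hypothetical weighting schemes; each set of weights sums to 1.
teacher_a = {"class_work": 0.15, "homework": 0.15, "quizzes": 0.10,
             "tests": 0.20, "lab_reports": 0.20, "final_exam": 0.20}
teacher_b = {"class_work": 0.05, "homework": 0.05, "quizzes": 0.10,
             "tests": 0.35, "lab_reports": 0.05, "final_exam": 0.40}

PASS_MARK = 60  # hypothetical passing threshold

def weighted_average(scores, weights):
    """Weighted mean of the component scores; weights must sum to 1."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(scores[part] * w for part, w in weights.items())

for name, weights in [("Teacher A", teacher_a), ("Teacher B", teacher_b)]:
    avg = weighted_average(scores, weights)
    verdict = "pass" if avg >= PASS_MARK else "fail"
    print(f"{name}: {avg:.2f} ({verdict})")
```

Teacher A's scheme, which rewards steady class work and labs, gives this student about 67 and a pass; Teacher B's scheme, which leans heavily on tests and the final exam, gives about 54 and a fail.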
How many key performance indicators used by businesses rely on averages? A quick Internet search on “KPI average” turns up such things as average number of training hours per employee, average time for response to a customer call, average number of calls per handler, average time to complete a task, and so on.
What makes averages differ is the structure and complexity in the data. If the data are very simple, different averages will generally be similar. If there is structure, such as different groups or time periods, or different ways to weight that structure, the averages will differ. Understand how the structure in the data influences the average. The structure may create subsets of data whose averages are very different from the overall average. In such situations the overall average will not represent the “typical” case, and you cannot make reasonable inferences with it.
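Returning to the two schools from the opening, the sketch below shows how group structure can reverse an inference. The split into "advanced" and "standard" tracks and every number in it are hypothetical. River Bank scores higher within each track, yet Pine Crest has the higher overall average, simply because more of its students are in the higher-scoring track. (Statisticians call this reversal Simpson's paradox.)

```python
# Hypothetical (number of students, average score) for each track.
schools = {
    "Pine Crest": {"advanced": (80, 85.0), "standard": (20, 65.0)},
    "River Bank": {"advanced": (20, 88.0), "standard": (80, 68.0)},
}

def overall_average(groups):
    """Student-weighted mean across groups: sum(n * mean) / sum(n)."""
    total = sum(n * mean for n, mean in groups.values())
    count = sum(n for n, _ in groups.values())
    return total / count

for school, groups in schools.items():
    by_track = ", ".join(f"{track} {mean:.0f}" for track, (_, mean) in groups.items())
    print(f"{school}: overall {overall_average(groups):.0f} ({by_track})")

# Pine Crest: overall 81 (advanced 85, standard 65)
# River Bank: overall 72 (advanced 88, standard 68)
```

Judged by the overall average alone, Pine Crest looks better; judged by any comparable group of students, River Bank does.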
This does not mean you cannot use averages. There are an infinite number of ways to calculate an average. Don’t be fooled when someone says they have calculated “the” average. Make sure it is the most reasonable average for summarizing the data or making an inference.
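As a small illustration of how many "averages" one data set supports, the sketch below summarizes a hypothetical list of customer-call response times (one of the KPIs mentioned above) five different ways using Python's standard `statistics` module. The data are invented, and the trimmed mean that drops one value from each end is an arbitrary choice made for illustration.

```python
import statistics

# Hypothetical customer-call response times in minutes; one call was very slow.
response_times = [2, 3, 3, 4, 5, 6, 45]

def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

averages = {
    "arithmetic mean": statistics.mean(response_times),
    "median": statistics.median(response_times),
    "trimmed mean": trimmed_mean(response_times),
    "geometric mean": statistics.geometric_mean(response_times),  # Python 3.8+
    "harmonic mean": statistics.harmonic_mean(response_times),
}

for name, value in averages.items():
    print(f"{name:>15}: {value:.2f} minutes")
```

Here the "average" response time ranges from under 4 minutes to nearly 10, depending on which one you report. The single slow call drives most of that spread, which is exactly the kind of structure worth understanding before choosing an average.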
I fully agree that you need to be careful with averages (as a teacher once explained to us with an example: I took two shots at a duck; one landed a meter in front of it, the other a meter behind it, so on average I killed the duck), but isn't that true of any metric you use? If you do not define metrics to be as SMART as possible, they will also be worthless.