The election result this week in Alberta, Canada, confounded a lot of people and cast the art of polling in a bad light. The polls predicted that the Wildrose Party would win, but it was trounced by the PC Party. Two days before the election, one poll gave the Wildrose 38% versus the PCs’ 36%. For the previous week the polls averaged 41% for the Wildrose and 33% for the PCs, a lead of 8 percentage points. In the actual election results the winning Progressive Conservative Party had 44% to the Wildrose’s 34%, a loss for the Wildrose by about 10 percentage points.
Usually when a reputable poll is reported, it goes along with the caveat “this poll is accurate to plus or minus 3 percentage points (or some similar small number) 19 times out of 20.” In 24 polls before the election, not a single one came within 3% of the PCs’ 44% final result, and only three came within 3% of the Wildrose’s 34% final result. The blue band in the diagram shows a rough ±3% band around the PCs’ final result, and the green band shows the same around the Wildrose’s result.
Opinion polls in 2012 Alberta election. Source: Election Almanac.
What happened to the famous “19 times out of 20”? Is this poll the 20th time? How could it be, when out of 24 polls not one came within 3% of the PCs’ election result?
The outcome of this election is more surprising than that of the 1948 US presidential election, when all the pollsters predicted that Thomas E. Dewey would defeat Harry S. Truman. But Truman beat Dewey by 5%, much less than the margin by which the PCs routed the Wildrose. In 1948, there was still a lot to learn about political opinion polling. Pollsters had had 64 years to hone their skills by the time of the Alberta election.
So what does “19 times out of 20” mean? It means that in some future survey of the same population that uses the same survey method, there is a 95% probability (i.e. 19 out of 20) that the estimate from the survey will be within plus or minus 3% of the value in the whole population. Alternatively, it means that if you do a large number of surveys, calculate the estimate in each one, and then take the average of these estimates, there is a 95% probability that this average will be within plus or minus 3% of the value in the population.
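To see where a figure like “plus or minus 3%” typically comes from, here is a minimal sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96). The function name and the sample size of 1,067 are illustrative assumptions; the polls discussed here do not state their sample sizes or methods.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the usual 95% interval for a proportion.

    Assumes a simple random sample of n respondents and an
    underlying proportion p; z = 1.96 corresponds to 95%.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample size, chosen only for illustration.
n = 1067

# p = 0.5 is the worst case (widest margin) and is what pollsters usually quote.
moe = margin_of_error(0.5, n)
print(f"n = {n}: plus or minus {100 * moe:.1f} percentage points")
```

Under these assumptions a sample of roughly a thousand people gives a margin of about three percentage points, and the margin shrinks only with the square root of the sample size.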
But there are two snags. The first: once the survey has been completed and the data collected and analysed, we cannot say whether the estimate from this specific sample is within 3% of the value in the population. The reason is that you rarely get a sample that is representative of the population as a whole. A particular sample might be one percent off, or six percent off. It is usually very hard, if not impossible, to know.
Many people explain, incorrectly, the plus or minus number as follows: “if we did this same survey a gazillion times, and nobody changed their mind while we did it, about 5% of the time (one time out of 20) the proportion in the population and the proportion in the sample could differ by more than 3%.”
Of course nobody ever repeats a survey a gazillion times, or even intends to, so this explanation of the “plus or minus” number is mostly nonsense. And even if you could repeat it, answers to survey questions are not constants—and that’s the second snag.
In an election people change their minds. That is the whole purpose of a campaign. The 19-times-out-of-20 rule works very well if you are sampling marbles from a jar and want to know what percentage are blue or green. But voters are not marbles (although politicians often talk about them as if they had lost their marbles). Voters constantly change.
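The contrast between marbles and voters can be sketched with a small simulation. This is an illustration under stated assumptions, not a model of the actual Alberta polls: the sample size of 1,067, the number of trials and the function name are hypothetical, and the second scenario simply assumes that true PC support was 33% when the polls were taken and 44% on election day.

```python
import random


def share_within_band(p_polled, p_final, n=1067, band=0.03, trials=1000, seed=1):
    """Fraction of simulated polls whose estimate lands within `band` of p_final.

    Each simulated poll draws n respondents independently; each one
    backs the party with probability p_polled (support at polling time).
    The estimate is then compared with p_final (support on election day).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        supporters = sum(rng.random() < p_polled for _ in range(n))
        if abs(supporters / n - p_final) <= band:
            hits += 1
    return hits / trials


# Marbles in a jar: the proportion does not move between sampling and counting.
stable = share_within_band(p_polled=0.44, p_final=0.44)

# Voters: hypothetical shift from 33% at polling time to 44% on election day.
shifted = share_within_band(p_polled=0.33, p_final=0.44)

print(f"stable jar:         {stable:.0%} of polls within 3 points")
print(f"shifted electorate: {shifted:.0%} of polls within 3 points")
```

With a stable jar, roughly 19 simulated polls out of 20 land inside the band. Once the underlying proportion moves by 11 points, almost none do, however carefully each sample was drawn.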
So the “19 times out of 20” caveat at the end of every poll should really be “this poll is accurate to within plus or minus 3% 19 times out of 20 if the voters don’t change their minds. But they usually do.”