(I am taking a break from original posts due to the holidays and because of travel after that. Until I return, here are some old posts, updated and edited, for those who might have missed them the first time around. New posts should appear starting Monday, January 14, 2008.)
As the 2008 election season gets into high gear, we will get inundated with the results of opinion polls. Many of our public policies are strongly influenced by these polls, with politicians paying close attention to them before speaking out.
But while people are inundated with opinion polls, there is still considerable misunderstanding about how they work. Especially during elections, when there are polls practically every day, one often hears people expressing skepticism about polls, saying that they feel the polls are not representative because they, personally, and all the people they know, have never been asked their opinion. Surely, they reason, if so many polls are done, every person should get a shot at answering these surveys? That fact that no pollster has contacted them or their friends and families seem to make the poll results suspect in their eyes, as if the pollsters are using some highly selective group of people to ask and leaving out ‘ordinary’ people.
This betrays a misunderstanding of statistics and the sampling size needed to get good results. The so-called “margin of error” quoted by statisticians is found by dividing 100 by the square root of the size of the sample. So if you have a sample of 100, then the margin of error is 10%. If you have a sample size of 625, then the margin of error drops sharply to 4%. If you have a sample size of 1111, the margin of error becomes 3%. To get to 2% requires a sample size of 2500.
Clearly you would like your margin of error to be as small as possible, which argues for large samples, but your sample sizes are limited by the cost and time involved in surveying people, so trade offs have to be made. Most pollsters use samples of about 1000, and quote margins of error of 3%.
One interesting point is that there are statistical theorems that say that the sample size needed to get a certain margin of error does not depend on the size of the whole population (for large enough populations, say over 100,000). So a sample size of 1000 is sufficient for Cuyahoga County, the state of Ohio, or the whole USA. This explains why any given individual is highly unlikely to be polled. Since the population of the US is close to 300 million, any one of the 1000 people I may personally know has only a 0.00033% probability of being contacted.
We know that a poll tells us that 54% of Americans say that “I do not think human beings developed from earlier species.” The sample size was 1000, which means a margin of error of about 3%. Statistically, this means that there is a 95% chance that the “true” percentage of people who agree with that statement (i.e., the number we would get if could actually ask each and every person on the country) lies somewhere between 51% and 57%.
Certain assumptions and precautions go into interpreting these results. The first assumption is that the people polled are a truly random sample of the population. If you randomly contact people, that may not be true. You may, for example, end up with more women than men, or you may have contacted more old people or registered Republicans than are in the general population. If, from census and other data, you know the correct proportions of the various subpopulations in your survey, then this kind of skewing can be adjusted for by changing the weight of the contributions from each subgroup to match the actual population distribution.
With political polls, sometimes people complain that the sample sizes of Democrats and Republicans are not equal and that thus the poll is biased. But that difference is usually because the number of people who are officially registered as belonging to those parties are not equal.
But sometimes pollsters also quote the results for the subpopulations in their samples, and since the subsamples are smaller, the breakdown data has greater margin of error than the results for the full sample, though you are often not explicitly told this. For example, the above-mentioned survey says that 59% of people who had high school education or less agreed that “I do not think human beings developed from earlier species.” But the number of people in the sample who fit that description is 407, which means that there is a 5% uncertainty in the result for that subgroup, unlike the 3% for the full sample of 1000.
But a more serious source of uncertainty these days is that many people refuse to answer pollsters when they call and it is not possible to adjust for the views of those who refuse. So although the pollsters do have data on the numbers of persons who hang up on them or otherwise refuse to answer, they do not know if such people are more likely or less likely to think that humans developed from earlier species. So they cannot adjust for this factor. They have to simply assume that if those non-responders had answered, their responses would have been in line with those who actually did respond.
Then there may be people who do not answer honestly for whatever reason or are just playing the fool. They are also hard to adjust for. This is why I am somewhat more skeptical of surveys of teens on various topics. It seems to me that teenagers are just the right age to get enjoyment from deliberately answering questions in exotic ways.
These kinds of biases are hard, if not impossible, to compensate for, though in serious research the researchers try to put in extra questions that can help gauge whether people are answering honestly. But opinion polls, which have to be done quickly and cheaply, are not likely to go to all that trouble
Because of such reasons, polls like the Harris poll issue this disclaimer at the end:
In theory, with probability samples of this size, one could say with 95 percent certainty that the overall results have a sampling error of plus or minus 3 percentage points of what they would be if the entire U.S. adult population had been polled with complete accuracy. Sampling error for subsamples is higher and varies. Unfortunately, there are several other possible sources of error in all polls or surveys that are probably more serious than theoretical calculations of sampling error. They include refusals to be interviewed (nonresponse), question wording and question order, and weighting. It is impossible to quantify the errors that may result from these factors.
For all these reasons, one should take the quoted margins of error, which are based purely on sample size, with a considerable amount of salt.
There is one last point I want to make concerning a popular misconception propagated by news reporters during elections. If an opinion poll says that a sample of 1000 voters has candidate A with 51% support and candidate B with 49%, then since the margin of error (3%) is greater than the percentage of votes separating the candidates (2%), the reporters will often say that the race is a “statistical dead heat,” implying that the two candidates have equal chances of winning.
Actually, this is not true. What those numbers imply (using math that I won’t give here) is that there is about a 75% chance that candidate A truly does lead candidate B, while candidate B has only a 25% chance of being ahead. So when one candidate is three times as likely as the other to win, it is highly misleading to say that the race is a “dead heat.”
POST SCRIPT: Inflated value of religion
Many people have an inflated sense of the value of religion that simply falls apart on close examination. For example, Mike Huckabee said the following: “The Ten Commandments form the basis of most of our laws and therefore, you know if you look through them does anybody find anything there that would be all that objectionable? I don’t think most people would if they actually read them.”
He says this as if it is obviously true. But Ed Brayton shows how absurd this is.