# Opinion polls and statistics

In the previous post and in many aspects of life these days, we get quoted the results of opinion polls. Many of our public policies are strongly influenced by these polls, with politicians paying close attention to them before speaking out.

But while people are inundated with opinion polls, there is still considerable misunderstanding about how they work. Especially during elections, when there are polls practically every day, one often hears people expressing skepticism about polls, saying that they feel the polls are not representative because they, personally, and all the people they know, have never been asked their opinion. Surely, they reason, if so many polls are done, every person should get a shot at answering these surveys? The fact that no pollster has contacted them or their friends and families seems to make the poll results suspect in their eyes, as if the pollsters are using some highly selective group of people to ask and leaving out ‘ordinary’ people.

This betrays a misunderstanding of statistics and the sample size needed to get good results. The so-called “margin of error” quoted by statisticians is found, to a good approximation, by dividing 100 by the square root of the size of the sample. So if you have a sample of 100, then the margin of error is 10%. If you have a sample size of 625, then the margin of error drops sharply to 4%. If you have a sample size of 1111, the margin of error becomes 3%. To get to 2% requires a sample size of 2500.
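
The rule of thumb above is easy to check numerically. Here is a minimal Python sketch (the function name `margin_of_error` is just an illustrative choice) that applies the "100 divided by the square root of the sample size" rule to the sample sizes mentioned in the text.

```python
import math

def margin_of_error(sample_size):
    """Rule-of-thumb margin of error, in percentage points: 100 / sqrt(n)."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return 100 / math.sqrt(sample_size)

# Sample sizes quoted in the text.
for n in (100, 625, 1111, 2500):
    print(f"n = {n:5d}: margin of error is about {margin_of_error(n):.1f}%")
```

Note how slowly the margin shrinks: because of the square root, cutting the margin of error in half requires four times as many respondents.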

Clearly you would like your margin of error to be as small as possible, which argues for large samples, but your sample sizes are limited by the cost and time involved in surveying people, so trade-offs have to be made. Most pollsters use samples of about 1000, and quote margins of error of 3%.

One interesting point is that there are statistical theorems that say that the sample size needed to get a certain margin of error does not depend on the size of the whole population (for large enough populations, say over 100,000). So a sample size of 1000 is sufficient for Cuyahoga County, the state of Ohio, or the whole USA. This explains why any given individual is highly unlikely to be polled. Since the population of the US is close to 300 million, each person, whether me or any one of the 1000 people I may personally know, has only about a 0.00033% chance of being contacted for any one poll.
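
One way to see why population size hardly matters is the standard finite population correction, which multiplies the margin of error by the square root of (N - n)/(N - 1) for a sample of n drawn from a population of N. The post does not name this correction, so treat its use here as an assumption about which theorem is meant. The sketch below applies it to three round, purely illustrative population sizes, and also computes the chance that one particular person lands in a single poll of 1000.

```python
import math

def margin_with_fpc(sample_size, population_size):
    """Rule-of-thumb margin (100 / sqrt(n)) scaled by the finite population correction."""
    base = 100 / math.sqrt(sample_size)
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return base * fpc

def chance_of_being_polled(sample_size, population_size):
    """Chance, in percent, that one particular person is in a simple random sample."""
    return 100 * sample_size / population_size

# Illustrative round numbers, not actual census figures.
for population in (100_000, 10_000_000, 300_000_000):
    margin = margin_with_fpc(1000, population)
    print(f"population {population:>11,}: margin of error about {margin:.2f}%")

chance = chance_of_being_polled(1000, 300_000_000)
print(f"chance of being in one national poll: {chance:.5f}%")
```

The margins for the three populations differ only in the second decimal place, which is why the same sample of 1000 serves a county, a state, or the whole country.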

Consider a poll that tells us that 54% of Americans say that “I do not think human beings developed from earlier species.” The sample size was 1000, which means a margin of error of about 3%. Statistically, this means that there is a 95% chance that the “true” percentage of people who agree with that statement lies somewhere between 51% and 57%.
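
For readers who want to see where the 95% figure comes from, here is a small sketch using the standard textbook normal approximation, in which the half-width of the interval is 1.96 standard errors. That formula is not given in the post and is brought in here as an assumption; for results near 50% it works out to very nearly 100 divided by the square root of the sample size.

```python
import math

def confidence_interval_95(percent, sample_size):
    """Approximate 95% interval (in percent) using the normal approximation."""
    p = percent / 100
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    half_width = 1.96 * standard_error * 100  # convert back to percentage points
    return percent - half_width, percent + half_width

low, high = confidence_interval_95(54, 1000)
print(f"95% interval: {low:.1f}% to {high:.1f}%")
```

For a 54% result from 1000 respondents, the interval comes out at roughly 51% to 57%, in line with the numbers quoted above.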

Certain assumptions and precautions go into interpreting these results. The first assumption is that the people polled are a truly random sample of the population. If you randomly contact people, that may not be true. You may, for example, end up with more women than men, or you may have contacted more old people or registered Republicans than are in the general population. If, from census and other data, you know the correct proportions of the various subpopulations in your survey, then this kind of skewing can be adjusted for by changing the weight of the contributions from each subgroup to match the actual population distribution.
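
The reweighting described above can be sketched in a few lines. All the numbers here are hypothetical: a sample of 1000 that happens to contain 600 women and 400 men, a population assumed to be split evenly, and made-up agreement rates for each group. Real pollsters weight on several variables at once; this only shows the basic idea.

```python
def raw_percent(groups):
    """Unweighted result: every respondent counts equally."""
    total = sum(count for count, _, _ in groups.values())
    return sum(count * percent for count, percent, _ in groups.values()) / total

def weighted_percent(groups):
    """Reweight each subgroup's result to its known share of the population."""
    total_share = sum(share for _, _, share in groups.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(percent * share for _, percent, share in groups.values())

# group: (respondents in sample, percent agreeing, share of actual population)
# Hypothetical figures chosen only for illustration.
sample = {
    "women": (600, 50.0, 0.5),
    "men": (400, 60.0, 0.5),
}

print(f"raw result:      {raw_percent(sample):.1f}%")
print(f"weighted result: {weighted_percent(sample):.1f}%")
```

Because women are over-represented in this made-up sample and agree less often, the raw figure understates the population figure; weighting each group by its true share corrects for that.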

With political polls, sometimes people complain that the sample sizes of Democrats and Republicans are not equal and that thus the poll is biased. But that difference is usually because the numbers of people who are officially registered as belonging to those parties are not equal.

But sometimes pollsters also quote the results for the subpopulations in their samples, and since the subsamples are smaller, the breakdown data have a greater margin of error than the results for the full sample, though you are often not explicitly told this. For example, the above-mentioned survey says that 59% of people who had high school education or less agreed that “I do not think human beings developed from earlier species.” But the number of people in the sample who fit that description is 407, which means that there is a 5% uncertainty in the result for that subgroup, unlike the 3% for the full sample of 1000.

But a more serious source of uncertainty these days is that many people refuse to answer pollsters when they call and it is not possible to adjust for the views of those who refuse. So although the pollsters do have data on the numbers of persons who hang up on them or otherwise refuse to answer, they do not know if such people are more likely or less likely to think that humans developed from earlier species. So they cannot adjust for this factor. They have to simply assume that if those non-responders had answered, their responses would have been in line with those who actually did respond.
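
To get a feel for how much nonresponse could matter in principle, the sketch below computes the most extreme possible range for the true figure, given the observed result and the fraction of contacted people who actually answered. The 50% response rate is a hypothetical value chosen for illustration, and the two extremes (every non-responder disagrees, or every non-responder agrees) are deliberately unrealistic limits rather than estimates.

```python
def nonresponse_bounds(observed_percent, response_rate):
    """Worst-case range for the true percentage when non-responders' views are unknown."""
    if not 0 < response_rate <= 1:
        raise ValueError("response rate must be between 0 (exclusive) and 1")
    lowest = observed_percent * response_rate       # every non-responder disagrees
    highest = lowest + 100 * (1 - response_rate)    # every non-responder agrees
    return lowest, highest

# Hypothetical: 54% of responders agree, but only half of those contacted responded.
low, high = nonresponse_bounds(54, 0.5)
print(f"true figure could in principle lie between {low:.0f}% and {high:.0f}%")
```

The assumption that non-responders resemble responders is what collapses this wide range back down to the single quoted figure.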

Then there may be people who do not answer honestly for whatever reason or are just playing the fool. They are also hard to adjust for. This is why I am somewhat more skeptical of surveys of teens on various topics. It seems to me that teenagers are just the right age to get enjoyment from deliberately answering questions in exotic ways.

These kinds of biases are hard, if not impossible, to compensate for, though in serious research the researchers try to put in extra questions that can help gauge whether people are answering honestly. But opinion polls, which have to be done quickly and cheaply, are not likely to go to all that trouble.

For such reasons, polls like the Harris poll issue this disclaimer at the end:

In theory, with probability samples of this size, one could say with 95 percent certainty that the overall results have a sampling error of plus or minus 3 percentage points of what they would be if the entire U.S. adult population had been polled with complete accuracy. Sampling error for subsamples is higher and varies. Unfortunately, there are several other possible sources of error in all polls or surveys that are probably more serious than theoretical calculations of sampling error. They include refusals to be interviewed (nonresponse), question wording and question order, and weighting. It is impossible to quantify the errors that may result from these factors.

For all these reasons, one should take the quoted margins of error, which are based purely on sample size, with a considerable amount of salt.

There is one last point I want to make concerning a popular misconception propagated by news reporters during elections. If an opinion poll says that a sample of 1000 voters has candidate A with 51% support and candidate B with 49%, then since the margin of error (3%) is greater than the percentage of votes separating the candidates (2%), the reporters will often say that the race is a “statistical dead heat,” implying that the two candidates have equal chances of winning.

Actually, this is not true. What those numbers imply (using math that I won’t give here) is that there is about a 75% chance that candidate A truly does lead candidate B, while candidate B has only a 25% chance of being ahead. So when one candidate is three times as likely as the other to win, it is highly misleading to say that the race is a “dead heat.”
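
The post leaves out the math, so the following is only one plausible way to arrive at a figure of this size, not necessarily the calculation the author had in mind. It assumes a two-candidate race, treats the poll result as normally distributed around the true support, and asks how likely it is that candidate A's true support is above 50%. The function name is an illustrative choice.

```python
import math

def probability_of_leading(percent_a, sample_size):
    """Chance that A's true two-way support exceeds 50%, by normal approximation."""
    p = percent_a / 100
    if not 0 < p < 1:
        raise ValueError("percent must be strictly between 0 and 100")
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    z = (p - 0.5) / standard_error
    # Standard normal cumulative distribution, written with the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

chance = probability_of_leading(51, 1000)
print(f"chance that A is truly ahead: about {chance:.0%}")
```

Under these assumptions a 51% to 49% split in a sample of 1000 gives candidate A roughly a three-in-four chance of truly being ahead, which is a long way from a coin toss.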

## POST SCRIPT: Film: THE RISE OF THE POLITICS OF FEAR

The Cleveland Institute of Art Cinematheque is hosting a special free screening of the documentary film THE RISE OF THE POLITICS OF FEAR on Monday, March 6, 2006 (i.e., today) at 7:00pm. This three-part documentary by Britain’s Adam Curtis was broadcast on the BBC in 2004 as part of their series on THE POWER OF NIGHTMARES. The program is 180 minutes long.

Admission is free but an \$8 donation (\$5 members) is requested. For directions and free parking information, see here.

An article in the Guardian titled The Making of the Terror Myth reviews the documentary, and says in part:

Terrorism, by definition, depends on an element of bluff. Yet ever since terrorists in the modern sense of the term (the word terrorism was actually coined to describe the strategy of a government, the authoritarian French revolutionary regime of the 1790s) began to assassinate politicians and then members of the public during the 19th century, states have habitually overreacted. Adam Roberts, professor of international relations at Oxford, says that governments often believe struggles with terrorists “to be of absolute cosmic significance”, and that therefore “anything goes” when it comes to winning. The historian Linda Colley adds: “States and their rulers expect to monopolise violence, and that is why they react so virulently to terrorism.”

Here is information from the Cinematheque website.

Here’s the most incendiary political documentary since Michael Moore’s Fahrenheit 9/11! Adam Curtis’ three-part essay, made for the BBC, dissects the war on terror by arguing that fear has come to dominate politics, and that the notion of a secret, organized, international terror network (e.g., Al Qaeda) is a bogeyman created by powerful interests to maintain control. Curtis, whom Entertainment Weekly has called “the most exciting documentary filmmaker of our time,” employs extensive scholarship, interviews, and revealing film clips to trace the parallel rise of Islamic fundamentalism and American neoconservatism – mirror images of each other in Mr. Curtis’ view. “A superbly eye-opening and often absurdly funny deconstruction of the myths and realities of global terrorism.” –Variety.