My post on how physics researchers searching for the Higgs particle needed to push the chance of a statistical fluke below the five-sigma level (about 0.00003%, or roughly one in 3.5 million) generated some discussion of the problems that can arise, mainly an increased likelihood of false positive results, in other areas such as the social sciences, where the threshold for statistical significance is often as high as 5%.
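To make the gap concrete, here is a minimal sketch (assuming SciPy is available; the numbers themselves are just standard normal-distribution arithmetic) that converts the five-sigma threshold into a tail probability and compares it with the 5% convention:

```python
from scipy.stats import norm

# One-sided tail probability beyond five standard deviations:
# the particle-physics discovery threshold.
five_sigma = norm.sf(5)      # ~2.87e-7, i.e. about 0.00003%

# The conventional social-science significance threshold.
p_threshold = 0.05

print(f"five sigma tail: {five_sigma:.2e}")
print(f"5% is {p_threshold / five_sigma:,.0f} times more permissive")
```

Put another way: when there is no real effect, a 5% threshold lets about one test in twenty come up positive by chance alone, which is why the volume of spurious results matters so much more in fields that use it.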
Ed Yong writes that this is becoming a serious problem in the field of psychology, where there seem to be many false positive results, coupled with a reluctance by journals to publish articles that contradict positive results they published earlier. As a result, there is a worrying number of published results that are not reproducible, yet the original studies remain officially unrefuted. Yong published an article on this recently in Nature (vol. 485, pp. 298–300, 17 May 2012).
Yong says that the problem is that journals love surprising and counter-intuitive results, and such results are easier to come by in psychology, where almost everyone has some intuition about what should happen in practically any situation. This is unlike the case in (say) physics, where people are unlikely to have a gut feeling about how Higgs bosons or neutrinos behave. As Yong says:
Psychology is not alone in facing these problems. In a now-famous paper, John Ioannidis, an epidemiologist currently at Stanford School of Medicine in California, argued that “most published research findings are false”, according to statistical logic. In a survey of 4,600 studies from across the sciences, Daniele Fanelli, a social scientist at the University of Edinburgh, UK, found that the proportion of positive results rose by more than 22% between 1990 and 2007. Psychology and psychiatry, according to other work by Fanelli, are the worst offenders: they are five times more likely to report a positive result than are the space sciences, which are at the other end of the spectrum (see ‘Accentuate the positive’). The situation is not improving. In 1959, statistician Theodore Sterling found that 97% of the studies in four major psychology journals had reported statistically significant positive results. When he repeated the analysis in 1995, nothing had changed.
One reason for the excess in positive results for psychology is an emphasis on “slightly freak-show-ish” results, says Chris Chambers, an experimental psychologist at Cardiff University, UK. “High-impact journals often regard psychology as a sort of parlour-trick area,” he says. Results need to be exciting, eye-catching, even implausible. Simmons says that the blame lies partly in the review process. “When we review papers, we’re often making authors prove that their findings are novel or interesting,” he says. “We’re not often making them prove that their findings are true.”
Siri Carpenter reports (Science, vol. 335, no. 6076, pp. 1558-1561, 30 March 2012) on an initiative led by Brian Nosek involving about 50 academic psychologists who have started what they call the Open Science Collaboration (OSC), in which they seek to systematically replicate important results. Needless to say, this is causing some trepidation in the community: if many major, highly publicized results are refuted, researchers fear that the field may be tarnished.
But that concern must surely take a back seat to finding the truth. It is never a good thing when false ideas in science are allowed to propagate. I think the OSC effort will do psychology a world of good in the long run.
Another interesting development arises out of a case of apparent misconduct by a researcher in psychology. As Ed Yong reports, this misconduct was uncovered not by whistleblowers but by a researcher who developed a statistical tool to check whether published data sets were too good to be true, and hence implausible. “His test looks for an overabundance of positive results given the nature of the experiments – a sign that researchers have deliberately omitted negative results that didn’t support their conclusion, or massaged their data in a way that produces positive results.” This tool reminds me of Benford’s law, which can flag fabricated figures because invented numbers rarely follow the first-digit distribution that naturally occurring data do.
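The underlying logic is simple enough to sketch. What follows is a hypothetical illustration of the general “excess significance” idea, not the actual tool Yong describes (whose details are more sophisticated): if each study has only a modest chance, its statistical power, of reaching significance even when the effect is real, then a long unbroken run of positive results becomes very improbable.

```python
from math import prod
from itertools import combinations

def prob_at_least(powers, k):
    """Exact probability of observing k or more significant results from
    len(powers) independent studies, where powers[i] is study i's chance
    of reaching significance (its statistical power). This is the tail of
    a Poisson-binomial distribution, computed by brute force; fine for
    small collections of studies."""
    n = len(powers)
    total = 0.0
    for m in range(k, n + 1):
        for hits in combinations(range(n), m):
            hit_set = set(hits)
            total += prod(powers[i] if i in hit_set else 1 - powers[i]
                          for i in range(n))
    return total

# Hypothetical example: ten modestly powered studies (power ~0.5 each),
# every one of which reports a significant result.
powers = [0.5] * 10
print(f"P(all ten significant) = {prob_at_least(powers, 10):.4f}")  # ~0.001
```

A probability that low suggests that negative results went unreported or that the data were nudged, and the flag is raised without anyone having to rerun a single experiment.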
This new method is important for two reasons. First, the practice of massaging data, whether consciously or unconsciously, may be more common than we would like, and we need some way of making researchers more vigilant about the danger. Second, it enables people to check the plausibility of results without having to repeat the entire experiment.
It is tricky to navigate the world of knowledge. We humans tend to accept uncritically those results that seem to confirm what we already believe, yet academic publishing tends to reward findings that overturn conventional wisdom. If such findings are allowed to stand uncontradicted, they can become the new conventional wisdom.
The only gut feeling we should trust is skepticism. It is good to treat any study, especially one that gives surprising results, as tentative, and to wait for corroboration before giving it too much credence.