Keep on assessing science

Ugh. I got up at 5am and tried to read a statistics paper to put myself back to sleep, and it didn’t work. Dang numbers, stop being interesting! Anyway, this paper was a meta-meta-analysis that tries to dig up the sources of bias that might be contributing to the reproducibility crisis in the scientific literature. Here’s the abstract from the Fanelli, Costas, and Ioannidis (2017) paper; my emphasis on some of the key points.

Numerous biases are believed to affect the scientific literature, but their actual prevalence across disciplines is unknown. To gain a comprehensive picture of the potential imprint of bias in science, we probed for the most commonly postulated bias-related patterns and risk factors, in a large random sample of meta-analyses taken from all disciplines. The magnitude of these biases varied widely across fields and was overall relatively small. However, we consistently observed a significant risk of small, early, and highly cited studies to overestimate effects and of studies not published in peer-reviewed journals to underestimate them. We also found at least partial confirmation of previous evidence suggesting that US studies and early studies might report more extreme effects, although these effects were smaller and more heterogeneously distributed across meta-analyses and disciplines. Authors publishing at high rates and receiving many citations were, overall, not at greater risk of bias. However, effect sizes were likely to be overestimated by early-career researchers, those working in small or long-distance collaborations, and those responsible for scientific misconduct, supporting hypotheses that connect bias to situational factors, lack of mutual control, and individual integrity. Some of these patterns and risk factors might have modestly increased in intensity over time, particularly in the social sciences. Our findings suggest that, besides one being routinely cautious that published small, highly-cited, and earlier studies may yield inflated results, the feasibility and costs of interventions to attenuate biases in the literature might need to be discussed on a discipline-specific and topic-specific basis.

So, in part, the reproducibility problem is caused by new researchers scrambling to get a flashy result that will get them some attention; it’s worsened if they’re working in isolation rather than as part of a team; and there are a few ethically compromised scientists who have been spoiling the whole barrel of apples. That all makes sense to me.

It’s hard to police against individuals with little scientific integrity — rascals are present in every field. Catching them after the fact doesn’t necessarily help, because they’ve already tainted the literature with a flash-in-the-pan compromised paper.

Scientists who had one or more papers retracted were significantly more likely to report overestimated effect sizes, albeit solely in the case of first authors. This result, consistently observed across most robustness tests, offers partial support to the individual integrity hypothesis.

A scientist caught publishing bad data is already severely punished, so I don’t think that is the most promising avenue for improving the reliability of papers. It shouldn’t be ignored, obviously, but the other observations might lead to more improvement.

The mutual control hypothesis was supported overall, suggesting a negative association of bias with team size and a positive one with country-to-author ratio. Geographic distance exhibited a negative association, against predictions, but this result was not observed in any robustness test, unlike the other two.

Collaboration is good. In the days when I was in a large lab, it was always a little suspicious when someone suddenly plopped a whole, completed paper down in the lab meeting and announced that they’d finished the experiment, and by the way, would you like to be an author on the paper? I always turned those offers down, because a co-authorship ought to be the product of ongoing involvement in the work, not some attempt at fishing for external approval. But more cooperation and vetting of each other’s work ought to be a general hallmark of good science.

I’m not in a big research lab anymore, but I still try to get that across in student labs. There’s always someone who objects to having to work with those other students and wants to do their lab projects all by themselves, and I have to turn them down and tell them they have to work in teams. They probably think it’s so I’ll have fewer lab reports to grade (OK, maybe that’s part of it…), but it’s mainly because teamwork is an essential part of the toolkit of science.

And now I’m getting confirmation that it also helps reduce spurious results.

The biggest effect, though, is associated with small study size.

Our study asked the following question: “If we draw at random from the literature a scientific topic that has been summarized by a meta-analysis, how likely are we to encounter the bias patterns and postulated risk factors most commonly discussed, and how strong are their effects likely to be?” Our results consistently suggest that small-study effects, gray literature bias, and citation bias might be the most common and influential issues. Small-study effects, in particular, had by far the largest magnitude, suggesting that these are the most important source of bias in meta-analysis, which may be the consequence either of selective reporting of results or of genuine differences in study design between small and large studies. Furthermore, we found consistent support for common speculations that, independent of small-study effects, bias is more likely among early-career researchers, those working in small or long-distance collaborations, and those that might be involved with scientific misconduct.

More data! This is also helpful information for my undergraduate labs, since I’m currently in the process of cracking the whip over my genetics students and telling them to count more flies. Only a thousand? Count more. MORE!
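For a rough feel for why small studies can overstate effects, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: the true effect (0.2), the unit-variance noise, and the crude "only publish if the estimate exceeds about two standard errors" filter are all made up for the example, and the function names are hypothetical. It also models only selective reporting, which the authors name as one of two possible explanations for small-study effects.

```python
import random
import statistics


def simulate_study(n, true_effect, rng):
    """Return the mean of n noisy observations centered on true_effect (sd = 1)."""
    return statistics.fmean(rng.gauss(true_effect, 1.0) for _ in range(n))


def mean_published_effect(n, true_effect=0.2, studies=500, seed=1):
    """Average estimate among simulated studies that pass a crude filter.

    Assumption: a study is 'published' only if its estimate exceeds
    roughly two standard errors (1.96 / sqrt(n), since sd = 1).
    """
    rng = random.Random(seed)
    threshold = 1.96 / n ** 0.5
    estimates = [simulate_study(n, true_effect, rng) for _ in range(studies)]
    published = [e for e in estimates if e > threshold]
    if not published:
        return float("nan")  # nothing passed the filter
    return statistics.fmean(published)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n = {n:4d}: mean published effect = {mean_published_effect(n):.3f}")
```

Under these assumptions, the small studies that clear the filter are the ones that happened to overshoot, so their average lands well above the true effect, while the large studies mostly pass the filter and average out close to it. Count more flies.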

The paper does end on a positive note. They’ve identified some potential sources of bias, but overall, science is in fairly good shape.

In conclusion, our analysis offered a “bird’s-eye view” of bias in science. It is likely that more complex, fine-grained analyses targeted to specific research fields will be able to detect stronger signals of bias and its causes. However, such results would be hard to generalize and compare across disciplines, which was the main objective of this study. Our results should reassure scientists that the scientific enterprise is not in jeopardy, that our understanding of bias in science is improving and that efforts to improve scientific reliability are addressing the right priorities. However, our results also suggest that feasibility and costs of interventions to attenuate distortions in the literature might need to be discussed on a discipline- and topic-specific basis and adapted to the specific conditions of individual fields. Besides a general recommendation to interpret with caution results of small, highly cited, and early studies, there may be no one-size fits-all solution that can rid science efficiently of even the most common forms of bias.

Fanelli D, Costas R, Ioannidis JPA (2017) Meta-assessment of bias in science. Proc Natl Acad Sci USA. doi: 10.1073/pnas.1618569114.


  1. wcorvi says

    PZ, how about this as a learning tool. You have your students do a statistical study on a small sample, say 100 flies, and write conclusions. Then you have them analyze the entire sample combined from all the students, and see if their conclusions hold up.

  2. says

    We already do that. Individual groups gather data and do their own evaluations, and then we consolidate all the data from all of the groups for a calculation.

    Also, the very first lab is about basic probability & stats, and I have the individual students flip a coin 10 times and see that it comes up 5 heads and 5 tails only occasionally…and then we pool all the data and see a nice, roughly Gaussian distribution emerge.

  3. OptimalCynic says

    It’s all about incentives. Fix the incentives to publish this dross and the problem will lessen. Of course, that’s easier to say than to do!

  4. Dr. Pablito says

    I am ever mindful of something a mentor once impressed on me as a youngster:
    “You think that everything in the published literature is right?!”

  5. Ichthyic says

    oh so easy to say. NOT so easy to do.

    the people who think “Just count MORE FLIES”, are NOT field biologists.

    I cannot count the number of times I have participated in this argument in academia, to try and educate lab wonks just how different field work is, and that you just CAN’T always just “count more flies”.

    this is why so many of us use nonparametric statistical models so often.

    and we entirely disagree that using them, or the sample sizes we are forced to deal with, makes the results of our studies any less valid or interesting.

    as far as repetition goes? all for it. just send money… that hasn’t been there for decades.