I have to say that I was positively thrilled by this article on how you can lose weight by eating chocolate. It encapsulates so many things I try to drill into my students — I’ll probably use it in my genetics class as an example of bad statistics, and my writing class as an example of using science writing skills for evil.
Here’s the deal: chocolate doesn’t help you lose weight. But if you muddy the data with a large number of variables that you then selectively ignore, and do a little unscrupulous p-hacking, you can get an effect with statistical significance. So these authors set out to produce a deliberately bad study in nutritional science, and see whether they could get it publicized.
The first step was to have vague criteria and multiple effects that you’re measuring.
Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.
Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one “statistically significant” result were pretty good.
Whenever you hear that phrase, it means that some result has a small p value. The letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data. The conventional cutoff for being “significant” is 0.05, which means that if there were no real effect, data this extreme would turn up by chance only 5 percent of the time. The more lottery tickets, the better your chances of getting a false positive.
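To see just how good those lottery odds are, here’s a small simulation I put together (my own illustration, not the hoaxers’ code): a pure-noise “study” with 15 subjects and 18 independent outcome measures. Because the simulated noise is drawn from a normal distribution with a known standard deviation, an exact z-test is legitimate here, so every “significant” result is a genuine false positive.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)
norm = NormalDist()

def fake_study(n_measurements=18, alpha=0.05):
    # 15 subjects split into a "chocolate" group (8) and a control group (7);
    # both groups are pure N(0, 1) noise, so any "effect" is a fluke.
    n_a, n_b = 8, 7
    se = sqrt(1 / n_a + 1 / n_b)
    for _ in range(n_measurements):
        a = [random.gauss(0, 1) for _ in range(n_a)]
        b = [random.gauss(0, 1) for _ in range(n_b)]
        z = (fmean(a) - fmean(b)) / se  # exact z-test: sigma = 1 is known here
        if 2 * (1 - norm.cdf(abs(z))) < alpha:
            return True  # at least one "significant" outcome
    return False

trials = 2000
hits = sum(fake_study() for _ in range(trials))
print(f"{100 * hits / trials:.0f}% of pure-noise studies found something 'significant'")
```

With 18 independent tests at a 0.05 threshold, the chance of at least one false positive is 1 − 0.95¹⁸ ≈ 60 percent, and the simulation lands right around that. Real measurements like weight and cholesterol are correlated rather than independent, which changes the exact number but not the lesson: buy enough tickets and you’ll probably win.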
I try very hard to get my genetics students to understand the limitations of p values, and the importance of good experimental design. “Statistically significant” is not a magic mantra that means something is true. It’s actually, I think, a good idea to analyze your data thoroughly, looking for promising odd effects, but experiments must be designed to focus on a specific effect. If your experiment was designed to test an effect of A on B, and then later in your analysis you find a suggestion that A has an effect on C, great…but you should treat that as a hint that you should design a study that specifically tests for the effect of A on C, not that you’re done and can publish a paper on C.
Another important thing to consider is sample size. There are ways of estimating the necessary sample size to detect an effect of a certain magnitude — it’s called statistical power analysis. Or, you can just ignore those criteria and go with the sample you have. This study had an n of 15, and they made no effort to select matched subjects. That’s a simple, brilliant way to maximize the impact of random variation.
But even if we had been careful to avoid p-hacking, our study was doomed by the tiny number of subjects, which amplifies the effects of uncontrolled factors. Just to take one example: A woman’s weight can fluctuate as much as 5 pounds over the course of her menstrual cycle, far greater than the weight difference between our chocolate and low-carb groups. Which is why you need to use a large number of people, and balance age and gender across treatment groups. (We didn’t bother.)
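For concreteness, here’s a rough power calculation (my own sketch, using the standard normal approximation; exact t-based calculations give slightly larger numbers): how many subjects per group you need to detect a standardized effect of size d with 80 percent power at alpha = 0.05.

```python
import math
from statistics import NormalDist

def sample_size(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample comparison,
    normal approximation: n = 2 * ((z_alpha/2 + z_power) / d)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_b = NormalDist().inv_cdf(power)          # ~0.84
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

print(sample_size(0.5))  # "medium" effect: ~63 per group
print(sample_size(0.2))  # "small" effect: ~393 per group
```

Even a generously “medium” effect calls for more than 60 people per group, against the chocolate study’s total of 15 spread across three groups. With numbers that small, noise like a 5-pound menstrual-cycle fluctuation swamps any real signal.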
So they accumulated some crappy, worthless, noisy data, fished through it for an effect that had statistical significance, and produced a bad paper that no competent reviewer would have accepted. So they had to find a publisher who wouldn’t bother with that peer review stuff.
It was time to share our scientific breakthrough with the world. We needed to get our study published pronto, but since it was such bad science, we needed to skip peer review altogether. Conveniently, there are lists of fake journal publishers. (This is my list, and here’s another.) Since time was tight, I simultaneously submitted our paper—“Chocolate with high cocoa content as a weight-loss accelerator”—to 20 journals. Then we crossed our fingers and waited.
Our paper was accepted for publication by multiple journals within 24 hours. Needless to say, we faced no peer review at all. The eager suitor we ultimately chose was the International Archives of Medicine. It used to be run by the giant publisher BioMedCentral, but recently changed hands. The new publisher’s CEO, Carlos Vasquez, emailed Johannes to let him know that we had produced an “outstanding manuscript,” and that for just 600 Euros it “could be accepted directly in our premier journal.”
Accepted for publication within 24 hours? I once had a pair of papers slowly trudging through review for a year before they got published. “Journals” that just want money to dispense with the indispensable part of scientific review are a blight.
Papers that get published in those schlock journals tend to get ignored by other scientists, and gradually wither into negligibility. Scientifically, that paper was doomed from the very start. But there’s another option: mass media! They don’t care how good a paper is, just how sensational the result is. So they made a big PR push, sending press releases to media sources that happily swallowed the story whole.
We landed big fish before we even knew they were biting. Bild rushed their story out—“Those who eat chocolate stay slim!”—without contacting me at all. Soon we were in the Daily Star, the Irish Examiner, Cosmopolitan’s German website, the Times of India, both the German and Indian sites of the Huffington Post, and even television news in Texas and an Australian morning talk show.
This is dangerous knowledge to give to students, unfortunately. Yeah, I can wag my finger and tell them that these are things you must not do, but the risk is that what they’ll actually see is that bad science works, at least as a way to get public attention.