# The chocolate diet, or how to lie with science

I have to say that I was positively thrilled by this article on how you can lose weight by eating chocolate. It encapsulates so many things I try to drill into my students — I’ll probably use it in my genetics class as an example of bad statistics, and my writing class as an example of using science writing skills for evil.

Here’s the deal: chocolate doesn’t help you lose weight. But if you muddy the data with a large number of variables that you ignore, and do a little unscrupulous p-hacking, you can get an effect with statistical significance. So these authors set out to produce a bad study in nutritional science, and to see if they could get it publicized.

The first step was to have vague criteria and multiple effects that you’re measuring.

Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one “statistically significant” result were pretty good.

Whenever you hear that phrase, it means that some result has a small p value. The letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data. The conventional cutoff for being “significant” is 0.05, which means that there is just a 5 percent chance that your result is a random fluctuation. The more lottery tickets, the better your chances of getting a false positive.

I try very hard to get my genetics students to understand the limitations of p values, and the importance of good experimental design. “Statistically significant” is not a magic mantra that means something is true. It’s actually, I think, a good idea to analyze your data thoroughly, looking for promising odd effects, but experiments must be designed to focus on a specific effect. If your experiment was designed to test an effect of A on B, and then later in your analysis you find a suggestion that A has an effect on C, great…but you should treat that as a hint that you should design a study that specifically tests for the effect of A on C, not that you’re done and can publish a paper on C.
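The lottery-ticket argument is easy to check with a quick simulation. This is only a sketch under simplifying assumptions: the 18 measurements are treated as independent (the real ones were not), and there is no real effect anywhere, so each test's p value is drawn uniformly between 0 and 1. The function name and defaults are mine, not the study authors'.

```python
import random

def chance_of_any_false_positive(n_measurements=18, alpha=0.05,
                                 n_studies=100_000, seed=0):
    """Estimate how often a study with NO real effects still
    produces at least one 'significant' result (p < alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under the null hypothesis each p value is uniform on [0, 1],
        # so each measurement "wins" with probability alpha.
        if any(rng.random() < alpha for _ in range(n_measurements)):
            hits += 1
    return hits / n_studies

if __name__ == "__main__":
    print(f"{chance_of_any_false_positive():.3f}")
```

Under these assumptions the estimate should land near 0.6: most such studies of pure noise would have something "significant" to report.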

Another important thing to consider is sample size. There are ways of estimating the necessary sample size to detect an effect of a certain magnitude — it’s called statistical power analysis. Or, you can just ignore those criteria and go with the sample you have. This study had an n of 15, and they made no effort to select matched subjects. That’s a simple, brilliant way to maximize the impact of random variation.
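To give a feel for what a power analysis looks like, here is a rough sketch using the standard normal approximation for comparing two group means. It is not what the study authors did (they skipped this step), and the function name and the choice of a standardized effect size (Cohen's d) are my own assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a standardized
    difference in means (Cohen's d) with a two-sided test.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d)^2,
    which slightly underestimates the exact t-test answer.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(d, sample_size_per_group(d))
```

Even a "medium" effect of d = 0.5 calls for roughly 63 people per group under this approximation, several times the 15 subjects the whole study had.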

But even if we had been careful to avoid p-hacking, our study was doomed by the tiny number of subjects, which amplifies the effects of uncontrolled factors. Just to take one example: A woman’s weight can fluctuate as much as 5 pounds over the course of her menstrual cycle, far greater than the weight difference between our chocolate and low-carb groups. Which is why you need to use a large number of people, and balance age and gender across treatment groups. (We didn’t bother.)

So they accumulated some crappy, worthless, noisy data, fished through it for an effect that had statistical significance, and produced a bad paper that no competent reviewer would have accepted. So they had to find a publisher who wouldn’t bother with that peer review stuff.

It was time to share our scientific breakthrough with the world. We needed to get our study published pronto, but since it was such bad science, we needed to skip peer review altogether. Conveniently, there are lists of fake journal publishers. (This is my list, and here’s another.) Since time was tight, I simultaneously submitted our paper—“Chocolate with high cocoa content as a weight-loss accelerator”—to 20 journals. Then we crossed our fingers and waited.

Our paper was accepted for publication by multiple journals within 24 hours. Needless to say, we faced no peer review at all. The eager suitor we ultimately chose was the International Archives of Medicine. It used to be run by the giant publisher BioMedCentral, but recently changed hands. The new publisher’s CEO, Carlos Vasquez, emailed Johannes to let him know that we had produced an “outstanding manuscript,” and that for just 600 Euros it “could be accepted directly in our premier journal.”

Accepted for publication within 24 hours? I once had a pair of papers slowly trudging through review for a year before they got published. “Journals” that just want money to dispense with the indispensable part of scientific review are a blight.

Papers that get published in those schlock journals tend to get ignored by other scientists, and gradually wither into negligibility. That paper was doomed from a scientific perspective from the very start. But there’s another option: mass media! They don’t care how good a paper is, just how sensational the result is. So they made a big PR push, sending press releases to media sources that happily swallowed the story whole.

We landed big fish before we even knew they were biting. Bild rushed their story out—“Those who eat chocolate stay slim!”—without contacting me at all. Soon we were in the Daily Star, the Irish Examiner, Cosmopolitan’s German website, the Times of India, both the German and Indian site of the Huffington Post, and even television news in Texas and an Australian morning talk show.

This is dangerous knowledge to give to students, unfortunately. Yeah, I can wag my finger and tell them that these are things you must not do, but the risk is that what they’ll see is that bad science works, at least as far as getting you public attention.

1. I read that this morning with great amusement.

So they accumulated some crappy, worthless, noisy data, fished through it for an effect that had statistical significance, and produced a bad paper that no competent reviewer would have accepted.

Are you sure? It took the Lancet years to retract the MMR vaccine causes autism crap and it’s still haunting us today.
And let’s not forget the reviewer who demanded that menz write the papers ‘Cause they’re unbiased…

2. carlie says

Another thing this “study” in particular had going for it is that it is something people desperately want to be true. Lose weight by eating chocolate? The test results could have come out completely the opposite and the story still would have spread, because people want to have their cake and eat it too (heh).

3. wcorvi says

I suspect it could have been accepted by a ‘genuine’ refereed journal, too. A writer-friend of mine called Astrophysical Journal a vanity press because it CHARGED authors to publish, rather than paying them.

Papers that take a good part of a year to go through the referee process sometimes have major flaws pointed out within hours of appearing in the journal.

4. says

Point accepted. There is some terrible drivel that appears in Science and Nature, too.

5. says

I looked at the list of journals that accepted it, and one of them is an Elsevier journal.

WCORVI — academic journals never pay authors. And there are perfectly legitimate open access journals that do charge publication fees. Open access publishing is a model with a lot in its favor, but the problem is it does attract the fraudsters.

6. leerudolph says

There is some terrible drivel that appears in Science and Nature, too.

I have always liked the sound of this line from an old flakey fraud (who Google has just identified to me as C.D. Broad, “Tarner Lecturer and as Lecturer in the Moral Sciences at Trinity College”, writing in 1925), but of course I couldn’t use it for the obvious reason:

The scientists in question seem to me to confuse the Author of Nature with the Editor of Nature ; or at any rate to suppose that there can be no productions of the former which would not be accepted for publication by the latter.

But thanks to a previous comment on this post, I have seen the light! The noun phrase “the Author of Nature” does not have any reference in the sense Broad meant; but, in fact, Nature is its own Author, and its own publisher as well. That’s right: the universe is just one great big vanity press. (And a good thing, too.)

7. Rob Grigjanis says

leerudolph @6:

the universe is just one great big vanity press

I thought it was a great big saloon bar, with nice decor but questionable clientele. Could be both, I suppose.

8. laurentweppe says

I try very hard to get my genetics students to understand the limitations of p values, and the importance of good experimental design. “Statistically significant” is not a magic mantra that means something is true.

Funny that: one of the very few things I remember from my stats courses was my teachers obsessively repeating “Nothing is statistically significant unless built from a sufficiently large sample”. They’d talk about how fucking important assessing any and every sample’s size is and how they’d ruthlessly sack any student who’d forget mentioning it during oral exams three times per course on average.

9. anbheal says

@2 Carlie — but frequent masturb…I mean frequent sex really DOES reduce the chances of prostate cancer…..doesn’t it???

10. slithey tove (twas brillig (stevem)) says

Twain (IIRC) said, “3 forms of lying: 1)presenting false as true, 2 presenting true as false, 3) statistics. boom.” I totally paraphrased that ?quote?, but I know I got the “statistics” part of it exact. think I’m lying? statistics. ^_^

This paper, discussed in the OP, proves that it is possible to lie using sloppy interps of the statistics you collected honestly during your “experiment”. This fraud is revealed as such for the purpose of highlighting the importance of statistics (properly analyzed) in EVERY science experiment. And when reading papers it is important to examine the statistical arguments presented there; don’t just read the conclusion and blindly accept it as “proven, case closed”.

switching over to chocolate discussion: I was always suspicious of any claims, such as “weight loss through chocolate…” e.g. Hershey®’s glop has more milk and sugar than choco in its stuff, ALL “milk chocolate” is likewise guilty. That is why I have converted to “Dark Chocolate”, only. Chocolate flavor is enhanced in Dark Chocolate, much better than “Milk Chocolate” (lookin at you M&M’s also).

11. David Marjanović says

Are you sure? It took the Lancet years to retract the MMR vaccine causes autism crap and it’s still haunting us today.

Not necessarily the same thing. Reviewers can usually tell when your sample size is too small. We can’t usually tell if your sample size is made up.

Papers that take a good part of a year to go through the referee process sometimes have major flaws pointed out within hours of appearing in the journal.

There’s no connection there. Some journals – quite prestigious ones in their fields – simply have such a backlog that every manuscript that isn’t rejected takes a year or longer.

Chocolate flavor is enhanced in Dark Chocolate

Not necessarily. Milkless chocolates start at 40 % cocoa; I know delicious milk chocolates with 40 and even 51 % cocoa.

12. Dean Pentcheff says

This is dangerous knowledge to give to students, unfortunately. Yeah, I can wag my finger and tell them that these are things you must not do, but the risk is that what they’ll see is that bad science works, at least as far as getting you public attention.

But on the other hand, you can use it to teach the power of promoting good science. A key takeaway from that story is that the authors put in a pretty minimal amount of work to promote the paper, and it worked. In spades.

Most scientists seem to think that the paper itself will “sell itself”. At best, they may help a university PR person put together a one-page press release with turgid text and no images. It’s no surprise that good science then gets passed over by the press and social media.

It doesn’t take a whole lot of effort to do a press release that tells a story about the good science and simultaneously supplies good photo or video assets that can be used in publications. Then get that press release directly into good journalists’ hands. Hey presto — make the journalists’ work easier and you get picked up! The difference is that it’s actually good science that’s being pushed, and good science journalists see that and pass it on.

13. Dean Pentcheff says

(Sorry, didn’t quote the first paragraph in my last post — that’s PZ, not me!)

14. Fair Witness says

I hope they accept the invitation ( which I am sure they will get) to be on Dr. Oz, and they take the opportunity to explain how it is BAD science.

15. Golgafrinchan Captain says

That’s an awesome article. It’s going into my list of things that I use to demonstrate the pitfalls of science ignorance. I wonder if they tried contacting Dr. Oz. It’s funny (i.e. infuriating) that this “designed to fail” study had a 25% larger sample size than Wakefield’s autism study.

My favourite is still Facts About Dihydrogen Monoxide @ http://www.dhmo.org/facts.html. I send people to that page and add my support by saying “everything on that site is true.” When they are sufficiently horrified by what they read, I tell them the one critical tidbit of information that’s missing from the site and ask them to read it again with their new knowledge. The wikipedia page on dihydrogen monoxide is also a good read.

16. Golgafrinchan Captain says

This was also disappointing but unsurprising: “No one dipped into our buffet of chocolate music videos. Instead, they used vaguely pornographic images of women eating chocolate.” Of course they did.

17. says

It’s not the first time I’ve heard phony baloney stuff like this. About 15-20 years ago, the “whiskey diet” was making the rounds, using the same selective and faulty math to “prove” that it works.

The difference is, the whiskey diet was written as a joke. Nobody took it seriously, not even incompetent phony baloney “journalists”.

18. Pierce R. Butler says

Would they have had any difference in their acceptance rates if they had praised the effects of Sokalate?

19. says

switching over to chocolate discussion: I was always suspicious of any claims, such as “weight loss through chocolate…” e.g. Hershey®’s glop has more milk and sugar than choco in its stuff, ALL “milk chocolate” is likewise guilty. That is why I have converted to “Dark Chocolate”, only. Chocolate flavor is enhanced in Dark Chocolate, much better than “Milk Chocolate” (lookin at you M&M’s also).

To each their own. I prefer milk chocolate over dark chocolate. But I prefer white chocolate to milk chocolate. The less cocoa solids the better, in my opinion (unless I’m really in the mood for it, but 7 out of 10

20. slithey tove (twas brillig (stevem)) says

David M wrote:

Not necessarily. Milkless chocolates start at 40 % cocoa; I know delicious milk chocolates with 40 and even 51 % cocoa.

hmmmm, to clarify my point, the dark chocolates I prefer are 60-70% Cocoa. To each his own…

21. marcus says

Jeff Lewis @ 19 ” …I prefer milk chocolate over dark chocolate. But I prefer white chocolate to milk chocolate. “
Wait… what? Why, i don’t even… ?????
Heresy!!!!

22. Golgafrinchan Captain says

@ marcus #21

It’s spelled ‘Hershey’ ;)

23. Al Dente says

Weeps bitterly into glass of Mountain Dew.

24. marcus says

Golgafrinchan Captain @ 22
Hershey- Heresy… Same thing.
Since I’ve learned about their latest shenanigans*, they are dead to me.

*Pushing chocolate flavored products as opposed to actual chocolate.

Does that make it a Sokal-ette hoax?

26. inquisitiveraven says

Actually, the line is “lies, damn lies, and statistics,” and no one knows where it comes from. It’s most frequently attributed to Mark Twain and Benjamin Disraeli. I can assure you however, that it did not originate with Twain. That’s because I’ve seen the essay where he says that, and a) it’s a quote, and b) Twain attributes it to Disraeli, although he’s honest enough to admit he doesn’t actually know the source and points to Disraeli as the most commonly cited source he’s aware of.

27. Tualha says

Hmm, so you cast a wide net with no particular target in mind, take whatever happens to fall into it, and claim the result is significant. Now where have I heard that before?

28. alkisvonidas says

“Statistically significant” is not a magic mantra that means something is true.

One more issue that is frequently forgotten in studies in biology, chemistry and medicine (well, at least more frequently than in, say, physics) is that statistical correlation is only a method of sniffing out a causal relation. Meaning, alright, suppose there IS a genuine correlation here, you still have no idea of the underlying causal pathway. Until you do, you really haven’t discovered anything.

Another way of putting it is, if you cannot utilize the supposed connection to reliably reproduce the “effect”, you don’t really understand it. Can you actually achieve a weight loss by prescribing a chocolate diet? You know the answer as well as I do.

29. David Marjanović says

Hershey’s has almost managed to produce disgusting chocolate. I didn’t know that was possible.

White chocolate sort of works as a vehicle for vanilla, but otherwise it’s completely useless. o.O

this “designed to fail” study had a 25% larger sample size than Wakefield’s autism study

…Oh.

That throws a really bad light on The Lancet, and on my comment 11.

That’s the one about a higher-order problem, the failure to correct for multiple testing of the same hypothesis.

30. AlexanderZ says

Looking at the bad publishers’ global map I can see that many of them are located in Nigeria, and that Nigeria is the only country with both a high number of bad publishers and not a single “good” publisher. I guess that Nigerian prince from my emails has found a new job. Good on him.

P.S.
I know of a course in one university that (among other things) teaches how NOT to write a paper and its examples are all from Science magazine. Needless to say they have an abundance of examples.

31. howardhershey says

I have also seen a problem with too large a sample size. It increases the likelihood of observer bias and other very subtle biases to increase the likelihood of a “significant” correlation. For example, in student labs where students had to analyze whether or not a particular gene (dpy) meets the expected Mendelian 1:3 ratio of a recessive autosomal gene, the test always shows very significant deviance from that ratio (probably because of observer bias that fails to ‘see’ the trait when it is mild but also possibly because of subtle biases in death). Yet this trait also always passes a simultaneous test to see if there is any interaction between that trait and sex-linked traits that are easier to observe.

32. David Marjanović says

That’s the one about a higher-order problem, the failure to correct for multiple testing of the same hypothesis.

Oops. Now that I’ve finally read the article, that’s actually a big part of it. Relevant quote:

“Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a ‘statistically significant’ result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a ‘significant’ result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one ‘statistically significant’ result were pretty good.”

Okay everybody, it was a prank.

Take a look at the link at the very top of the OP.

33. David Marjanović says

More:

“The more lottery tickets, the better your chances of getting a false positive. So how many tickets do you need to buy?

P(winning) = 1 – (1 – p)^n

With our 18 measurements, we had a 60% chance of getting some ‘significant’ result with p < 0.05. (The measurements weren’t independent, so it could be even higher.) The game was stacked in our favor.”
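That formula is easy to evaluate directly. A minimal sketch, with function names of my own choosing, and assuming independent tests (which, as the quote says, does not strictly hold for this study):

```python
from math import ceil, log

def p_winning(n, p=0.05):
    """Chance of at least one 'significant' result among n independent
    tests when nothing real is going on: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

def tickets_needed(target, p=0.05):
    """Smallest number of tests n for which p_winning(n, p) >= target."""
    return ceil(log(1 - target) / log(1 - p))

if __name__ == "__main__":
    print(f"{p_winning(18):.3f}")  # the study's 18 measurements
    print(tickets_needed(0.5))     # tests needed for better-than-even odds
```

With n = 18 this gives about 0.60, matching the quoted 60%, and 14 measurements are already enough for better-than-even odds of a false positive.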

34. rietpluim says

@David Marjanović #35 – Now I feel stupid… I never cared to check that link, because I assumed it was pointing to one of those newspaper articles that fell for it. Thanks for pointing it out.