We need somewhere to bury sloppy research on fast food, after all. Brian Wansink gets interviewed on Retraction Watch (y’all remember Wansink, the fellow who ground his data exceedingly fine to extract four papers from a null result), and he does himself no favors.
Well, we weren’t testing a registered hypothesis, so there’d be no way for us to try to massage the data to meet it. From what I understand, that’s one definition of p-hacking. Originally, we were testing a hypothesis – we thought the more expensive the pizza, the more you’d eat. And that was a null result.
But we set up this two-month study so that we could look at a whole bunch of totally unanswered empirical questions that we thought would be interesting for people who like to eat in restaurants. For example, if you’re eating a meal, what part influences how much [you] like the meal? The first part, the middle part, or the last part? We had no prior hypothesis to think anything would predominate. We didn’t know anybody who had looked at this in a restaurant, so it was a totally empirical question. We asked people to rate the first, middle, and last piece of pizza – for those who ate 3 or more pieces – and asked them to rate the quality of the entire meal. We plotted out the data to find out which piece was most linked to the rating of the overall meal, and saw ‘Oh, it looks like this happens.’ It was total empiricism. This is why we state the purpose of these papers is ‘to explore the answer to x.’ It’s not like testing Prospect Theory or a cognitive dissonance hypothesis. There’s no theoretical precedent, like the Journal of Pizza Quality Research. Not yet.
That last bit sounds like a threat.
Here’s the thing: we all do what he describes. An experiment failed (yes, it’s happened to me a lot). OK, let’s look at the data we’ve got very carefully and see if there’s anything potentially interesting in it, any ideas that might be extractable. The results are a set of observations, after all, and we should use them to try to figure out what’s going on, and in a perfect world, there’d be a public place to store negative results so they aren’t just buried in a file drawer somewhere. There’s nothing wrong with analyzing your data out the wazoo.
The problem is that he then published it all under the guise of papers testing different hypotheses. Most of us don’t do that at all. We see a hint of something interesting buried in the data for a null result, and we say, “Hmm, let’s do an experiment to test this hypothesis”, or “Maybe I should include this suggestive bit of information in a grant proposal to test this hypothesis.” Churning out low-quality papers just to plump up the CV is why I said this is a systemic problem in science: we reward volume rather than quality. It doesn’t make scientists particularly happy to be drowning in drivel, but Elsevier is probably drooling at the idea of a Journal of Pizza Quality Research, another crap specialized journal that would earn them an unwarranted amount of money and provide another dumping ground for all the drivel being spewed out.
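To make the statistical point concrete, here’s a minimal sketch in Python (using numpy and scipy; the diner counts and the number of “questions” are made up for illustration, and this is emphatically not Wansink’s data or his analysis). Take one dataset where nothing real is going on and probe it with enough unplanned comparisons, and some of them will clear p < 0.05 purely by chance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_diners = 100      # hypothetical sample size
n_questions = 20    # hypothetical number of unplanned, exploratory comparisons

# One "outcome" (slices eaten), drawn from the same distribution for everyone: a true null.
slices = rng.normal(loc=3.0, scale=1.0, size=n_diners)

false_positives = 0
for _ in range(n_questions):
    # Each exploratory "question" splits the diners on some arbitrary post-hoc attribute.
    group = rng.integers(0, 2, size=n_diners).astype(bool)
    p = stats.ttest_ind(slices[group], slices[~group]).pvalue
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_questions} exploratory tests came out 'significant' at p < 0.05")
```

Run it and roughly one test in twenty comes up “significant” even though every split is pure noise, and each of those chance hits could be dressed up as its own little paper.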
Wansink seems to be dimly aware of this situation.
These sorts of studies are either first steps, or sometimes they’re real-world demonstrations of existing lab findings. They aren’t intended to be the first and last word about a social science issue. Social science isn’t definitive like chemistry. Like Jim Morrison said, “People are strange.” In a good way.
Yes. First steps. Maybe you shouldn’t publish first steps. Maybe you should hold off until you’re a little more certain you’re on solid ground.
No one expects social science to be just like chemistry, but this idea that you don’t need robust observations with solid methodology might be one reason there is a replicability crisis. Rather than repeating the experiment and engaging in some healthy self-criticism of your results, you’re haring off to publish the first thing that breaches an arbitrary p-value criterion.
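The arbitrariness of that criterion is easy to demonstrate with another hedged sketch (again with made-up numbers, not anyone’s actual study): simulate experiments where there is no real effect, “publish” the ones that squeak under p < 0.05, and then repeat each of them once with a fresh sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, trials = 50, 0.05, 5000   # made-up sample size, threshold, and number of simulated studies

hits, survived = 0, 0
for _ in range(trials):
    # Original "study": two groups drawn from the same distribution, so any difference is noise.
    a, b = rng.normal(size=n), rng.normal(size=n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        hits += 1
        # A straight repetition with fresh samples, same design and sample size.
        a2, b2 = rng.normal(size=n), rng.normal(size=n)
        if stats.ttest_ind(a2, b2).pvalue < alpha:
            survived += 1

print(f"chance 'hits': {hits}, hits that survived a repetition: {survived}")
```

Only about one in twenty of the chance hits clears the threshold a second time, which is exactly why repeating the experiment before rushing to publish matters.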
There really are significant problems with the data he did publish, too. Take a look at this criticism of one of his papers. The numbers don’t add up. The stats don’t make sense. His tables don’t even seem to be appropriately labeled. You could not replicate the experiment from the report he published. This stuff is incredibly sloppy, and in the interview he addresses these failings only inadequately, in ways that don’t solve the problems with the work.
Again, I’m trying to be generous in interpreting the purpose of this research; interdisciplinary criticism can often completely miss the point of the work (see also how physicists sometimes fail to comprehend biology, and inappropriately apply expectations from one field to another). But I’m also seeing a lack of explanation of the context and relevance of the work. I mean, when he says, “For example, if you’re eating a meal, what part influences how much [you] like the meal? The first part, the middle part, or the last part?”, I’m just wondering why. Why would it matter, what are all the variables here (not just the food, but in the consumer), and what do you learn from the fact that Subject X liked dessert, but not the appetizer?
It sounds like something a restaurateur or a food chain might want to know, or that might appeal to an audience at a daytime talk show, but otherwise, I’m not seeing the goal…or how their methods can possibly sort out the multitude of variables that have to be present in this research.