Why do scientists cheat?

I am dismayed at this emerging story about fraud in science. It stars Jonathan Pruitt, a professor at McMaster University who studies variation in individual behavior and how it affects group behavior. I’d heard of him since he does a lot of work on social spiders.

He built up several productive collaborations, in particular with Kate Laskowski at UC Davis, sharing data with her that she used in several publications. That’s where the story turns dark, because Laskowski later examined the data in more detail and found multiple examples of blocks of data having been duplicated, padding the data set with more replicates than were actually done. He’d passed her a poison pill that tainted all the work they’d done together; her papers are no longer trustworthy, and she has retracted them.

Laskowski is being heroically restrained in her reaction to this betrayal — I’d probably be throwing things and saying lots of not-nice words. Pruitt also seems peculiarly blasé and detached from the problem, conceding that there are serious problems in the data set but not offering any explanation of how it happened. (Again, if some of my data were found to be bogus, I’d either be furious and trying to track down the source of the bad data, or, if I were guilty of the duplications, I’d be trying hard to deflect.)

There’s a lot of discussion and dissection of this issue going on, and most of it seems to be rightly concerned with making sure Pruitt’s coauthors aren’t hit with serious splash damage. At some point, though, there has to be a reckoning, and the source of the contamination tainting so much work will have to be dealt with. So far, everyone seems to be strangely cautious and circumspect.

I will not say Jonathan Pruitt is a victim, but he is part of the tragedy. Will we ever really know what motivated him? I decline to guess. He burst onto the animal behavior scene with his first paper in 2008 and immediately began publishing at such a prolific rate that in another year or two he would have overtaken my own 41-year career in number of publications. This output earned him a lot of academic success, leading to his current position (current as I write, anyway) of Canada 150 chair at McMaster University.

What Jonathan Pruitt produced was so far beyond average that it is hard to believe anyone would feel pushed to that level. But others do feel pressure to produce in academia.

Fine. I’m not involved in any of this concern, so it’s not my place to say how the victims ought to respond. But I would say that the slower the build-up, the bigger the explosion, and so far this is looking to be a truly ugly meltdown at some point in the near future. Keep an eye on Jonathan Pruitt; there will be a supernova at some point soon, and not the good, pretty kind.

I’m mainly dismayed at the failure of scientific ethics. You don’t make up data! Ever! Every year I’m in student labs, explaining to students that “your data is your data” (I literally say that a lot, I’m afraid), and if your experiment didn’t come out the way you expected, or the data are ambiguous about what the one true Answer is, your job isn’t to make the data fit, it’s to rethink your work, track down sources of confusion, repeat the work, analyze the results appropriately, and if it doesn’t support your expected answer, revise your expectations.

That’s easy for me, though. The students don’t have a publication in Nature or a tenure decision riding on their results, so they’re lacking all that unscientific pressure to get the neat, tidy, snazzy answer with beautiful p values.


  1. komarov says

    I can’t decide which would make me more angry had I been given data like that: the fact that all the work and time I’d invested in that data would be wasted, plain and simple, or the feeling you get when you look at the data and begin to see something in there being utterly wrong. Nope, it wasn’t an insight; it was all wrong, bordering on – and based on – lies.

    Oh, hang on, it’d be the second one. Honest mistakes I could accept, perhaps with some muttering, but passing around cooked data? Just no.

  2. jackal says

    I’m a statistical consultant working in medical and public health research. Most statistical tests rest on the assumption that all observations are completely independent of each other. Failing to meet that assumption invalidates those tests. I just had to tell a client that we couldn’t do the planned analysis because glitches in their data collection process resulted in unidentifiable duplicates. It was heartbreaking – they’re no longer going to be able to publish on this study – but they understood that you don’t f*ck with bad data. I can’t imagine purposely ruining a data set by adding duplicates – and then sharing the data and publishing on it! If Pruitt knew about the duplicates, this should end his career.
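    For what it’s worth, the exact-duplicate version of this check is only a few lines. Here’s a minimal sketch in plain Python, on made-up toy numbers (nothing from any real study):

    ```python
    from collections import Counter

    def duplicated_rows(rows):
        """Return any row that appears more than once, with its count."""
        counts = Counter(tuple(r) for r in rows)
        return {row: n for row, n in counts.items() if n > 1}

    # Toy data: the third "observation" is an exact copy of the first.
    observations = [
        (12.1, 0.30),
        (9.7, 0.41),
        (12.1, 0.30),
    ]
    print(duplicated_rows(observations))  # {(12.1, 0.3): 2}
    ```

    Exact copies like this are easy to flag; the painful case described above is duplicates that can no longer be told apart from legitimate repeat measurements.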

  3. anthrosciguy says

    “at such a prolific rate”

    It’s easy to do a lot of work when you don’t actually do the work.

  4. says

    It has been a few years since I was forced to retire, so my memories of amateur statistical analysis are probably a bit bunk (I used SYSTAT, which had these huge manuals I read religiously). But I swear there are multiple tests that allow you to check for problems with the data before you actually do any studies with it – I guess most people have neither the time nor the knowledge to apply them. Of course, what I mostly remember is that having to take the same data and format it differently depending on what kind of tests were needed was the most annoying part, and understanding what the test results meant AND EXPLAINING IT TO OTHERS was the most difficult part.

  5. garnetstar says

    Wow, a lot of scientists have been found out recently!

    ITA with all the sentiments above, but for PZ: last week was my first week of lab class too, and I also went around saying “You go with your data.” They have to report what they got, explain or hypothesize why it’s wrong, then use good data that’s supplied if they need it to write the rest of the lab report, with correct attribution.

    Even their big three-week research project later this semester: they don’t know it, but it will fail. None of the compounds that they’re making and assessing as candidates will work. It’s always a hoot to go around the lab that last day and watch their reactions as one compound after another fails.

    I tell them that they must write up their results in their big paper on this, and give possible reasons for the failure, and recommend ways to make the compounds work.

    I didn’t design it to fail on purpose; they just can’t synthesize the right compounds in the time available. So I make them make lemonade out of it – they need to get used to that now.

  6. gorobei says

    Good for Laskowski.
    My daughter got pressured to “invent” data last week. Her results weren’t going to be ready in time for a science competition (fruit flies breed at their own pace.)
    I’m glad she said “No way, I’ll take a fail in the course if needed.”
    I guess she did the right thing.

  7. garnetstar says

    gorobei @7, yes, she certainly did, and tell her I said congrats to her, as well as to you.

    Also, if she wants to take inorganic chemistry lab, would you suggest my class? Thanks.

  8. jrkrideau says

    Kudos indeed to Kate Laskowski, both for noticing that there was a problem and for her courage in asking for retractions.

    For a relatively junior faculty member to take on such a famous, more senior person is impressive.

    I am pretty sure if I had noticed the faulty data I would have felt betrayed and furious.

    It sounds like this is going to turn into a mess like the Brian Wansink mess, or worse.

    Why do scientists cheat? Fame, fortune, tenure, the desire to keep eating? In some cases, such as the Cyril Burt case, it looks ideological. Probably, in some cases, just laziness.

    Why do bankers steal & defraud?

    BTW, for the psych types here, Burt was Hans Eysenck’s doctoral advisor and a lot of Eysenck’s work is taking a beating.

  9. jrkrideau says

    @ 5 Sad OldGuy
    A fellow SYSTAT user. I used it extensively back in the 1980s and 1990s.

    I swear there are multiple tests that allowed you to look to see if there were problems with the data before you actually did any studies with the data

    Oh, there are, but they are designed or intended to be used to examine “honest” data. You might be looking for instances of non-normality, or suspicious outliers, and so on.

    The problem may be that the data you are checking have been “hand edited” to force them into normality, or the experimenter forgot to mention that the 50 rats in the sample really had been 70 but the data for the 20 that had inconveniently died were thrown away, or that 5 different data sets in papers from 5 different sets of authors were not independent: the first set of authors passed their data on to the second set of authors, who added their data, and so on. BTW, in this last example, I do not think anyone intended to deceive. They just were incompetent researchers.

    People have developed things like Benford’s Law and other tests to investigate the trustworthiness of data.
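    A Benford first-digit check is simple enough to sketch in a few lines of plain Python. This is just an illustration on a toy sequence (powers of 2, which are known to follow Benford’s Law closely), not a recipe for auditing real data:

    ```python
    import math
    from collections import Counter

    def benford_expected(d):
        """Benford's Law: P(leading digit = d) = log10(1 + 1/d)."""
        return math.log10(1 + 1 / d)

    def leading_digit_counts(values):
        """Count leading digits of positive values via scientific notation."""
        return Counter(int(f"{v:e}"[0]) for v in values)

    # Toy data: powers of 2 approximate the Benford distribution.
    data = [2.0 ** n for n in range(1, 200)]
    counts = leading_digit_counts(data)
    for d in range(1, 10):
        obs = counts[d] / len(data)
        print(f"digit {d}: observed {obs:.3f}, expected {benford_expected(d):.3f}")
    ```

    The idea in a fraud audit is the reverse: if genuinely Benford-like data (e.g. financial figures, some count data) show a flat or lumpy first-digit distribution, that’s a flag worth investigating – though, as noted, a careful fraudster can pass this test.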

    In the last few years The Four Horsemen of the Retraction as I like to think of them, Tim van der Zee, Jordan Anaya, Nicholas Brown & James Heathers have been spreading terror among a number of social and behavioral miscreants.

    Meet two of the horsemen

  10. chrislawson says

    as jrkrideau@10 says –

    Statistical tests for data anomalies are easily subverted by a smart fraudster. Pruitt seems to have come undone because his appetite for publishing crazy numbers of papers led him to use quick methods of data creation.

    If we want to catch smart fraudsters, we need to encourage a culture of honesty and skepticism so co-workers don’t get caught up in their schemes, and data auditing which nobody wants to pay for.

  11. John Morales says

    Q: “Why do scientists cheat?”
    A: For their own benefit.

    (Also, #notallscientists)

    (Pretty damn simple, really — and yes, I know it was rhetorical)

  12. John Morales says


    If we want to catch smart fraudsters, we need to encourage a culture of honesty and skepticism so co-workers don’t get caught up in their schemes, and data auditing which nobody wants to pay for.


    Anyway, I have a better proposal that actually addresses the motive itself; better to make circumstances such that cheating accrues no benefit thereby, but rather the contrary. Pragmatism, not idealism.

    (Just don’t ask me to work out how that might be done)

  13. says

    Thanks for the explanation jrkrideau!!! I love learning about statistics; sometimes I wish that I had studied statistics instead of Physics.

  14. Kevin Karplus says

    I think part of the problem stems from bad schooling, where students are graded almost exclusively on “getting the right answer”, rather than on the quality of their writing and their analysis of the data they collect. This is driven by large student-faculty ratios, so faculty take the easy way out on grading. Another part of the problem is that science students are often not taught to debug—to look for discrepancies between their expectations and their data and to look for causes of the differences. I find that in my students the problems are half the time in their mental models (misunderstanding the math) and half the time in incorrect experimental setups. But the students have been conditioned to wave “probably human error” as a get-out-of-work-free card to avoid doing any debugging. I do not accept that general-purpose excuse: unless they can point to the error they made and redo the work without that error, “human error” is not an acceptable reason for bad data.

    Note: I’m working with electronics, not biology, so the models they have to work with are pretty good matches to the real world, if they do things right. We do have some labs where the simple models are not adequate, but nothing as messy as biology.

  15. garnetstar says

    @16, ITA with your thoughts on students, and they are accurate for chemistry as well as electronics.

  16. DanDare says

    @14 John, you can start by rewarding finding the flaws. Rewards can be material, like cash prizes, or social, like a status boost.
    Peer review is intended to be a system that does that as well.
    Perhaps a big-picture view is to look at the cost pressures on the enterprise of science. Doing the work has a cost. Doing the quality assurance adds another cost. Doing the initial development of scientists has a cost.