Observe the dog; it enjoys bacon. We might be able to conclude that a wolf would enjoy bacon, too. But “might” is the key word there – can we really? I’d say it’s highly likely, but that’s based on things I already know about wolves and dogs – namely, that they are omnivores.
Why, then, would I run a complex and abusive experiment to determine whether dogs like bacon? That would be a waste of time, right?
Popular psychology is full of experiments that don’t make much sense, because they don’t teach us anything. If we could be sure that experiments run on a mouse would produce the same (or nearly the same) results were they run on a human, then we’d have discovered something valuable: mice are a reliable proxy for human behavior. That’s absurd, though.
Let’s imagine a scenario: our hypothesis is that chimpanzees are a proxy for human behavior. We put groups of chimpanzees in a mildly stressful situation – they watch a violent movie – and we measure whether they are more likely to demonstrate conflict afterward. Let’s say that our results are mostly positive, so we publish them somewhere. The next week, some popular media website runs an article entitled “chimp study shows that gamers get more violent after playing violent games!” I’d say that, obviously, the website got it wrong (a lot of science reporting is terrible like that), but what was our study really going to show? OK, disclosure, I cheated: I didn’t say whether the chimps were bonobos or Pan troglodytes, or a mix. What if we re-ran the experiment with only bonobos and discovered that grooming behaviors and blowjobs increased, while the Pan troglodytes actually showed more aggression? I’m just making this up, of course. I don’t know. And neither does anyone else.
That’s one of the big problems (to me) with psychology and how it is presented to the public. For the sake of argument, let me loosely define “pop psychology” as “the generally over-interpreted and inaccurate public understanding of psychology theories and research,” as opposed to “psychology as a science,” which is the part that is trying to do research based on evidence and the scientific method. Several times, when I’ve attacked psychology, people have attempted to draw a line between “pop psychology” and “science psychology,” so let’s just assume that line exists and that we can more or less accurately distinguish the one from the other. In other words, I am willing to imagine that there are Real Scientists who clutch their temples and cringe when they see “chimp study shows that gamers get more violent after playing violent games!” but are too busy writing their next research grant to contact the writer at the website and tell them to correct the article.

The problem I am raising is basic epistemology – the branch of philosophy that deals with knowledge, and how we know when we know things. The scientific method is an epistemological tool for establishing knowledge by varying causes (experiment) in order to demonstrate their relationships to effects. Elsewhere I have referred to results in experiments as “generalizing,” which is really a sloppy way to put it – what’s actually going on is that we’re trying to establish cause and effect so we can predict, by induction, what is going to happen. If one wolf likes bacon, that’s interesting, but if every wolf we ever test likes bacon, we can say we have learned something about wolves. When we say a creature “is an omnivore,” that is also a prediction by induction: if you tell me a cat is an “omnivore,” that means I can fairly confidently say it will like bacon.
Where it gets interesting is when the results are not perfect (which is most of the time) – what if our experiments with bacon and wolves allowed us to determine that 65% of wolves like bacon? Then we start subjecting our results to typical epistemological challenges: is bacon-eating a learned behavior, or is it inherent in “omnivore”? Because “omnivore” does not square with “65% of wolves like bacon” – “omnivore” means “can eat anything,” not “will always eat anything” – but why do we even have a word like “omnivore” unless we are trying to predict and understand behaviors? If we don’t try to establish knowledge by induction, we’re left in some sort of Pyrrhonian skeptical hell in which we can only discuss the appearances of things right now – i.e., “I can’t speculate about wolves in general, but that wolf right there seems to like bacon at this particular moment. It may not at any other time. In fact, I can’t speculate as to whether or not the wolf just choked that bacon down because it thought it was doing you a favor.” I’m being silly; the point is that we can’t speculate, we can only observe, which traps us in the here and now. The entire value of the scientific project is mapping cause and effect and generalizing knowledge by induction – if we have to repeat every wolf-and-bacon experiment, every time, we can’t claim to know anything about wolves except that we are serving them a lot of bacon.

When we start talking about p-values, what we are doing is acknowledging that we are dealing with the probability that one of our induction rules is true – we want to say that if our experiment showed that 65% of wolves like bacon, then any given wolf has about that chance of liking it. We could use that rule to build a Monte Carlo simulation of the disappearance rate of a hypothetical bacon supply around a hypothetical pack of wolves.
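To make that concrete, here’s a minimal sketch of the kind of Monte Carlo simulation I mean, in Python. The pack size, the daily appetite, and the size of the bacon supply are all numbers I invented for illustration; the only “result” in it is the hypothetical 65% from our wolf experiments:

```python
import random

BACON_LIKING_RATE = 0.65     # the hypothetical experimental result
PACK_SIZE = 12               # invented: wolves in the pack
BACON_SUPPLY = 200           # invented: strips of bacon on hand
STRIPS_PER_WOLF_PER_DAY = 3  # invented: a bacon-liking wolf's appetite
TRIALS = 10_000

def days_until_gone():
    """Simulate one pack: each wolf independently likes bacon with
    probability 0.65; every bacon-liking wolf eats 3 strips a day."""
    eaters = sum(random.random() < BACON_LIKING_RATE for _ in range(PACK_SIZE))
    if eaters == 0:
        return None  # a pack of bacon-indifferent wolves; the supply never shrinks
    supply, days = BACON_SUPPLY, 0
    while supply > 0:
        supply -= eaters * STRIPS_PER_WOLF_PER_DAY
        days += 1
    return days

runs = [d for d in (days_until_gone() for _ in range(TRIALS)) if d is not None]
print(f"mean days until the bacon disappears: {sum(runs) / len(runs):.1f}")
```

The number it prints doesn’t matter; what matters is that the 65% induction rule is doing all of the predictive work.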
I’ll go a step further and say that what I outlined above is, I believe, a simple version of the scientific method and empiricism. If I’m wrong about that, I’m sure someone will correct me. But if I’m substantially right, then I think my point about popular psychology ought to stand pretty well: if we start extrapolating too far from our results, we are committing scientific malpractice – we are poisoning civilization’s foundations of knowledge. So if the website ran an article saying “Study shows there is no ‘vegan’ wolf,” I’d expect that to induce cringes in the skeptical community, and emails to the editor reading, “no, the study says 65% of wolves in a certain circumstance eat bacon. No less, and certainly no more.”
If you’re still with me, you can now see how a skeptic might immediately have a problem with IQ testing, or damn near any other psychometric test, if it’s used to generalize results across individuals. Imagine if someone gives me a test intended to detect suicidal ideation, and I score 65% on it. What does that mean? First off, it does not mean that “all people named Marcus are 65% suicidal.” It does not even mean “this particular Marcus is 65% suicidal” – or, if it does, so what? It’s not as if I am going to roll 2d20 and say, “oops, ’00,’ I’m outta here.” The test is, however, useful as a diagnostic tool for comparing how this Marcus scored today versus last year. I’d say it is entirely legitimate that a medical practitioner might give me such a test for comparative purposes, so they could learn something about what I report over time. It is entirely legitimate, for example, for a psychiatrist to suggest anti-depressants if I score in the low 20%s for 5 years and suddenly spike up to 98%. We understand that all these things do not guarantee a fixed outcome – that’s why the memeosphere is full of stories about “98-year-old mine-working grandma smokes a pack a day and washes it down with Jack Daniel’s, says she wants to run for president in 2030.” On the other hand, as a diagnostic tool, we need to be able to generalize some results in order to understand the world and make good recommendations: if your car’s motor stops extremely suddenly and oil starts pouring out of the oil pan, you have almost certainly thrown a rod, and you should not expect that motor to run any more.
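That diagnostic, within-subject use is easy to sketch. Here’s a toy version in Python – the scores, the threshold, and the flag_spike helper are all invented for illustration, not any real clinical rule:

```python
# Toy sketch of within-subject (diagnostic) use of a test score:
# compare today's score against this patient's own history, not
# against other people. All numbers are invented.

def flag_spike(history, today, threshold=30):
    """Flag a score that jumps well above the patient's own baseline."""
    baseline = sum(history) / len(history)
    return (today - baseline) > threshold

yearly_scores = [22, 25, 19, 24, 21]  # five years in the low 20%s
print(flag_spike(yearly_scores, 98))  # True: time to talk about anti-depressants
print(flag_spike(yearly_scores, 24))  # False: consistent with this patient's baseline
```

Note that nothing in that sketch compares one person to another; it only compares a person to themselves over time.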
When scientists start doing experiments, it immediately raises an alarm for me if their experiment doesn’t have a clear data-point that they can confirm/disconfirm. I also get suspicious when I see a result that looks sort of obvious being presented as a big discovery. If someone actually did an experiment to see what percentage of wolves like bacon, I’d kind of think they just enjoyed making wolves happy, or were doing the experiment as an excuse to hang out and dance with wolves, or something. I also get suspicious of psychometric tools that are not diagnostic: most specifically IQ tests. [*] That’s a long-form answer to voyager’s question on my earlier posting, “what do you think of MMPI?” – as a diagnostic tool it’s OK, but if it’s being used comparatively across individuals, then I think it’s a party trick. Since it’s all self-reported data, why not just ask, “do you feel less suicidal than you did last time we talked?” Forcing a patient to answer a bunch of questions doesn’t make the test more accurate; it makes it longer.
Here, I am specifically thinking about this sort of thing [psych] –
Two recent studies by Harvard psychologists deliver promising data from 2 tests that may help clinicians predict suicidal behavior. The markers in these new tests involve a patient’s attention to suicide-related stimuli and the measure of association with death or suicide. In the first study, lead investigator Matthew K. Nock, PhD and colleagues adapted the Stroop test and measured the speed at which subjects identified the color of words appearing on a computer screen. It was found that suicidal persons focused more on suicide-related words than neutral words. Suicide Stroop scores predicted 6-month follow-up suicide attempts well over traditionally accepted risk factors such as clinicians’ insight into the likelihood of a patient to attempt suicide, history of suicide attempts, or patient-reporting methods.
I would like to see an effectiveness comparison between those tests and just asking the patient how they feel that day (a sketch of what I mean follows below). Maybe these tests are more effective – and if they are, I’d expect that to be easily shown. There are huge problems with any test where a subject is reporting their own assessment of their inner states. It’s one thing to ask, “do you feel more depressed than you did last week?”; it’s another to ask, “do you feel more depressed than Fred, over there?” That is the epistemological problem a personality inventory is undertaking: it is trying to put everyone in the room on a common scale for depression, or IQ, or whatever. I understand why people would want to try to do that, but it’s an absurdly hard problem, and when I see someone trying, I immediately suspect their motives or I wonder if they have studied the scientific method at all. [**]
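The comparison I’m asking for is conceptually simple, at least. Here’s a sketch of its shape, in Python, on follow-up data I fabricated entirely – a real comparison would use real six-month follow-up records:

```python
# Sketch of the effectiveness comparison I'd want to see: given known
# follow-up outcomes, which predictor called them better -- the fancy
# test, or just asking the patient? Every value below is fabricated.

def hit_rate(predictions, outcomes):
    """Fraction of cases where the predictor matched the outcome."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

outcomes    = [1, 0, 0, 1, 0, 0, 0, 1]  # attempt during follow-up? (fabricated)
fancy_test  = [1, 0, 1, 1, 0, 0, 0, 0]  # what the test predicted (fabricated)
just_asking = [1, 0, 0, 1, 0, 1, 0, 1]  # what "how do you feel?" predicted (fabricated)

print(f"fancy test:  {hit_rate(fancy_test, outcomes):.0%}")
print(f"just asking: {hit_rate(just_asking, outcomes):.0%}")
```

If the fancy tests win that comparison on real data, great – then we’ve learned something. But someone has to run it.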
The nature/nurture problem rears its head in all of this, because when we’re trying to measure something about a person, we have to deal with the question of whether we are measuring some attribute of the person or some attribute of the society they grew up in. Someone might score radically differently on a test depending on whether or not they were raised with access to certain kinds of training. Simply knowing a few test-taking techniques can shift a test-taker’s IQ score, which means that the test is measuring (to some degree) the subject’s education and (to some degree) something innate about the subject. That “to some degree” means, in my opinion, that the entire enterprise ought to be eyed with suspicion – especially when psychometrics are used to make decisions that will affect a subject’s life.
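A toy model shows why that “to some degree” matters. In the sketch below, two groups have identical innate ability and differ only in access to test prep, yet their measured scores differ – every parameter is invented for illustration:

```python
import random

# Toy model of the confound: an observed score mixes an innate
# component with a training component, and the test cannot tell
# them apart. All parameters are invented.

def observed_score(innate, trained, prep_boost=8, noise_sd=3):
    """One subject's score: innate ability + optional test prep + noise."""
    return innate + (prep_boost if trained else 0) + random.gauss(0, noise_sd)

random.seed(1)
# Two groups with identical innate ability, different access to prep:
group_a = [observed_score(100, trained=True) for _ in range(1000)]
group_b = [observed_score(100, trained=False) for _ in range(1000)]
print(f"group A mean: {sum(group_a) / len(group_a):.1f}")  # roughly 108
print(f"group B mean: {sum(group_b) / len(group_b):.1f}")  # roughly 100
```

A test consumer who sees those two means and concludes that group A is innately smarter has measured access to prep and called it intelligence.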
I tried to phrase that paragraph very carefully, and I hope I succeeded. There is a big difference, in my mind, between society using survey techniques to make decisions about people, and a subject willingly working with a medical professional to try to diagnose themselves so they can influence their own outcomes. If a psychiatrist uses a subjective depression test to help a patient assess whether they are depressed, that’s fine – it’s part of the tools of the field – but what happens if that same depression test is used to shunt students into a “special school for depressed kids” that reduces their opportunity in the world? Conversely, it may improve their opportunity – but all of that needs to be under the subject’s control, because society has a terrible history of hiding racism, sexism, and xenophobia behind these things. In fact, that is happening today with IQ tests.
I’ll do Part 2 if this posting survives the criticism and shredding I expect it to get.
[*] “I also get suspicious of psychometric tools that are not diagnostic: most specifically IQ tests.” Usually someone chimes in and says, “but that’s not how they are supposed to be used! They are a good tool for measuring cognitive decline, for example!” – that’s true. But they are also being used to determine which high schools kids are allowed to attend, which has very serious consequences for their lives, and the social tropes embedded in an IQ test are a serious problem in that case. Look, I understand that a screwdriver is not a chisel, but you need to acknowledge that IQ tests are being used wrong all the time, in ways that continue to harm people.
[**] Richard Feynman’s portrayal of psychology as a cargo cult, and his take-down of a certain rat-in-a-maze experiment, have had a great influence on me, although I didn’t read it until about 1991; I had already gotten my psych degree and rejected the field by 1986. [feyn]
Regarding the MMPI: I would say it’d be interesting if my psychiatrist told me, “remember that MMPI you took 15 years ago? Between then and now, you are scoring a lot higher on Psychopathic Deviate,” as I ate his brains with some Chianti and fava beans.