There’s an old joke I heard from one of my psychology professors at Johns Hopkins: “A psychologist is studying a frog. He puts it on a table near a yard-stick and says ‘JUMP!’ and the frog jumps a foot. The psychologist notes this down in his lab book and cuts one of the frog’s legs off, then puts it back in its starting position and says ‘JUMP!’. The frog jumps – not as well – 8 inches. The psychologist notes this down and removes another leg. The frog manages to jump 4 inches. With only one leg remaining, the frog is ordered to JUMP! and it manages to sort of shift its weight, painfully. The psychologist records 1/4 inch. When the psychologist removes the frog’s remaining leg, the frog just sits there, and the psychologist writes in his lab book: frog with legs removed loses its hearing.”
It’s a bad joke, but it touches on a fundamental problem of science: sometimes we can confuse cause and effect. In psychology, there was a movement (normally most associated with Pavlov and B.F. Skinner) called “behaviorism” – its tenet being that we should not speculate about inner states of a creature and we should only measure its behavior. We don’t say “the dog drools when it hears the bell, because it is anticipating food.” We say, “the bell rings and the dog drools; there appears to be a connection.”
As some of you may know, I’m pretty skeptical of a great deal of psychology, especially the vintage stuff. Most of what I learned about in school seemed to be pointless, if not outright cruel, and worse: wasteful. I remember asking my professor “what was the point of the Stanford Prison Experiment?” and he waffled a bit about what Zimbardo might have expected to learn from it, but really it just sounded like Zimbardo was doing “experiments” for the sake of his own curiosity. I scare-quoted “experiments” because I don’t accept that Zimbardo was actually doing an experiment, at all; to me an experiment has to have a pre-determined measurement criterion that will serve to confirm or deny a hypothesis. I don’t think Zimbardo was engaging in scientific research, at all. I’ve always been a bit horrified by the way my psychology textbooks (this was back in the early 80s) even mentioned such pointless studies – at the same time as they protested that psychology is a science. I’d expect a scientist to only mention Zimbardo as an embarrassing mill-stone, not as a mile-stone in the field. So, I thought that Skinner and the behaviorists were a breath of fresh air: you measure what you measure and you don’t speculate – if the measurement shows something, then there’s no need to speculate.
The problem with behaviorism is that psychology (at that time) really wanted to generalize across species. To me, this was upsettingly pseudo-scientific: an experiment performed on rats does not in any way, shape, or form, allow us to generalize human behavior. An experiment performed on college undergraduates does not allow us to generalize human behavior, either, unless we are able to somehow extract the cultural influences that come along with being a college undergraduate. But, if you look back to the annals of popular psychology from the 70s, there was a great deal of – let’s call it “generalization by implication” – you know: if you put rats in over-densely populated situations, they kill and eat each other more often; therefore we should worry about cannibalism in cities – that’s the implication. If we put a rat in a Skinner box, we can conclude that: rats are capable of being taught using operant conditioning. Anything else? Not really. We can hypothesize that dogs might be taught using operant conditioning, but to be certain, we ought to try it and see. There’s a reality there, and we can measure it – but what’s hard is to generalize it across species, or to assume that there’s a certain mechanism being exhibited.
That’s where it gets tricky: can we assume the mechanism? If we say that a set of behaviors exhibited by rats, dogs, and chickens in Skinner boxes comprise “operant conditioning” can we claim that the mechanism is “learning” in the human sense? The rat has “learned” how to get cookies by working the lever, but Skinner boxes are obviously not how humans learn. If I take a college undergraduate and offer them $25 or an electric shock to run my test, are they operantly conditioned? It appears to be a different behavior; for one thing, the criteria we use to measure (drooling or pressing a lever) are different. Therefore, obviously, the behavior is different. The frog may have actually gone deaf when its legs were cut off, but unless we have a reliable way of asking the frog, we’ve got nothing.
So, imagine my surprise when I was listening to an episode of The Infinite Monkey Cage podcast [imc] about numbers. At around 5:00 in, one of the panelists asserts that the earliest words in language are counting words. Then he goes on to mention a study that indicates that babies have very early capability for basic numbery-stuff or county behaviors. I deliberately fudged the language there to make it more waffly because the panelist’s explanation appeared to claim confidently that babies can do basic counting because of an experimental result. Unfortunately, I could immediately see a flaw in the experiment as described so I looked around a bit to see if there was a better explanation of the experiment. There’s no need to dig further into the panelist’s beliefs or evidence-based convictions, though I’ll observe that he seems to be engaging in exactly the kind of “interpreting the results a bit too far” that keeps getting psychology in trouble.
Here’s a better explanation of the experiment: [scimag] Let’s look at it a bit and then I’ll explain the hole in the experiment.
If a 6-month-old can distinguish between 20 dots and 10 dots, she’s more likely to be good at math in preschool. That’s the conclusion of a new study, which finds that part of our proficiency at addition and subtraction may simply be something we’re born with.
Researchers have long wondered where our math skills come from. Are they innate, or should we credit studying and good teachers—or some combination of the two? “Math ability is a very complex concept, and there are a lot of factors that play into it,” says Ariel Starr, a graduate student in psychology and neuroscience at Duke University in Durham, North Carolina.
One of those factors appears to be the approximate number system, or the intuitive capacity to discern between groups of objects of varying magnitudes. We share this talent with numerous other animals, including rats, monkeys, birds, and fish. Some of those animals, for example, can match the number of sounds they hear to the number of objects they see, while others can watch handlers place different numbers of food items into buckets, and then choose the bucket with the most food. For ancient humans, this skill would have been an asset, Starr explains, by helping a group of humans determine if predators outnumbered them, for example.
The experimental layout is this: we take a 6-month-old and show them two screens. On one screen are ten dots. On the other screen is a changing number of dots. By watching where the child’s eyes go – which screen they are looking at – the researchers infer that the child is paying more attention to whichever collection of dots is larger. Therefore: 6-month-olds have an ability to discern “larger number of dots” which implies some basic counting ability, which implies some inherent mathematical ability.
Researchers suspect that this intuitive number sense may play into humanity’s unique ability to use symbols to do math. While both a monkey and a human can look at photos of 20 and 30 dots and then choose a photo of 50 dots to represent that total value, only a human can add the symbolic Arabic numerals for 20 and 30 together to get 50.
See where it’s going off the rails, already? Because humans can be taught to do math that helps, uhmmmm…. No, actually it doesn’t help show that their results are significant. Basically, this is like saying “because dogs can drool, they have an innate ability to use language, since drooling can be a form of communication.”
Note that I am not saying anything about whether I think dogs can communicate (I think they rather obviously can, and if you’ve lived with a dog, you will almost certainly agree) or whether babies can count (I think they quickly come to understand ‘bigger’ and ‘more’ or ‘less’) but if all we can say is that babies are able to discern “bigger/smaller” and “more/less”, then I don’t see how that’s evidence that infants can do counting-like stuff. It’s evidence that infants’ eyes function and that their brains quickly learn certain things about the world around them.
Let me guess how the researchers controlled for “general intelligence”: I bet they used an IQ test. IQ tests have been fairly thoroughly debunked as a measure of “general intelligence”, and psychologists have responded by breaking “general intelligence” into multiple factors (because: more factors means more apparent precision, or something like that) but IQ tests are subject to all sorts of social influences, which means they do not measure general intelligence. At best they measure something about how experience and learning are affected by not-understood aspects inherent in the subject’s brain. IQ tests have been one of the biggest ways in which psychology has swung at the nature/nurture problem, and face-planted; it’d be awfully sad if an experiment was treating IQ tests as a way of measuring something general about a population, when their ability to do that has not been successfully established, yet. IQ tests also show more variability in children than adults, which tells me not “children’s intelligence varies more widely than adults'” but rather “IQ tests are worse at measuring things about children than about adults.” One cannot simply brush this away by saying “IQ tests are the only measure we have, so, whatever…” because the experiment’s results live and die based on science that has been more or less debunked. Even IQ testers have backpedalled from the notion that it measures general IQ, to a notion that there are different types of IQ (yeah, so?) and that it’s mostly useful as a tool for measuring individuals against themselves as a baseline and not comparing between individuals.
Children who performed in the top 50% of the math achievement test had a significantly higher intuitive number sense in infancy than those who performed in the bottom 50%, the authors found. This relationship held true even when the researchers controlled for general intelligence.
I’m not saying that none of the experiment’s results are true. I doubt, however, that they have any basis for being so confident that they are measuring what they think they are measuring.
The mechanism of science is that we theorize a cause/effect relationship and try to narrow it down by constructing experiments that confirm or deny the cause/effect relationship by manipulating it. A theory that has predictive power is one where the theory allows us to predict how the experiment will come out, based on the theory. That means that if someone can throw out an alternate theory that has the same predictive power and matches your experimental results: you’ve got a problem. Now, you can no longer claim that your theory is true because you have two theories that may be true – you need to devise an experiment that lets you more finely divide cause/effect to determine which theory is right (or more usually, eliminate the one that’s wrong).
Based on the above, let me show you how to explode this experiment: let’s hypothesize that animals that forage, predate, or are subject to predation, have a built-in function that processes the input from their eyes and detects changes in a scene. For example, suppose you are looking at a field and you see no deer. You divert your attention to check your txt messages on your cellphone and you look up at the field again and immediately notice that there are deer! Yummy deer! Our predator/prey “scene visual change detection engine” allows us to rapidly detect change from one scene to another and it wakes up other bits of our brain to process the changes. Put differently: change is interesting. There, now, that’s just some bullshit that I made up but I think it’s plausible enough that I could use the same experiment to argue that my “scene visual change detection engine” exists in babies. And dogs. And rats. And octopi, if I could figure out a reliable way to see which bunch of dots an octopus was interested in. I can even argue that having a good “scene visual change detection engine” is a component of some aspect of IQ and we should expect that babies who scored better at detecting changes in scenery (more change means more interesting) might turn out to be better at math, someday, if they have parents who can afford to co-sign their student loans.
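To make the "two theories, same predictions" problem concrete, here is a toy sketch in Python. Everything in it is my own invention for illustration: the function names `predict_more_dots` and `predict_change`, the dot counts, and the assumption that the changing screen always shows more than ten dots. None of it comes from the study. It only shows that, for the layout as I described it, "babies prefer more dots" and "babies notice change" predict exactly the same gaze, and that a different layout would pull the two apart.

```python
# Toy sketch: two made-up hypotheses about where an infant looks.
# Each frame is (dots_on_steady_screen, dots_on_changing_screen).
# The dot counts are invented for illustration; they are not from the study.

def predict_more_dots(frames):
    """Hypothesis A: gaze goes to the screen showing more dots."""
    picks = []
    for steady, changing in frames[1:]:  # skip frame 0 to line up with predict_change
        if changing > steady:
            picks.append("changing")
        elif steady > changing:
            picks.append("steady")
        else:
            picks.append("either")
    return picks


def predict_change(frames):
    """Hypothesis B: gaze goes to the screen that differs from the previous frame."""
    picks = []
    for (prev_steady, prev_changing), (steady, changing) in zip(frames, frames[1:]):
        if changing != prev_changing:
            picks.append("changing")
        elif steady != prev_steady:
            picks.append("steady")
        else:
            picks.append("either")
    return picks


# Ten dots on one screen, a varying number on the other
# (assumed here to always be MORE than ten).
as_described = [(10, 12), (10, 20), (10, 15), (10, 30)]

# A hypothetical follow-up: the varying screen now always shows FEWER than ten dots.
follow_up = [(10, 12), (10, 5), (10, 3), (10, 8)]

for name, frames in [("as described", as_described), ("follow-up", follow_up)]:
    a = predict_more_dots(frames)
    b = predict_change(frames)
    verdict = "cannot tell them apart" if a == b else "predictions diverge"
    print(f"{name}: A={a} B={b} -> {verdict}")
```

In the first layout both hypotheses pick the changing screen on every frame, so watching the baby's eyes tells you nothing about which one is right. Only something like the follow-up layout, where the two predictions split, could begin to eliminate one of them.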
I’m not outright rubbishing this experiment, but it seems to me that we should be a lot more skeptical of some of the results that get promoted into the popular zeitgeist. Normally, when I take a poke at psychology, someone comes along and says “but that’s pop psychology, real psychology does not depend on old broken concepts like IQ” – except it does. That’s the problem, when your science has built itself on foundations of sketchy results. You can do the best work you can, and the results are still sketchy. As soon as someone in an experiment says anything about factoring out general intelligence, they’ve left themselves wide open for a skeptical challenge rejecting their entire premise.
The bad joke about the psychologist and the frog actually cuts to the core of how we practice the method of science. We use theory to design experiments that allow us to vary cause/effect so we can confirm/disprove the theory. The reason the story about the frog is not funny is that the psychologist in the joke has just as much reason to conclude that the frog has gone deaf as to conclude that legs are important to jumping behavior – unless, that is, the experiment also somehow controls for the relevance of legs to jumping, or has some other way of testing the frog’s hearing.
I also suspect Pavlov cooked his results. Saying “the dog drools” is really vague. And, as a person who lived with dogs for many years, I don’t believe that’s a behavior one would actually measure in real dogs. “Device to count drops of saliva” are you fucking kidding me?
Also, the dogs I’ve known (I tend to favor the smarter breeds) (whatever “smart” means) would opt out of the experiment. My dogs, Miles and Jake, had never experienced food-stress and, while they knew what hunger was, they would have ignored the experiment entirely and focused on trying to escape the harness or terrify the experimenter into letting them go. The whole Pavlov experiment seems bogus to me. Other readers with experience with dogs care to support or contradict my opinion?
I will not further belabor the fact that Pavlov’s results are part of the core of psychology’s epistemology: conditioning (Pavlov’s classical version, and Skinner’s operant version after it) is considered to be a model for how learning takes place. If Pavlov’s experiments are bullshit (as I suspect they are) then we must engage skeptically with extensions of the conditioning model of learning; they’re probably bullshit, too. Suddenly we’re in quicksand.
“a tool for measuring individuals against themselves as a baseline and not comparing between individuals” – let’s say that infant mortality in poor children is higher than the rest of the population. Let’s say that some percentage of the babies measured in the study grew up poor and – died. So, now you have social factors skewing the results: babies with more affluent parents might score better simply as a consequence of surviving.
What if an “IQ test” is actually a “paying attention test”? Or a “willing to engage in dull tasks test”? I haven’t even got an IQ anymore because I can’t be arsed to take a test.
We also should be skeptical of any experiment involving babies. Testing them for cognitive tasks is going to be hugely influenced by whether the tyke has been sleeping comfortably, has gas, or is hungry. They’re babies and anyone who spends time around a baby is going to tell you that babies are pretty variable all on their own and they’re not going to be able to tell you why. Did the experimenters normalize the baby set-up to make sure all the babies had clean diapers, a certain amount of sleep, and their mother was in the room? A baby that’s got gas is probably going to behave very differently on the test from one that doesn’t. (See above comment regarding infant variability in IQ tests.) As with Pavlov’s dogs, we cannot ask the subject; perhaps the dog would just say “I’m a mastiff, we drool, stupid.”
“children who performed in the top 50%” – top 50%? That’s making my p-hacking alarms go off. “We measured an almost immeasurably small difference compared to a coin-toss.”
“what was the point of the Stanford Prison Experiment?” – Zimbardo was originally trying to see if he could experimentally determine that there’s something about Germans that makes them more obedient to authority and nobody was stupid enough to fund that. So he managed to secure funding (?where?) and ran the experiment to determine that – what? – people tend to submit to authority. Yes, they do, that’s what authority means.