I thought I would use the recent resurgence of interest in the issue of intelligence and race to highlight some lesser-known and more technical aspects of this contentious debate.
While everyone has some intuitive sense of what intelligence consists of, these intuitions vary widely from person to person because the concept is so amorphous. Is it verbal fluency? Numerical adeptness? Critical thinking? Logical skill? Depending on one’s preferences, one can define intelligence, and test for it, in many different ways. When it comes to quantifying intelligence and trying to measure it (assuming that it can be reduced to a single measure at all, itself a highly problematic thesis), one must realize that any measure is always a proxy for the quantity being sought, and the issue becomes how good a proxy it is.
Any test measures something. The question is what that something is and how we interpret the result. Charles Spearman in the early 1900s asserted that, while individual tests may be context-dependent, if we take a large number of tests of different kinds, then we can statistically extract from them a single number g (which is now referred to as “Spearman’s g”). Moreover, this number will be context-independent and so will provide a meaningful and unitary measure of “general intelligence.” Thus, just as length and time can be measured, people can be ranked along a linear intelligence scale that allows for comparison of intelligence. Similarly, the quality of various I.Q. tests can be evaluated by the amount of “g-loading” they have. That is, those tests that correlate strongly with g are deemed “better” at evaluating a person’s intelligence than those that do not.
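To make the statistical idea concrete, here is a minimal sketch of how a single factor can be extracted from a battery of intercorrelated tests. It uses a first principal component as a stand-in for Spearman’s factor-analytic method, and the test scores are entirely invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 1,000 test-takers, 6 tests. Each score mixes a shared
# factor with test-specific noise, so the tests intercorrelate.
n = 1000
shared = rng.normal(size=n)
tests = np.column_stack(
    [0.7 * shared + 0.7 * rng.normal(size=n) for _ in range(6)]
)

# Standardize each test, then take the first principal component of the
# correlation matrix -- the classic way to extract one general factor.
z = (tests - tests.mean(axis=0)) / tests.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
first = eigvecs[:, -1]  # eigenvector of the largest eigenvalue

# "g-loadings": the correlation of each test with the extracted factor.
factor_scores = z @ first
loadings = np.array(
    [np.corrcoef(z[:, j], factor_scores)[0, 1] for j in range(6)]
)
print(loadings.round(2))
```

Because the simulated tests all share a common ingredient, every one of them loads strongly on the single extracted factor; that is exactly the pattern Spearman observed in real test batteries, and the interpretive dispute is over what it means.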
The problem with such a definition of intelligence lies in the assumption that because you can calculate a single number g for an individual, that number necessarily measures an existing property (this is the problem known as reification, where we assign an objective reality to the result of a measurement). In the 1930s, critics such as L. L. Thurstone argued that there could be many different kinds of cognitive abilities and that the same array of I.Q. test results could equally well be analyzed so that they clustered around many different centers, with each center measuring a different cognitive property. The critics argued that to take some overall average of these different measures (Spearman’s g) is to get a meaningless number. Proposals for the number of distinct facets of intelligence that can be measured run to more than 200 – a huge difference from the single measures we use for length or time.
To draw an analogy to grading, we could calculate a set of grade-point averages (GPAs) for clusters of subjects, each dealing with a distinct aspect of knowledge: the physical sciences, the life sciences, the social sciences, the cognitive sciences, the fine arts, the humanities, athletics, and so forth. Students could score high in one area and low in another, and it could plausibly be argued that this is because the GPA of each cluster measures a different kind of ‘intelligence’. Graduate departments of physics, conservatories of music, medical schools, drama schools, political think tanks, and so forth might find one or more of these GPAs a more meaningful measure of the abilities they look for than the overall GPA. The debate over what constitutes intelligence is similar, and there is no consensus on which approach is better. But Spearman’s g created a conviction that lives on in the minds of many: that I.Q. test scores are valid measures of a real human property.
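The grading analogy above can be made concrete with a few lines of arithmetic. The grades and cluster names here are hypothetical, chosen only to show how per-cluster GPAs preserve a profile that the overall GPA collapses away:

```python
# Hypothetical grades for one student, grouped into subject clusters.
grades = {
    "physical sciences": [3.9, 3.7, 4.0],
    "humanities":        [2.8, 3.0, 2.6],
    "fine arts":         [3.5, 3.8],
}

# Per-cluster GPAs: each could be read as measuring a different "intelligence".
cluster_gpa = {c: sum(g) / len(g) for c, g in grades.items()}

# The overall GPA collapses the clusters into one number, much as
# Spearman's g collapses a battery of tests -- losing the profile.
all_grades = [g for gs in grades.values() for g in gs]
overall_gpa = sum(all_grades) / len(all_grades)

print({c: round(v, 2) for c, v in cluster_gpa.items()})
print(round(overall_gpa, 2))
```

This student is strong in the physical sciences and weaker in the humanities, but the single overall number hides that entirely; a physics department and a drama school would want different numbers from this table.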
But as I argued in a previous post, even if we concede the point that I.Q. test scores are good proxy measures for intelligence, that they measure something intrinsic to an individual that is largely due to genes, and that this something is heritable and immutable, that still does not mean that differences in average values between groups are due to genes. They could be entirely due to the environment.
When I.Q. scores are tabulated for each cohort (say within any large enough group like a nation), the scores are normed to give an average of 100 and a standard deviation of 15. This has the advantage that within any cohort, your score immediately tells you your ranking within that cohort. Since the average score for the population is fixed at 100 each time, what you lose in this process is any longitudinal information: how average scores have changed over time. Psychometrician James Flynn’s work plays an important role in this debate. When he looked at the I.Q. scores of nations over time, he found quite dramatic gains in average I.Q., an increase of 18 points over the 54-year period from 1948 to 2002. This is a gain of 0.33 points per year, a very rapid increase indeed. If I.Q. scores are largely genetically based, such a rapid increase cannot happen. Note that this increase in scores is larger than the 15-point gap between black and white test-takers that Charles Murray trumpeted as signifying black genetic inferiority. (Incidentally, the black-white gap is now 10 points, not the 15 that held at the time of publication of The Bell Curve and that he still uses.)
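Both numerical points in this paragraph are easy to check directly. The raw scores below are invented; the norming step maps any cohort’s distribution onto a mean of 100 and a standard deviation of 15, and the Flynn rate is just the figures from the text divided out:

```python
import statistics

# Invented raw test scores for one cohort. Norming rescales them so the
# cohort mean is 100 and the standard deviation is 15 -- after which a
# score only tells you your rank within that cohort.
raw = [12, 15, 18, 20, 21, 23, 26, 29, 31, 35]
mu = statistics.mean(raw)
sigma = statistics.pstdev(raw)
normed = [100 + 15 * (x - mu) / sigma for x in raw]

print(round(statistics.mean(normed)))    # 100 by construction
print(round(statistics.pstdev(normed)))  # 15 by construction

# Flynn's longitudinal figure from the text: 18 points over 1948-2002.
rate = 18 / (2002 - 1948)
print(round(rate, 2))  # 0.33 points per year
```

Because the rescaling happens afresh for every cohort, a population-wide drift of 0.33 points per year is invisible in the normed scores themselves, which is why Flynn had to recover it another way, as described next.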
How did Flynn figure this out if scores are always normed for each cohort to have an average of 100? What Flynn did was to go back and take I.Q. tests from various times in the past and give them to people now. What he found was that the older the test, the higher people now scored on it. In other words, people now seemed to have much higher I.Q.s than people just 50 years ago, well beyond the reach of any genetic explanation.
What might be the causes of this increase in averages and the narrowing of the gap? Many things. Education is becoming more widespread, in that more people are going to school for longer periods, and the curriculum has also become more advanced. Furthermore, life itself has become more complex, even on a technological level, requiring people to develop skills on a daily basis that their parents and grandparents did not need in order to get by. If one looks at popular culture, TV shows and films now feature multiple interweaving plotlines that viewers must follow and that demand inferential skills, a far cry from the straightforward narratives of the past. All the skills required to navigate the world are reflected in the I.Q. tests. In other words, I.Q. tests measure what people can do, not what they are, and what they can do depends on what they are taught and what they need to do to live their lives, i.e., their environment.