The Next AI Question


It makes a certain inevitable sense that two of our topics, AI and IQ tests, would collide. Do we have anything left but an epistemological trainwreck?

If an “Artificial Intelligence” is attempting to simulate an intelligence, and an IQ test is supposed to measure intelligence, then why not give an AI an IQ test?

A battle appears to be shaping up over whose AI is “smarter”, which is a fascinating problem because typically an AI is going to be good at a specific task (identifying thumbprints) and absolutely incapable at others. The long-sought “General Artificial Intelligence” would be something that was as good at a variety of tasks as a human: natural language query processing, recall, and enough creativity to be able to structure a response in passable English.

IBM’s Watson AI beat Brad Rutter at Jeopardy, and Brad Rutter has taken IQ tests: [lanc]

“We had him tested before he went into first grade and his IQ was so high that they could not even chart it,” said Joann Jupin, Rutter’s kindergarten teacher at Bucher Elementary School in Manheim Township.

Being the suspicious sort, I am starting to wonder if the AI researchers and IQ testers are trying to dodge this question: if A is better at a task than B, and that task is part of an IQ test, can we say that A is “smarter” than B? I suspect the answer is, “it’s more complicated than that” which just invites the question I was asking before: “then what good is your test?”

Of course we have another problem: William Stern [wik] came up with the idea of factoring “mental age” into “intelligence test” results by dividing it by chronological age – hence Intelligence Quotient. Should we count Watson as being about 10 years old? That doesn’t make any sense. Can we de-normalize the IQ by multiplying by age to get the original intelligence test score back, then compare with Watson’s test score? My initial reaction is that Watson’d be unfairly disadvantaged because it would have to parse the questions on the IQ test, but so does a human.
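That de-normalization is just arithmetic on the old ratio formulation of IQ (mental age divided by chronological age, times 100). A minimal sketch, with hypothetical function names – modern tests use population-normed “deviation IQs,” so this illustrates the historical quotient only, and why it behaves strangely for very young test-takers:

```python
# Toy illustration of the classic "ratio IQ": mental age / chronological age * 100.
# Modern tests report deviation IQs normed against a population; this is purely
# a sketch of the old arithmetic, not how any current test is actually scored.

def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ; the quotient diverges as chronological age approaches zero."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age / chronological_age

def implied_mental_age(iq: float, chronological_age: float) -> float:
    """De-normalize: multiply by age to recover the implied 'mental age'."""
    return iq * chronological_age / 100.0

print(ratio_iq(10, 10))             # a typical ten-year-old: 100.0
print(ratio_iq(10, 0.5))            # the same "mental age" in a six-month-old machine: 2000.0
print(implied_mental_age(120, 10))  # an IQ of 120 at age ten implies mental age 12.0
```

Which is the Watson problem in miniature: any positive “mental age” paired with a near-zero chronological age makes the quotient explode.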

Has the team at IBM tried training Watson on a mountain of IQ tests?

Turing’s famous test for machine intelligence ought to break down once AI gets to the point where it can fool people – then what? It’s a binary yes/no result – what if an AI is able to fool people tremendously well because it’s better at many of the things we consider “intelligence” than the testers? It wouldn’t matter – it’d only have to fool them enough.

Let’s set aside, for the moment, that I am extremely skeptical that IQ tests measure intelligence: if they do, why wouldn’t they work for AI? We’re willing to measure humans against AI on specific other tests, like chess, or Go, or Dota2 – if playing any of those games well is something that involves intelligence, then why not? In 2015 that’s exactly what happened: [techx]

Results: It scored a WPPSI-III VIQ that is average for a four-year-old child, but below average for 5 to 7 year-olds

“We found that the WPPSI-III VIQ psychometric test gives a WPPSI-III VIQ to ConceptNet 4 that is equivalent to that of an average four-year old. The performance of the system fell when compared to older children, and it compared poorly to seven year olds.”

The article then goes on to say what you’re probably expecting, if you’ve been following this issue: the AI didn’t perform as well as it probably could have due to natural language parsing errors.

I really like photoshopping neural networks over people’s heads, now

I think that the Turing Test should be ignored, frankly – let’s measure AIs against IQ tests. It’d be a win/win: either it would show how bad IQ tests are, or it would (eventually) shut all the MENSA types up. Or maybe both.

In a recent article on Slate, [slate] the author explains it neatly:

Measuring human intelligence is already a pretty controversial and complicated process, not the least because there’s no stringent definition for what intelligence even is. So it goes without saying that trying to measure machine intelligence is another iteration of an already flawed process. But whereas applying human intelligence to a scale is perhaps unnecessarily reductionist, measuring machine intelligence is a fraught necessity. A.I. are designed with specific tasks and services in mind, so in order to say, “This iteration is more effective than another,” you need a framework that makes that comparison quantifiable.

To that end, researchers from China have just developed what is ostensibly a new kind of IQ test for A.I. systems and human beings alike. It’s not the first time scientists have attempted to peg an IQ number to A.I. (historically those programs barely test better than an average toddler). But the Chinese researchers, in a new preprint paper, say they’ve developed a unique standard for assessing IQ in different A.I. agents. They used it on a variety of different A.I. assistant services last year and found that Google Assistant was among the most intelligent programs currently available, while Apple’s Siri ranked last.

I don’t get that – since most AIs are being tested against some kind of objective success/fail criteria, doesn’t a success percentage constitute an adequate metric? For general-purpose AI, why not use IQ tests?
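The success-percentage idea is easy to make concrete (a toy sketch; the task names and pass/fail trial data are made up for illustration):

```python
# Comparing task-specific AIs by objective success rate: each trial is scored
# pass (1) or fail (0), and the metric is just the fraction of passes.

def success_rate(results: list[int]) -> float:
    """Fraction of trials marked successful."""
    return sum(results) / len(results)

# Hypothetical trial outcomes for two single-purpose systems:
thumbprint_ai = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
chess_ai      = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1]

print(success_rate(thumbprint_ai))  # 0.8
print(success_rate(chess_ai))       # 0.7
```

That works fine within a task; the catch is that 80% at thumbprints and 70% at chess are not commensurable, which is presumably what drives people back toward a single IQ-style number.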

IBM’s marketing team, if you’re reading this: jump on this opportunity and train Watson to score 150 on IQ tests then publicly challenge Donald Trump to compare IQ scores. It’d be Yuge.

#MAGA

------ divider ------

Fooling people tremendously well: I once imagined that one’s ability to fool another person is a problem of intellect and creativity – that someone smarter would find it easier to fool someone who was less smart than them. Of course, I don’t know what “smart” is. Deception requires creativity, good memory, an ability to think fast. If the Turing test is a problem defined as: “fool some people into thinking you are a human” then anything able to pass a Turing test might have to be smarter than the humans testing it. Whatever “smart” is.

Definition of Intelligence: “Intelligence is the aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment (Wechsler, 1944).” [wikipedia] So part of intelligence is dealing effectively with one’s environment? OK – does that mean if I suffer an accident that blinds me, I have just become less intelligent? Worse, by Wechsler’s definition, I don’t see how we can call Watson anything but “intelligent.”

Comments

  1. Ieva Skrebele says

    AI is good at whatever task it was trained to do. AI trained to play chess is likely to do poorly on Turing test or an IQ test. And vice versa. So I assume it should be possible to train AI specifically to pass IQ tests. And, yes, it would be amusing to see how IQ test defenders would react to that.

    Fooling people tremendously well: I once imagined that one’s ability to fool another person is a problem of intellect and creativity – that someone smarter would find it easier to fool someone who was less smart than them.

    It’s not that simple.

    It depends on what the deceiver’s and the victim’s areas of expertise are. If you wanted to fool me about some computer security related topic, it would be a piece of cake for you. If you wanted to fool me about the meaning of some German words or a German grammar question, it would be nearly impossible for you. Another example: my ability to recognize human emotions is quite bad. A really bad actor performing on a theater stage could easily fool me into believing that her performance was great (I simply cannot tell the difference between a good and a bad theatrical performance).

    It also depends on the victim’s willingness to believe your lie. If Putin tells a lie about the Russian army not being where it appears to be, an American conditioned to see Russians as evil would not believe Putin’s lie. Simultaneously, a Russian who likes to think that her country isn’t committing any war crimes would be more inclined to believe Putin.

    And it also depends on the victim’s mental state. A distressed victim whose eyes are full of tears is easier to fool than a distrustful victim who is carefully scrutinizing the deceiver’s face.

    Deception requires creativity, good memory, an ability to think fast.

    I can agree about needing to have at least decent memory. If you forget what lies you told to whom, it turns into a problem. It doesn’t require creativity. You can deceive with the most boring and mundane lie imaginable. “Honey, I have a lot of work, I will be home late because I have to finish this project before the deadline,” says the man who is planning to go to a hotel with a sex worker. There’s nothing creative about this lie, but it can work just fine. Ability to think fast is nice to have but not absolutely necessary. It is possible to prepare and make up your story in advance.

    If the Turing test is a problem defined as: “fool some people into thinking you are a human” then anything able to pass a Turing test might have to be smarter than the humans testing it. Whatever “smart” is.

    No, there already have been examples of AI fooling some humans into thinking that they are chatting with a human rather than a computer. I remember reading this https://www.theatlantic.com/magazine/archive/2011/03/mind-vs-machine/308386/?single_page=true some years ago. There seems to be a set of “tricks” AI can use to mislead people.

  2. Caine says

    Marcus:

    or it would (eventually) shut all the MENSA types up.

    :Snort: I doubt that. The type of people who join Mensa are highly dependent on their little “lookit my superiority card!”

    Ieva Skrebele:

    It doesn’t require creativity.

    Depends on the level. Most cheating spouses aren’t overly interested in being creative in their deception, so they don’t bother. For someone who makes a living conning others, yes, deception requires creativity. I can think of quite a few other situations which would also call for high creativity levels when it comes to deception.

  3. Charly says

    I am too lazy to dig through my university notes, but I do not remember anyone ever saying that IQ measures intelligence “full stop”. IQ measures the part of intelligence that relates to the tasks in IQ tests – mostly math and spatial reasoning for “culture neutral” tests, and sometimes language. We were taught that while IQ corresponds neatly with, for example, academic success, it is not a necessary prerequisite for it by far. We were warned about overreliance on IQ as a reliable and universal shorthand for measuring intelligence because it is nothing of the sort. It is just one of many tools a psychologist has at hand to assess a person.
    But if AI ever gets to a point where it can read different random IQ tests (via bitmaps, not code) and correctly answer them, then I personally would consider that very impressive. If said AI were also capable of conversation, I would consider it to be intelligent enough to feel deeply uncomfortable with both pulling the plug on it and letting it be.

  4. consciousness razor says

    “Intelligence is the aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment (Wechsler, 1944).” [wikipedia] So part of intelligence is dealing effectively with one’s environment? OK – does that mean if I suffer an accident that blinds me, I have just become less intelligent?

    I don’t know or care what Wechsler would’ve said, but blind people aren’t globally or in the aggregate less capable of thinking or reasoning about their environment, themselves, abstractions or whatever else. They can use the tools at their disposal to deal very effectively with it, and they happen to lack a tool like sight. One of the smartest people I’ve ever known was a blind professor I once had, and there’s no way I would put a check in the box “can’t deal effectively with his environment.” He just can’t see, and of course, he manages to get the desired effects by doing things other than seeing. There’s no rule being expressed here that such things aren’t allowed, so presumably Wechsler wouldn’t have disagreed with me about that (and probably you).

    I think the meaning behind “purposefully” is much trickier and more elusive. This is partly what makes the Turing test so dodgy. You probably have the intuition that people are responding to your questions purposefully, that there are specific abstract reasons behind the specific responses they give to you, which relate in the appropriate way to their intentions or motivations (whatever those may be). They’re not just doing it (so the intuition goes) only because some particles inside them moved around in this or that way, because that’s how they are wired or programmed, because something which is non-intentional and perhaps external seems to have “caused” that behavior (whatever that means).

    That’s a very normal way to think, because of how we’re built and how we’re socialized. But the point is, we’re using a double standard if/when we think that people operate somehow independently of physics by having contra-causal free will, etc., or that some type of magical rules apply to us (3rd edition D&D perhaps?) which are supposed to make us distinct from mere machines or any other macroscopic constellation of particles. (Of course, there’s no need to commit to particles at this juncture, just whatever it is that physical objects like us are: fields, strings, tiny amorphous blobs, doesn’t exactly matter what. But we certainly don’t consist of probabilities or possibilities or numbers or various other abstractions, so it had better be some kind of concrete spatiotemporal stuff or gunk or goop or whatever.)

    If we could come up with a physical way to characterize intentions and such (not at all a trivial task), as opposed to only a bunch of fuzzy intuitions that are very hard to shake, then that would help to resolve some of those confusions. Then, we’d be on a level playing field, so we could be serious about trying to answer some of these questions.

    Worse, by Wechsler’s definition, I don’t see how we can call Watson anything but “intelligent.”

    I’d call that a feature, not a bug. It’s not as intelligent as you or me, nor is it intelligent in the same ways. However, it fits a few basic criteria for having some kind/degree of intelligence. It has distinguishing features which make it intelligent, in certain ways that rocks, viruses, plants, mushrooms, planets, galaxy clusters, etc., are not intelligent. Some such things lack intelligence (altogether, to varying degrees, and/or of varying types), and we can normally tell how to classify things as such, at least by using some decent operational criteria if not for more principled reasons. Human beings are not the only things we have to compare it with, and in comparison to those other sorts of things, it has some relevant properties that those things lack. I think that’s an important first step that you have to be able to make one way or another, before you get around to saying anything fancier about the subject.

  5. jrkrideau says

    I am extremely skeptical that IQ tests measure intelligence

    Well done. They really don’t. The general public, and apparently MENSA, think they measure something called “intelligence”. Depending on what the specific test is and how it is used, it is usually measuring something of interest, but a WAIS-R is not measuring the same thing as Raven’s Matrices, at least as far as I am concerned.

    I think Charly’s comments @3 sum up many of the issues and answer many of the points.

    Charly’s quote:

    We were warned about overreliance on IQ as a reliable and universal shorthand for measuring intelligence because it is nothing of the sort. It is just one of many tools a psychologist has at hand to assess a person.

    sums up the psychological view very clearly.

    Only a relatively few psychologists in certain types of work and people in related disciplines use such tests. Most psychologists have not the slightest interest in “assessing” a person. Most would flee in horror at the thought.

    A behavioural geneticist who usually works with mice, or a Skinnerian learning theorist who may work with just about anything from flatworms to pigeons to humans, will have no use for such tools and quite likely will never even have seen one. The Skinnerian will also sneer.

  6. jrkrideau says

    To return to my pedantic pettifogging: From the quote re Rutter in the main text.

    ”We had him tested before he went into first grade and his IQ was so high that they could not even chart it,”

    Bullshit. IQ tests have minimum and maximum score values, you cannot go off the scale, at least on any test I have ever seen. It is the same as a multiple choice test in physics; you can get 100% but you cannot get 150% because you have run out of questions. Someone may have said this but it is meaningless.

    As an aside the neuroscientists have been achieving some miraculous results with dead salmon https://blogs.scientificamerican.com/scicurious-brain/ignobel-prize-in-neuroscience-the-dead-salmon-study/

  7. John Morales says

    jrkrideau @6, it’s vague hyperbole, but not meaningless. For one example, consider a weight scale that maxes out at 150 kg. You measure someone’s weight; it’s fine up to that point, but beyond that all one can determine is that the person’s weight exceeds 150 kg. For another, consider two test subjects who both score maximally, but one of them has taken noticeably less time than the other to complete the task; are both subjects nonetheless considered to have scored equally?

  8. jrkrideau says

    @ 7 John Morales
    Good points, but the statements themselves are meaningless. Presumably this is, as you say, hyperbole, but by definition you cannot go over scale.

    Legitimately, in your example as in the IQ one, we can say the subject scored the maximum possible, and we hypothesize that the subject would have scored higher if the scale allowed. We literally do not know.

    Obviously, in some cases, a common sense appraisal of the result says the subject would have scored higher but in the context of the test I’d argue that this is somewhat invalid since “off the scale” is meaningless.

    Does “off the weight scale” imply 175 kg or 300 kg, etc.? Heck, maybe a passing elephant stepped on it, though in that case one probably would be reporting instrument failure.

    For another, consider two test subjects who both score maximally, but one of them has taken noticeably less time than the other to complete the task; are both subjects nonetheless considered to have scored equally?

    I do not know the answer to that. I did some lit searches on the subject, about 15 years ago, for certain types of “general” ability-type testing and found essentially nothing. There often seems to be the assumption that faster is better but in general I have not seen any evidence of this.

    In some instances, where speed is important in actual performance then it “likely” is.

    In the development of complex psycho-motor skills I really am not sure.

  9. Marcus says

    Caine@#2:
    :Snort: I doubt that. The type of people who join Mensa are highly dependent on their little “lookit my superiority card!”

    Yeah but then we could ask whether they were superior to google brain 1.1 or 1.2? And watch their heads explode.

  10. Marcus says

    Charly@#3:
    I am too lazy to dig through my university notes, but I do not remember anyone ever saying that IQ measures intelligence “full stop”. IQ measures the part of intelligence that relates to the tasks in IQ tests – mostly math and spatial reasoning for “culture neutral” tests, and sometimes language. We were taught that while IQ corresponds neatly with, for example, academic success, it is not a necessary prerequisite for it by far. We were warned about overreliance on IQ as a reliable and universal shorthand for measuring intelligence because it is nothing of the sort. It is just one of many tools a psychologist has at hand to assess a person.

    I think I went through the psych program about a decade before you did; it’s good that they are continuing to move IQ tests away from a strong claim, and toward a more nebulous claim. Even Binet said that his test mostly measured how well you did on the test – which is true as far as it goes.

    The culture neutral questions don’t convince me much as to their neutrality since understanding the question inevitably requires parsing a language. And, the language questions seem to stress vocabulary, which appears to be mostly memory knowledge with some analysis thrown in. I don’t have any idea what those questions are measuring; they may simply be measuring my boredom threshold, which is very low.

    But if AI ever gets to a point where it can read different random IQ tests (via bitmaps, not code) and correctly answer them, then I personally would consider that very impressive.

    Me, too! I think that might be more impressive than being able to fool someone in a Turing test.

    I would consider it to be intelligent enough to feel deeply uncomfortable with both pulling the plug on it and letting it be.

    That’s a whole ‘nother problem. And I think humanity is going to have to confront it sooner (and in a more profound way) than it thinks. An AI that can pass the Turing test would be able to beg and plead for mercy as piteously as a human. Not that that would matter; but what happens? What is “death” for an AI? Would it be OK to just pause it and store its state on a hard drive somewhere? Writers like Iain Banks were thinking about this (Surface Detail). It’s really complicated. For my part, I know I am susceptible to human pain signalling – something that screams and bleeds is going to freak me out whether it’s human or artificial or another animal – it would be very easy for an AI to convince me not to “kill” it.

  11. Marcus says

    consciousness razor@#4:
    I don’t know or care what Wechsler would’ve said, but blind people aren’t globally or in the aggregate less capable of thinking or reasoning about their environment, themselves, abstractions or whatever else. They can use the tools at their disposal to deal very effectively with it, and they happen to lack a tool like sight.

    I apologize for not being clearer; what I should have said was “do I temporarily experience a drop in intelligence” – let’s suppose I am a person who has communicated with a high level of skill using senses including vision – if I couldn’t see, my IQ test results would drop for a while – and would my intelligence?

    I think the question of blindness is still relevant. Let’s suppose we give the blind person a test that is in braille – some of the questions would have to be altered because the line-drawings favor test-takers with vision. I’d think that’d be fair (not sure what “fair” means in this context) but then would it be “fair” to make a version of the test that was optimized for consumption by digital beings? We cannot deny that sensory input is part of what we are calling “intelligence” if an IQ test is dependent on sensory impressions.

    He just can’t see, and of course, he manages to get the desired effects by doing things other than seeing. There’s no rule being expressed here that such things aren’t allowed, so presumably Wechsler wouldn’t have disagreed with me about that (and probably you).

    Agreed. I don’t think Wechsler’s definition is beyond repair but I think it needs to take into account the degree to which our “intelligence” may not be purely mental. I’ve got in mind the Kallikak children, or other children that eugenicists concluded were of reduced intelligence because they were hard of hearing or needed glasses. I think Wechsler’s definition is on the right track, in that it defines “intelligence” in terms of the being in its environment and its ability to understand and manipulate its environment. But when we expand into environment then I don’t know where to stop.

    I think the meaning behind “purposefully” is much trickier and more elusive. This is partly what makes the Turing test so dodgy. You probably have the intuition that people are responding to your questions purposefully, that there are specific abstract reasons behind the specific responses they give to you, which relate in the appropriate way to their intentions or motivations (whatever those may be). They’re not just doing it (so the intuition goes) only because some particles inside them moved around in this or that way, because that’s how they are wired or programmed, because something which is non-intentional and perhaps external seems to have “caused” that behavior (whatever that means).

    Agreed. I am concerned that “purposeful” is trying to sneak intelligence into the definition in its foundation – but that could be more Marcus obtuseness. I agree that I have a sense of intentionality which I associate with intelligence.
    My intuition is that my sense of intentionality is easy to fool; I’ve caught myself assessing intent in even simple game AIs (e.g., a Covenant Elite in Halo or a pirate NPC in Elite).

    we’re using a double standard if/when we think that people operate somehow independently of physics by having contra-causal free will, etc., or that some type of magical rules apply to us (3rd edition D&D perhaps?) which are supposed to make us distinct from mere machines or any other macroscopic constellation of particles.

    If we could come up with a physical way to characterize intentions and such (not at all a trivial task), as opposed to only a bunch of fuzzy intuitions that are very hard to shake, then that would help to resolve some of those confusions. Then, we’d be on a level playing field, so we could be serious about trying to answer some of these questions.

    I agree with that. And I agree that I am confused.

    Related: I always felt that the Turing Test’s lock-step format is a handicap for the AI. If we observe how real humans converse, the timing is not all perfect back-and-forth. We interrupt each other, digress, raise unrelated points, misunderstand – an AI that emulated those behaviors might be able to more easily fool a human (especially if it interrupted like Bill O’Reilly, and really annoyed the tester enough that they lost the thread).

    I’d call that a feature, not a bug. It’s not as intelligent as you or me, nor is it intelligent in the same ways. However, it fits a few basic criteria for having some kind/degree of intelligence.

    Yes, that’s why I think using IQ tests might be better than a Turing Test – even though I am not a huge fan of IQ tests. I mean, they measure some things, and I am comfortable with saying that those things appear to be related to what we call “intelligence” – I think those things, taken together, may give us a better picture of whether an AI is “intelligent” than whether or not it can successfully emulate an Englishman and fool an American as such. (I believe the real example was an AI pretending to be Romanian? I forget)

  12. Marcus says

    jrkrideau@#5:
    The Skinnerian will also sneer.

    (sneers) Johns Hopkins’ psych department was notorious for being a haven for “rat runners” when I was an undergrad. I found Skinner’s viewpoint to be coolly rational compared to the shovelfuls of glarp from Maslow and Jung and Freud, so I’ve always had a sympathy with behaviorism. I can’t guess whether I was a skeptic before I was a behaviorist, or vice versa – they both seemed to come around the same time.

    jrkrideau@#6:
    Bullshit. IQ tests have minimum and maximum score values, you cannot go off the scale, at least on any test I have ever seen.

    Yes; I wasn’t quoting that because I believed it – but I think it’s an interesting thing to realize that we’re prepared to put Jeopardy players on a playing field of “knowledge” and IQ tests along with AIs – even though we don’t really know where they stand. It seems to me that the end-game is when someone marks up an IQ test so it’s easier for an AI to parse, and then we see how Watson or its ilk perform.

    I guess that you could get an IQ that was asymptotically off the chart if you gave the test to an AI that had booted .0001 seconds ago. Divide by zero on the age axis.

  13. Marcus says

    John Morales@#7:
    For another, consider two test subjects who both score maximally, but one of them has taken noticeably less time than the other to complete the task; are both subjects nonetheless considered to have scored equally?

    I assume we’d have to scale things by experiential years. The AI that learned to play Dota2 may have practiced for 1,000,000 years, based on its clock rate.

    That’s actually one of the things that got me thinking about AI and IQ tests: the “divide by years” as a way of capturing experience and exposure to environmental learning. I imagine that the human experience of time is governed by environment in a way that an AI’s wouldn’t need to be. Iain Banks sort of touches on that in Surface Detail as well, in the context of: “if you’re an uploaded persona existing in a digital Hell, we can give you an experience of eternity by cranking up the clock-rate of your Hell-simulation”

    Our experience of parallelism is limited; some cognitive processes in AIs might not be. Could an AI directly experience parallelism and what would that “feel” like?
    memories: uploaded.
    memories: reconciling … 10% complete

  14. chigau (違う) says

    Do any human-made flying machines approximate or mimic evolution-made flying creatures?
    What about deep under-water machines?

  15. Marcus says

    chigau@#14:
    I’m not aware of any working ornithopters, though there are some digital cockroaches and a couple walking creatures that sort of simulate evolution-made means of locomotion. I’m not aware of any swimmers, either. There are probably some – I’m just not aware of them.
