A Computer Scientist Reads EvoPsych, Part 4

[Part 3]

The programs comprising the human mind were designed by natural selection to solve the adaptive problems regularly faced by our hunter-gatherer ancestors—problems such as finding a mate, cooperating with others, hunting, gathering, protecting children, navigating, avoiding predators, avoiding exploitation, and so on. Knowing this allows evolutionary psychologists to approach the study of the mind like an engineer. You start by carefully specifying an adaptive information processing problem; then you do a task analysis of that problem. A task analysis consists of identifying what properties a program would have to have to solve that problem well. This approach allows you to generate hypotheses about the structure of the programs that comprise the mind, which can then be tested.[1]

Let’s try this approach. My task will be to calculate the inverse square root of a number, a common one in computer graphics. The “inverse” part implies I’ll have to do a division at some point, and the “square root” implies either raising something to a power, finding the logarithm of the input, or invoking some sort of function that’ll return the square root. So I should expect a program which contains an inverse square root function to have something like:

float InverseSquareRoot( float x ) 
{

     return 1.0 / sqrt(x);

}

So you could imagine my shock if I peered into a program and found this instead:

float FastInvSqrt( float x )
{
    long i;
    float x2, y;
    
    x2 = x * 0.5;

    i = * ( long * ) &x;
    i = 0x5f3759df - ( i >> 1 );
    y = * ( float * ) &i;

    y = y * ( 1.5 - ( x2 * y * y ) );

    return y;
}

Something like that snippet was in Quake III’s software renderer. It uses one step of Newton’s Method to find the zero of an equation derived from the input value, seeded by a guess that takes advantage of the structure of floating point numbers. It also breaks every one of the predictions my analysis made, not even including a division.

The task analysis failed for a simple reason: nearly every problem has more than one approach to it. If we’re not aware of every alternative, our analysis can’t take all of them into account and we’ll probably be led astray. We’d expect convolutions to be slow for large kernels unless we were aware of the Fourier transform, we’d think it was impossible to keep concurrent operations from mucking up memory unless we knew we had hardware-level atomic operations, and if we thought of sorting purely in terms of comparing one value to another we’d miss out on the fastest sorting algorithm out there, Radix sort.

Radix sort doesn’t get implemented very often because it either requires a tonne of memory, or the overhead of doing a census makes it useless on small lists. To put that more generally, the context of execution matters more than the requirements of the task during implementation. The simplistic approach of Tooby and Cosmides does not take that into account.

We can throw them a lifeline, mind you. I formed a hypothesis about computing inverse square roots, refuted it, and now I’m wiser for it. Isn’t that still a net win for the process? Notice a key difference, though: we only became wiser because we could look at the source code. If FastInvSqrt() was instead a black box, the only way I could refute my analysis would be to propose the exact way the algorithm worked and then demonstrated it consistently predicted the outputs much better. If I didn’t know the techniques used in FastInvSqrt() were possible, I’d never be able to refute it.

On the contrary, I might falsely conclude I was right. After all, the outputs of my analysis and FastInvSqrt() are very similar, so I could easily wave away the differences as due to a buggy square root function or a flaw in the division routine. This is especially dangerous with evolutionary algorithms, as Dr. Adrian Thompson figured out in an earlier installment, because the odds of us knowing every possible trick are slim.

In sum, this analysis method is primed to generate smug over-confidence in your theories.

Each organ in the body evolved to serve a function: The intestines digest, the heart pumps blood, and the liver detoxifies poisons. The brain’s evolved function is to extract information from the environment and use that information to generate behavior and regulate physiology. Hence, the brain is not just like a computer. It is a computer—that is, a physical system that was designed to process information. Its programs were designed not by an engineer, but by natural selection, a causal process that retains and discards design features based on how well they solved adaptive problems in past environments.[1]

And is my appendix’s function to randomly attempt to kill me? The only people I’ve seen push this biological teleology are creationists who propose an intelligent designer. Few people well studied in biology would buy this line.

But getting back to my field, notice the odd dichotomy at play here: our brains are super-sophisticated computational devices, but not sophisticated enough to re-program themselves on-the-fly. Yet even the most primitive computers we’ve developed can modify the code they’re running, as they’re running it. Why isn’t that an option? Why can’t we be as much of a blank slate as forty-year old computer chips?

It’s tempting to declare that we’re more primitive than they are, computationally, but there’s a fundamental problem here: algorithms are algorithms are algorithms. If you can compute, you’re a Turing machine of some sort. There is no such thing as a “primitive” computer, at best you could argue some computers have more limitations imposed on them than others.

Human beings can compute, as anyone who’s taken a math course can attest. Ergo, we must be something like a Turing machine. Is it possible that our computation is split up into programs, which themselves change only slowly? Sure, but that’s an extra limitation imposed on our computability. It should not be assumed a-priori.

[Part 5]


[1] Tooby, John, and Leda Cosmides. “Conceptual Foundations of Evolutionary Psychology.The Handbook of Evolutionary Psychology (2005): 5-67.

A Computer Scientist Reads EvoPsych, Part 3

[Part 2]

As a result of selection acting on information-behavior relationships, the human brain is predicted to be densely packed with programs that cause intricate relationships between information and behavior, including functionally specialized learning systems, domain-specialized rules of inference, default preferences that are adjusted by experience, complex decision rules, concepts that organize our experiences and databases of knowledge, and vast databases of acquired information stored in specialized memory systems—remembered episodes from our lives, encyclopedias of plant life and animal behavior, banks of information about other people’s proclivities and preferences, and so on. All of these programs and the databases they create can be called on in different combinations to elicit a dazzling variety of behavioral responses.[1]

“Program?” “Database?” What exactly do those mean? That might seem like a strange question to hear from a computer scientist, but my training makes me acutely aware of how flexible those terms can be. [Read more…]

What is False?

John Oliver weighed in on the replication crisis, and I think he did a great job. I’d have liked a bit more on university press departments, who can write misleading press releases that journalists jump on, but he did have to simplify things for a lay audience.

It got me thinking about what “false” means, though. “True” is usually defined as “in line with reality,” so “false” should mean “not in line with reality,” the precise compliment.

But don’t think about it in terms of a single thing, but in multiple data points applied to a specific theory. Suppose we analyze that data, and find that all but a few datapoints are predicted by the hypothesis we’re testing. Does this mean the hypothesis is false, since it isn’t in line with reality in all cases, or true, because it’s more in line with reality than not? Falsification argues that it is false, and exploits that to come up with this epistemology:

  1. Gather data.
  2. Is that data predicted by the hypothesis? If so, repeat step 1.
  3. If not, replace this hypothesis with another that predicts all the data we’ve seen so far, and repeat step 1.

That’s what I had in mind when I said that frequentism works on streams of hypotheses, hopping from one “best” hypothesis to the next. The addition of time changes the original definitions slightly, so that “true” really means “in line with reality in all instances” while “false” means “in at least one instance, it is not in line with reality.”

Notice the asymmetry, though. A hypothesis has to reach a pretty high bar to be considered “true,” and “false” hypotheses range from “in line with reality, with one exception” to “never in line with reality.” Some of those “false” hypotheses are actually quite valuable to us, as John Oliver’s segment demonstrates. He never explains what “statistical significance” means, for instance, but later on uses “significance” in the “effect size” sense. This will mislead most of the audience away from the reality of the situation, and in the absolute it makes his segment “false.” Nonetheless, that segment was a net positive at getting people to understand and care for the replication crisis, so labeling it “false” is a disservice.

We need something fuzzier than the strict binary of falsification. What if we didn’t compliment “true” in the set-theory sense, but in the definitional sense? Let “true” remain “in line with reality in all instances,” but change “false” from “in at least one instance, it is not in reality” to “never in line with reality.” This creates a gap, though: that hypothesis from earlier is neither “true” nor “false,” as it isn’t true in all cases nor false in all. It must be in a third category, as part of some sort of paraconsistent logic.

This is where the Bayesian interpretation of statistics comes from, it deliberately disclaims an absolute “true” or “false” label for descriptions of the world, instead holding them up as two ends of a continuum. Every hypothesis in the third category inbetween, hoping that future data will reveal that its closer to one end of the continuum or the other.

I think it’s a neat way to view the Bayesian/Frequentism debate, as a mere disagreement over what “false” means.

A Computer Scientist Reads EvoPsych, Part 2

[Part 1]

the concept of “learning” within the Standard Social Science Model itself tacitly invokes unbounded rationality, in that learning is the tendency of the general-purpose, equipotential mind to grow—by an unspecified and undiscovered computational means—whatever functional information-processing abilities it needs to serve its purposes, given time and experience in the task environment.

Evolutionary psychologists depart from fitness teleologists, traditional economists (but not neuroeconomists), and blank-slate learning theorists by arguing that neither human engineers nor evolution can build a computational device that exhibits these forms of unbounded rationality, because such architectures are impossible, even in principle (for arguments, see Cosmides & Tooby, 1987; Symons 1989, 1992; Tooby & Cosmides, 1990a, 1992).[1]

Yeah, these people don’t know much about computer science.

You can divide the field of “artificial” intelligence into two basic approaches. The top-down approach outlined modular code routines like “recognize faces,” then broke those down into sub-tasks like “look for eyes” and “find mouths.” By starting at a high level and dividing these things down into neat, tidy sub-programs, we can chain them together and create a greater whole.

It’s never worked all that well, at least for real-life problems. Take Cyc, the best example I can think of. It takes basic facts about the world, like “water is wet” or “rain is water,” and uses a simple set of rules to query these facts (“is rain wet?”). What it can’t do is make guesses (“are clouds wet?”), nor discover new facts on its own, nor handle anything but simple text. Thirty years and millions of dollars haven’t made a dent in those problems.

Meanwhile, the graphics card manufacturer NVidia is betting the farm on something called “deep learning,” one of several “bottom-up” approaches. You present the algorithm with an image (or sound file or object, the number of dimensions can be easily changed), and it maps it to a grid of cells. You toss a slightly smaller grid of cells on top of it, and for each new cell you calculate a weighted sum of the nearby values in the previous grid, weights that are random to start off with. Repeat this several times, and you’ll wind up with a single cell at the end. Assign this cell to an output, say “person,” then rewind all the way back to the start. Wash, rinse, and repeat until you get another single cell, then at least enough single cells to handle every possible solution. All of these single cells have a value associated with them, so that “person” cell might give the image 0.7 “person”s. Having cataloged what’s in the image already, you know there’s actually 1.0 “person” there, and so you propagate that information back down the chain. Prior cell weights which were pro-person are increased, while the anti-person ones are decreased. Do this right to the bottom, and for every input cell, then repeat the process for a new image.

It’s loosely patterned after how our own neurons are laid out. Biology is a bit more liberal with how it connects, but this structure has the virtue of being easy to calculate and massively parallel, quite convenient for a company which manufactures processors that specialize in massively parallel computations. NVidia’s farm-betting comes from the fact that it’s wildly successful; all of the best image recognition algorithms follow the deep-learning pattern, and their success rates are not only impressive but also resemble our own.[2]

Heard of the AI that could play Atari games? Emphasis mine:

Our [Deep action-value Network or DQN] method outperforms the best existing reinforcement learning methods on 43 games without incorporating any of the additional prior knowledge about Atari 2600 games used by other approaches … . Furthermore, our DQN agent performed at a level that was comparable to that of a professional human games tester across the set of 49 games, achieving more than 75% of the human score on more than half of the games […]

Indeed, in certain games DQN is able to discover a relatively long-term strategy (for example, Breakout: the agent learns the optimal strategy, which is to first dig a tunnel around the side of the wall allowing the ball to be sent around the back to destroy a large number of blocks; …). […]

In this work, we demonstrate that a single architecture can successfully learn control policies in a range different environments with only very minimal prior knowledge, receiving only the pixels and the game score as inputs, and using the same algorithm, network architecture and hyperparameters each game, privy only to the inputs a human player would have.[3]

This deep learning network has no idea what a video game is, nor is it permitted to peek at the innards of the game itself, yet can not only learn to play these games at the same level as human beings, it can develop non-trivial solutions to them. You can’t get more “blank slate” than that.

This basic pattern has repeated multiple times over the decades. Neural nets aren’t as zippy as the new kid on the “bottom-up” block, yet they too have had great success where the modular top-down approach has failed miserably. I haven’t worked with either technology, but I’ve worked with something that’s related: genetic algorithms. Represent your solutions in a sort of genome, come up with a fitness metric for them, then mutate or randomly construct those genomes and keep the fittest ones in the pool until you’ve tried every possibility, or you get bored. Two separate runs might converge to the same solution, or they might not. A lot depends on the “fitness landscape” they occupy, which you can visualize as a 3D terrain map with height representing how “fit” something is.

A visualization of three "evolutionary fitness landscapes," ranging from simple to complex to SUPER complex.That landscape has probably got more than three dimensions, but those aren’t as easy to visualize and they behave very similarily to the 3D case. The terrain might be a Mount Fiji with a single solution at the top of a fitness peak, or a Himalayas with many peak solutions scattered about but a single tallest standing above them, or a foothills where solutions are aplenty but the best solution is tough to find.

All of these take the “bottom-up” approach, the opposite of the “top-down” one, and work up from very small components towards a high-level goal. The path to there is rarely known in advance, so the system “feels” its way there via evolutionary algorithms.

That path may not go the way you expect, however. Take the case of a researcher, Dr. Adrian Thompson, who used an evolutionary algorithm to find the smallest computer processor that could sense the difference between two tones.

Finally, after just over 4,000 generations, the test system settled upon the best program. When Dr. Thompson played the 1kHz tone, the microchip unfailingly reacted by decreasing its power output to zero volts. When he played the 10kHz tone, the output jumped up to five volts. He pushed the chip even farther by requiring it to react to vocal “stop” and “go” commands, a task it met with a few hundred more generations of evolution. As predicted, the principle of natural selection could successfully produce specialized circuits using a fraction of the resources a human would have required. And no one had the foggiest notion how it worked.

Dr. Thompson peered inside his perfect offspring to gain insight into its methods, but what he found inside was baffling. The plucky chip was utilizing only thirty-seven of its one hundred logic gates, and most of them were arranged in a curious collection of feedback loops. Five individual logic cells were functionally disconnected from the rest— with no pathways that would allow them to influence the output— yet when the researcher disabled any one of them the chip lost its ability to discriminate the tones. Furthermore, the final program did not work reliably when it was loaded onto other FPGAs of the same type.

It seems that evolution had not merely selected the best code for the task, it had also advocated those programs which took advantage of the electromagnetic quirks of that specific microchip environment. The five separate logic cells were clearly crucial to the chip’s operation, but they were interacting with the main circuitry through some unorthodox method— most likely via the subtle magnetic fields that are created when electrons flow through circuitry, an effect known as magnetic flux. There was also evidence that the circuit was not relying solely on the transistors’ absolute ON and OFF positions like a typical chip; it was capitalizing upon analogue shades of gray along with the digital black and white.[4]

Evolutionary approaches are very simple and require no understanding or insight into the problem you’re solving, but they usually requires ridiculous amounts of computation or training merely to keep pace with the top-down “modular” approach. The fitness function may lead to a solution much too complicated for you to understand or much too fragile to operate anywhere but where it was generated. But the bottom-up approach may be your only choice for certain problems.

The moral of the story: the ability to do complex calculation can be built up from a blank slate, in principle and practice. When we follow the bottom-up approach we tend to get results that more closely mirror biology than when we work from the top-down and modularize, though this is less insightful than it first appears. Nearly all bottom-up approaches take direct inspiration from biology, whereas top-down approaches owe more to Plato then Aristotle.

Biology prefers the blank slate.

[Part 3]


[1] Tooby, John, and Leda Cosmides. “Conceptual Foundations of Evolutionary Psychology.The Handbook of Evolutionary Psychology (2005): 5-67.

[2] Kheradpisheh, Saeed Reza, et al. “Deep Networks Resemble Human Feed-forward Vision in Invariant Object Recognition.” arXiv preprint arXiv:1508.03929 (2015).

[3] Mnih, Volodymyr, et al. “Human-level control through deep reinforcement learning.” Nature 518.7540 (2015): 529-533.

[4] Bellows, Alan. “On the Origin of Circuits • Damn Interesting.” Accessed May 4, 2016.

A Computer Scientist Reads EvoPsych, Part 1

Computer Science is weird. Most of the papers published in my field look like this:

We describe the architecture of a novel system for precomputing sparse directional occlusion caches. These caches are used for accelerating a fast cinematic lighting pipeline that works in the spherical harmonics domain. The system was used as a primary lighting technology in the movie Avatar, and is able to efficiently handle massive scenes of unprecedented complexity through the use of a flexible, stream-based geometry processing architecture, a novel out-of-core algorithm for creating efficient ray tracing acceleration structures, and a novel out-of-core GPU ray tracing algorithm for the computation of directional occlusion and spherical integrals at arbitrary points.[1]

A speed improvement of two orders of magnitude is pretty sweet, but this paper really isn’t about computers per se; it’s about applying existing concepts in computer graphics in a novel combination to solve a practical problem. Most papers are all about the application of computers, and not computing itself. You can dig up examples of the latter, like if you try searching for concurrency theory,[2] but even then you’ll run across a lot of applied work, like articles on sorting algorithms designed for graphics cards.[3]

In sum, computer scientists spend most of their time working in other people’s fields, solving other people’s problems. So you can imagine my joy when I stumbled on people in other fields invoking computer science.

Because the evolved function of a psychological mechanism is computational—to regulate behavior and the body adaptively in response to informational inputs—such a model consists of a description of the functional circuit logic or information processing architecture of a mechanism (Cosmides & Tooby, 1987; Tooby & Cosmides, 1992). Eventually, these models should include the neural, developmental, and genetic bases of these mechanisms, and encompass the designs of other species as well.[4]

Hot diggity! How well does that non-computer-scientist understand the field, though? Let’s put my degree to work.

The second building block of evolutionary psychology was the rise of the computational sciences and the recognition of the true character of mental phenomena. Boole (1848) and Frege (1879) formalized logic in such a way that it became possible to see how logical operations could be carried out mechanically, automatically, and hence through purely physical causation, without the need for an animate interpretive intelligence to carry out the steps. This raised the irresistible theoretical possibility that not only logic but other mental phenomena such as goals and learning also consisted of formal relationships embodied nonvitalistically in physical processes (Weiner, 1948). With the rise of information theory, the development of the first computers, and advances in neuroscience, it became widely understood that mental events consisted of transformations of structured informational relationships embodied as aspects of organized physical systems in the brain. This spreading appreciation constituted the cognitive revolution. The mental world was no longer a mysterious, indefinable realm, but locatable in the physical world in terms of precisely describable, highly organized causal relations.[4]

Yes! I’m right with you here. One of the more remarkable findings of computer science is that every computation or algorithm can be executed on a Turing machine. That includes all of Quantum Field Theory, even those Quant-y bits. While QFT isn’t a complete theory, we’re extremely confident in the subset we need to simulate neural activity. Those simulations have long since been run and matched against real-world data, the current problem is scaling up from faking a million neurons at a time to faking 100 billion, about as many as you have locked in your skull.

Our brains can be perfectly simulated by a computational device, and our brain’s ability to do math proves they are computational. I can quibble a bit on the wording (“precisely describable” suggests we’ve faked those 100 billion), but we’re off to a great start here.

After all, if the human mind consists primarily of a general capacity to learn, then the particulars of the ancestral hunter-gatherer world and our prehuman history as Miocene apes left no interesting imprint on our design. In contrast, if our minds are collections of mechanisms designed to solve the adaptive problems posed by the ancestral world, then hunter-gatherer studies and primatology become indispensable sources of knowledge about modern human nature.[4]

Wait, what happened to the whole “our brains are computers” thing? Look, here’s a diagram of a Turing machine.

An annotated diagram of a Turing machine. Copyright HJ Hornbeck 2016, CC-BY-SA 4.0.

A read/write head sits somewhere along an infinite ribbon of tape. It reads what’s under the head, writes back a value to that location, and moves either left or right, all based on that value. How does it know what to do? Sometimes that’s hard-wired into the machine, but more commonly it reads instructions right off the tape. These machines don’t ship with much of anything, just the bare minimum necessary to do every possible computation.

This carries over into physical computers as well; when the CPU of your computer boots up, it does the following:

  1. Read the instruction at memory location 4,294,967,280.
  2. Execute it.

Your CPU does have “programs” of a sort, instructions such as ADD (addition) or MULT (multiply), but removing them doesn’t remove its ability to compute. All of those extras can be duplicated by grouping together simpler operations, they’re only there to make programmers’ lives easier.

There’s no programmer for the human brain, though. Despite what The Matrix told you, no-one can fiddle around with your microcode and add new programs. There is no need for helper instructions. So if human brains are like computers, and computers are blank slates for the most part, we have a decent reason to think humans are blank slates too, infinitely flexible and fungible.

[Part 2]


[1] Pantaleoni, Jacopo, et al. “PantaRay: fast ray-traced occlusion caching of massive scenes.” ACM Transactions on Graphics (TOG). Vol. 29. No. 4. ACM, 2010.

[2] Roscoe, Bill. “The theory and practice of concurrency.” (1998).

[3] Ye, Xiaochun, et al. “High performance comparison-based sorting algorithm on many-core GPUs.” Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on. IEEE, 2010.

[4] Tooby, John, and Leda Cosmides. “Conceptual Foundations of Evolutionary Psychology.The Handbook of Evolutionary Psychology (2005): 5-67.