We’ve often heard this claim from creationists: “there is no way for genetics to cause an increase in complexity without a designer!”. A recent example has been Michael Egnor’s obtuse caterwauling about it. We, including myself, usually respond in the same way: of course it can. And then we list examples of observations that support the obviously true conclusion that you can get increases in genetic information over time: we talk about gene duplication, gene families, pseudogenes, etc., all well-documented manifestations of natural processes that increase the genetic content of the organism. It happens, it’s clear and simple, get over it, creationists.
Maybe we’ve been missing the point all along, though. The premise of that question from the creationists is what they consider a self-evident fact: that evolution posits a steady increase in complexity from bacteria to Homo sapiens, the deep-rooted idea of the scala naturae, a ladder of complexity from simple to complex. Their argument is that the ladder cannot be climbed, and our response is usually, “sure it can, watch!” when perhaps a better answer, one that is even more damaging to their ideology, is that there is no ladder to climb.
That’s a tougher answer to explain, though, and what makes it even more difficult is that there is a long scientific tradition of pretending the ladder is there. Larry Moran has an excellent article on this problem (Alex has a different perspective), and I want to expand on it a little more.
Larry points out that an awful lot of effort is being put into coming up with excuses for a single observation: the number of human genes isn’t that much greater than the number of genes in a fly. He has usefully categorized seven different kinds of rationalizations that people use to try to inflate the complexity of our genome, in the hopes of somehow reassuring themselves that they can explain why we are so much more complex than a fly or a nematode or a mouse.
- Alternative Splicing: We may not have many more genes than a fruit fly but our genes can be rearranged in many different ways and this accounts for why we are much more complex. We have only 25,000 genes but through the magic of alternative splicing we can make 100,000 different proteins. That makes us almost ten times more complex than a fruit fly. (Assuming they don’t do alternative splicing.)
- Small RNAs: Scientists have miscalculated the number of genes by focusing only on protein encoding genes. Our genome actually contains tens of thousands of genes for small regulatory RNAs. These small RNA molecules combine in very complex ways to control the expression of the more traditional genes. This extra layer of complexity, not found in simple organisms, is what explains the Deflated Ego Problem.
- Pseudogenes: The human genome contains thousands of apparently inactive genes called pseudogenes. Many of these genes are not extinct genes, as is commonly believed. Instead, they are genes-in-waiting. The complexity of humans is explained by invoking ways of tapping into this reserve to create new genes very quickly.
- Transposons: The human genome is full of transposons but most scientists ignore them and don’t count them in the number of genes. However, transposons are constantly jumping around in the genome and when they land next to a gene they can change it or cause it to be expressed differently. This vast pool of transposons makes our genome much more complicated than that of the simple species. This genome complexity is what’s responsible for making humans more complex.
- Regulatory Sequences: The human genome is huge compared to those of the simple species. All this extra DNA is due to increases in the number of regulatory sequences that control gene expression. We don’t have many more protein-encoding regions but we have a much more complex system of regulating the expression of proteins. Thus, the fact that we are more complex than a fruit fly is not due to more genes but to more complex systems of regulation.
- The Unspecified Anti-Junk Argument: We don’t know exactly how to explain the Deflated Ego Problem but it must have something to do with so-called “junk” DNA. There’s more and more evidence that junk DNA has a function. It’s almost certain that there’s something hidden in the extra-genic DNA that will explain our complexity. We’ll find it eventually.
- Post-translational Modification: Proteins can be extensively modified in various ways after they are synthesized. The modifications, such as phosphorylation, glycosylation, editing, etc., give rise to variants with different functions. In this way, the 25,000 primary protein products can actually be modified to make a set of enzymes with several hundred thousand different functions. That explains why we are so much more complicated than worms even though we have similar numbers of genes.
It’s important to note that these are not bogus phenomena—there is alternative splicing and small RNAs are important and regulatory sequence is immensely important (especially if you ask a developmental biologist), and so forth. He’s not trying to make these explanations go away (although maybe the junk DNA excuse is feeble enough that it should be ignored) — the point is that these mechanisms do not address the question. These phenomena are going on in all of those so-called “simpler” organisms, too—it’s not as if the hominid lineage was blessed with alternative splicing enzymes and miRNAs and beetles were damned to trifling insignificance without them. It’s certainly not as if insects lack the sophisticated mechanisms of gene regulation found in mammals.
Larry’s answer is the same as mine, and it cuts right through the whole mess of tangled explanations: these are non-answers to a non-problem. The problem goes away if you stop assuming that people are more complicated than mice. With some exceptions, you’re asking the wrong questions if you’re talking about complexity.
I like to compare humans with chimpanzees: are we more complex? I don’t think there is any way to say that we are. We have about the same number of genes and a genome that is about the same size and that is organized in a roughly similar way. There is some speculation about the differences in our genomes that lead to the obvious differences in our morphology, but none of it postulates any increase in complexity. There are, for instance, genes like ASPM that are found in both species (and in flies!) and are known to act as regulators of mitotic activity. If the form of ASPM in humans has a few nucleotide changes in it that allow certain regions of the brain to engage in a few more divisions than occur in a chimp, it increases the size of that region, but is that an increase in complexity? If other genes have changes in their regulatory regions (I told you those are important) that switch them on and off at different times in development, allowing for timing shifts that emphasize the growth of certain structures over others, is that an increase in complexity?
I would say no. Those are differences, not increases. With a genome of tens of thousands of genes, there is the potential for a colossal amount of diversity, and what’s going on in metazoan clades is not really an addition of new information, but an exploration of the potential morphogenetic space. Think of 10,000 genes as really representing 10,000 dimensions, with individual species occupying small compact clouds in that immense space, flies off in one corner, us in another, with the chimpanzee cloud somewhere nearby. It’s a misrepresentation of the problem to try and argue that we humans need a significantly greater number of bits to define our position than do flies or chimpanzees.
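To make the bookkeeping in that picture concrete, here is a toy sketch. The numbers are assumptions chosen only for illustration: 10,000 axes as in the paragraph above, and a made-up resolution of 16 distinguishable settings per axis. The function name `bits_to_locate` is hypothetical. The only point is that the cost of naming a position depends on the size of the space, not on which corner of it you happen to occupy.

```python
import math

N_DIMENSIONS = 10_000   # one axis per gene, as in the text
LEVELS_PER_AXIS = 16    # assumed resolution per axis (illustrative only)


def bits_to_locate(n_dimensions: int, levels_per_axis: int) -> float:
    """Bits needed to name one cell in a grid laid over the whole space."""
    return n_dimensions * math.log2(levels_per_axis)


for species in ("fly", "chimpanzee", "human"):
    # The species is not an input to the calculation at all:
    # every point in the same space costs the same to specify.
    print(species, bits_to_locate(N_DIMENSIONS, LEVELS_PER_AXIS))
```

Every species gets the same answer because all of them are points in the same space; what differs between them is where they sit, not how much it takes to say so.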
One thing that has long bugged me is that there’s a fair amount of argument in the scientific literature that presupposes a trend of increasing informational complexity in metazoan lineages — Larry is pointing out some current issues in modern genomics that are afflicted with this assumption, but this is actually an issue that has been simmering in my head for some time now, and I’ve written on it before.
Long before. Before the blog. Back in the ancient days of usenet. On a dusty parchment scroll archived in the great data banks of Google, there is a post from January of 2000 with an argument I will now reconstruct on this vastly more modern medium.
Here’s where I begin. Way back even before usenet, when I was a lowly undergraduate, I read JT Bonner’s On Development, which was one of those dangerous books that lead one down whole new intellectual paths. He has a long discussion on complexity in one of the early chapters, which revealed that this is one of those really hard problems. There are so many different ways to measure complexity that one ends up just throwing up one’s hands and declaring that everything in biology is complex, which is just fine. That chart on the right is one of many ways he tried to look at biological complexity. It plots organism volume and total cell number against a rough estimate of the number of cell types, and concludes that they’re roughly correlated. If you have lots and lots of cells, you have greater potential and greater need for a division of labor, so maybe that’s all that is going on here. It’s a sensible answer, and Bonner’s treatment of this problem is among the best I’ve read since, but even at my tender and relatively unschooled age I could see some problems. A human is significantly smaller than a sequoia, for instance, but I suspected we’d have approximately the same number of cell types as a whale, which would make a mess of the upper end of that graph. And how do you count cell types, anyway? That ordinal axis is awfully fuzzily defined, and there seems to be some selective presentation of the data to just those instances that make a nice line.
Much later, I read Stuart Kauffman’s At Home in the Universe. It’s an excellent and thought-provoking book, but the attempt to measure complexity in number of cell types reared its bleary-eyed head again. Here’s the worst piece of data in the whole book. It’s comparing ‘measurements’ from biology with the results of some interesting computer simulations he was doing. He had models where he’d vary the number of simulated interacting genes, and observe that the simulations would all converge on a much smaller number of basins of attraction, so he’s equating number of model genes with the quantity of DNA in a cell, and the number of attractors to the number of cell types, and showing that both simulation and biology show roughly similar trends.
This is a horrible graph with many problems that completely invalidate its utility. Number of genes and quantity of DNA are not the same thing, especially when you note that “bacteria” (there is only one kind, of course, I cynically suggest) with negligible junk DNA and humans with huge amounts of junk are plotted on that same axis. The data are also selective: where are the ferns and amphibians, which have so much DNA that they would be placed on the far right of the scale? And then there is the mysterious “number of cell types” parameter, the source of which is not explained. We are told that man has 256 cell types and flies have 60, though, which struck me as unlikely and low, especially when they are being equated with gene regulatory states. Mainly, it seemed like a parameter with values selected to put humans at the high end and bacteria at the low.
The text of the book does itself no favors when it announces the accuracy of the simulation by pointing out that it returns approximately the same number of cell types — 317 — as are found in humans — 256, which I find dubious — when it uses the same number of genes as are found in humans — 100,000. Ooops. The book was written in 1995, but still, there wasn’t good reason to think we had that many genes even then. It’s also strange that the graph shows the point for Man corresponding to about 700,000 model genes. When no hard numbers are to be had, the values for your analysis seem to become infinitely malleable.
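The arithmetic behind those figures is easy to check, assuming the square-root scaling Kauffman reports for his networks (attractors growing roughly as the square root of the number of model genes). That rule comes from the book, not from anything stated above, and the function names here are my own.

```python
import math


def attractors_from_genes(n_genes: float) -> float:
    """Assumed square-root rule: attractors ~ sqrt(number of model genes)."""
    return math.sqrt(n_genes)


def genes_from_attractors(n_attractors: float) -> float:
    """Inverse of the same assumed rule."""
    return n_attractors ** 2


print(round(attractors_from_genes(100_000)))   # the gene count quoted in the book
print(round(attractors_from_genes(700_000)))   # roughly where the graph puts Man
print(genes_from_attractors(256))              # genes needed to land on 256 exactly
```

Under that assumed rule, 100,000 genes gives about 316 attractors (the book’s 317 is presumably the same figure rounded up), 700,000 model genes gives about 837, and landing on 256 cell types would take only about 65,000 genes. The three numbers attached to humans do not sit on the same curve.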
Anyway, I finished the book thinking Kauffman had some interesting ideas, but his grounding in biology was shaky to say the least.
Then Glenn Morton led me to a paper by Valentine and others. For those who don’t know of him, Valentine is a brilliant evolutionary biologist, and I will freely admit that he’s much, much smarter than I am, but this paper was baloney. It specifically tries to assemble a picture of increasing complexity of metazoans over evolutionary history using cell type number as a measure of complexity. At the very least, though, it gives a citation for the source of all of those various estimates of cell type number: a paper on taxonomic biochemistry from 1964, by Sneath. I am sad to say that I have not been able to track down a copy of this fairly obscure paper from a very specialized book (I have received synopses from people who have), but there is a hint of the methodology in the Valentine paper: Sneath was plotting “estimates of cell type number against measures of DNA content of the haploid genomes of a range of organisms including metazoan phyla”, and he concluded that they were proportional. Right there, we know that this is both crude and wrong; there is no correlation of complexity and DNA content (unless we’re prepared to admit that frogs and ferns are the most complex organisms of all time). The Valentine paper briefly acknowledges this problem, then ignores it, to produce this chart:
Glenn Morton also extracted estimated numbers from the chart, if you prefer your data tabular.
[Table: # cell types vs. age (mya)]
It’s the march of rising complexity from sponges to man! It’s a beautiful example of garbage in, garbage out.
Almost all of the numbers for cell types are taken from Sneath, and are listed in an appendix at the end. Most of the exceptions are at the low end of the scale, the porifera and cnidaria, where cell type numbers are enumerable, and for the hominidae, which is taken from a molecular biology textbook (and I have my doubts about their quality, too). Sneath’s numbers are flagged with an asterisk, to indicate that they are “estimates not documented by lists of cell types or by references to published histological descriptions”—in other words, they’re unverifiable guesses, and the Sneath paper is the dead end in a search for the source of the estimates. I hate to say it, but these are bad data. Almost every point’s position on the Y-axis is noise — noise colored by an assumption that, for instance, a ray-finned fish must be simpler in organization than a bird. And I don’t trust it a bit.
Another peculiarity is the choice of hominids as the terminal point. I don’t know of any evidence that we have more cell types than a horse or a kangaroo, and since these are all rough estimates anyway, it would have been more appropriate to use a number for “mammals”—which would have pushed that data point a hundred million years to the left. There’s wobble in both the X and Y coordinates of each data point, which makes that line even more worrisome. This paper is an exercise in Markovian modeling of how random variance in cell type number could lead to a progressive pattern in the upper bound of extant cell type number, and really, it’s a kind of attempt at curve fitting to those fuzzy points. They come to the conclusion that about one new cell type emerges every 3 million years, and that extrapolating backward, the metazoa arose about 600 million years ago. They also want to argue that the Cambrian “explosion” did not involve any significant increase in complexity, contra Gould’s then-current thesis that would have predicted a sharper step function, rather than that steadily rising curve. Unfortunately, given the unassessed variability present in the data, I don’t see how they can decide that curve could be any better than just about any other.
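To see how much that wobble matters, here is a back-of-the-envelope sketch. It assumes a plain straight-line reading of the two numbers above (one cell type per 3 million years, origin at 600 million years ago), which is a simplification of the paper’s Markov model. The error of 50 cell types in either direction is a made-up figure for illustration, and the function names are hypothetical.

```python
MY_PER_CELL_TYPE = 3.0   # from the text: about one new cell type per 3 million years
ORIGIN_MYA = 600.0       # from the text: extrapolated origin of the metazoa


def implied_max_cell_types(origin_mya: float, my_per_type: float) -> float:
    """Cell type count at the present day implied by a straight line."""
    return origin_mya / my_per_type


def extrapolated_origin_mya(max_cell_types: float, my_per_type: float) -> float:
    """Run the same straight line backward to zero cell types."""
    return max_cell_types * my_per_type


top = implied_max_cell_types(ORIGIN_MYA, MY_PER_CELL_TYPE)
print("implied top-end cell types:", top)

# Hypothetical miscounts at the top end, and where the origin lands for each
for error in (-50, 0, 50):
    origin = extrapolated_origin_mya(top + error, MY_PER_CELL_TYPE)
    print(f"error {error:+d} cell types -> origin at {origin} mya")
```

On this simplified reading, a miscount of 50 cell types at the top end moves the extrapolated origin of the metazoa by 150 million years, which is why unassessed error in the estimates is not a small detail.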
Other serious problems in the data are that 1) it ignores life history information, and 2) consciously does not consider complexity in the nervous system! The first means that many of the invertebrate phyla, with their more elaborate temporal segregation of morphology, are going to have their cell types more seriously undercounted than are the vertebrates, and most of the groups before the 500 million year mark are going to have their complexity underestimated. The second problem is particularly serious for a paper that is trying to argue that there is no step function in the emergence of complexity—they are throwing out the most complex tissue type in the data. Even if their curve is real, they’re neglecting a large pedestal of complexity all across the board.
That’s a huge mistake. I’m fairly familiar with the insect neurodevelopment literature, so when I saw papers saying arthropods only have 50-60 cell types, alarm bells started ringing. My postdoc was spent staring at the nervous system of early embryonic grasshoppers, and I knew well all the membranes, the hemocytes, the tracheal epithelia, the various kinds of glia, the specialized midline tissues, the cuticle, the forming gut, the peripheral sensilla and supporting cells, muscles, fat, various glands, etc., without even considering the neurons…and with the neurons, oh boy. Here’s a diagram from the work of Chris Doe of a single side of a single developing ganglion, showing just the neuroblasts; not the neurons that will eventually arise from them, but just the progenitor cells that lie on the floor of the CNS. If you look at just the latest stage at the bottom right, there are about 30 different neuroblasts there, each with a unique identity. They all look pretty much the same, and I suspect that someone without much detailed knowledge would just say they’re all one neuroblast cell type, but the insect knows otherwise. The color-coding is used to indicate the combinatorial expression of a large subset of known molecules with differential expression. They may look the same, but at the level of what the genes are doing, they all have their own unique pattern.
I’m also familiar with some embryonic vertebrate nervous systems, and I can say that they tend to have many more cells in them — but they don’t seem to be as precisely identified at the single cell level as the invertebrate CNS. We have large populations of cells with similar patterns of molecular specification, rather than this kind of precise, cell-by-cell programmatic identity.
Now, from a genetic perspective, which pattern is more complex? I don’t know. They’re both complex but in very different ways; it’s basically impossible at this point to even identify a quantifiable metric that would tell us how complex either of these kinds of systems is. How many cell types are present in this whole animal? I don’t know that either. I’d want to look at a whole constellation of markers for their genetic regulatory state, a huge task that I haven’t seen done by anyone yet, and that would have been impossible in 1964, when Sneath made his estimates. I bet it’s many more than 60, though.
I’ll go out on a limb and make a prediction: any difference in the degree of complexity, assuming an objective method of measurement, in the triploblastic metazoa will be much less than an order of magnitude, and the vertebrates will all be roughly equivalent…and if any group within the vertebrates shows a significant increase in genetic complexity above the others, it will be the teleosts. I’ll also predict that any ‘extra’ complexity in members of these groups will not be a significant factor in their fitness, although it might contribute to evolvability.
The idea that complexity is a material and significant element in the genome, one that has a pattern of increase that has reached its pinnacle in humanity, is little more than one of the last vestiges of the mistaken notion of progress in evolution, and one that seems to be supported only by largely imaginary evidence. In particular, the often expressed idea that people, of all creatures, must be especially complex is like hearing someone with no knowledge of pianos explain that their favorite piano sonata is so wonderfully beautiful that it must have been played on an instrument with many more than 88 keys—and that Jerry Lee Lewis and Beethoven couldn’t possibly have been composing on similar instruments.
Sneath PHA (1964) Comparative biochemical genetics in bacterial taxonomy. in Taxonomic biochemistry and serology, Leone CA, ed. Ronald, New York.
Valentine JW, Collins AG, Meyer CP (1994) Morphological Complexity Increase in Metazoans. Paleobiology 20(2):131-142.