Jul 25 2012

# Fun with information theory

Related to the Second Law of Thermodynamics argument against evolution discussed here yesterday is one equally intimidating for the non-mathematically inclined that goes “a mutation can never increase genetic information, ergo evolution is impossible.” Like the second law deal, it can sound real complicated. But surprisingly, if you’re smarter than a first-grader, the ‘no new info’ argument is super easy to falsify.

Probably 99.9% of the creationists you hear pitch this tired canard have no idea what information theory is, but then most of the people they are trying to hoodwink won’t have any idea about it either. That’s the whole point: the claim is supposed to shut you down and stand unchallenged. And if you do get into the weeds you’ll get bogged down quick in discrete mathematics and set theory over how in the world genetic information can be quantified in the first place. The good news is, it doesn’t matter.

There is such a thing as information theory. Occasionally you may encounter someone who throws out terms therein like Shannon Theory or Kolmogorov Complexity. I only encountered those subfields once, as an undergrad, in some computer math class, I think. Forgive me if this isn’t totally accurate, but I think Shannon info could be loosely summarized as how efficiently or how intact a signal can be relayed, usually over phone lines because if memory serves Claude Shannon worked for Bell Labs. Kolmogorov complexity has to do with how repetitive or compressible a data set is. I probably know more about the latter; the Mandelbrot Set pictured above is there as an example of something that is infinitely complex even though it is generated with a super-simple, highly compressible rule. But I don’t know much about either.
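To make the compressibility idea a little more concrete, here is a rough Python sketch. It uses the size of zlib's output as a crude stand-in for Kolmogorov complexity, which is an assumption of the example (the real quantity is not computable, so a general-purpose compressor only hints at it). The helper name `compressed_size` and both sample byte strings are invented for illustration.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Number of bytes zlib needs to store data at its highest compression level."""
    return len(zlib.compress(data, 9))

# 10,000 bytes produced by a tiny rule: repeat "ACGT" 2,500 times.
repetitive = b"ACGT" * 2500

# 10,000 pseudo-random bytes (seeded so the run is repeatable); no short rule
# that zlib can exploit describes them.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))

print("repetitive:", len(repetitive), "->", compressed_size(repetitive))
print("noisy:     ", len(noisy), "->", compressed_size(noisy))
```

Both inputs are the same length, but the repetitive one should shrink to a tiny fraction of its size while the noisy one barely shrinks at all. That gap is the loose sense of "compressible" meant above.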

But it doesn’t matter! Because all you have to know is what a metric space is and how a back mutation works. Both of those things are simple.

A metric set is a set where the concepts of less than and greater than and equals all make sense. You might be surprised what kind of cool, weird, funky topologies meet this benchmark, but I bet every last person reading this understands intimately what greater than and equals mean. All that matters here is every metric space has to have three properties and the very first one says A = A, i.e., a defined quantity cannot be greater than or less than itself. That means a string of genetic code cannot have less information than itself, regardless of how you are measuring genetic info, provided the measurement method makes mathematical sense.

A back mutation is exactly what it sounds like, a sequence of genetic code undergoes a change during replication — a mutation — and then the replicated code undergoes another mutation that happens to be the exact opposite of the first and restores the sequence back to what it was originally.

Now check this out: If both those mutations caused a loss of genetic information, then the re-replicated genome would have less information than its identical grandparent genome and therefore less information than itself!

Ancient philosophers recognized the form of this iron-clad proof by contradiction long ago and labeled it reductio ad absurdum. In Latin it means literally, reduction to the absurd. In this specific case, since the initial assumption produces a paradox, we can conclude that the initial, creationist assumption is wrong. No matter how genetic info is measured or calculated, as long as that measurement is mathematically consistent, it’s a trivial matter to falsify the claim “All mutations cause a loss of genetic info.” QED
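For anyone who wants to poke at the argument, here is a minimal Python sketch of it. The function names and the three stand-in measures are made up for illustration and are not real measures of genetic information. The only point is that for any scoring function that returns ordinary numbers, a mutation and its exact reversal cannot both register as a strict loss.

```python
import zlib
from typing import Callable

def point_mutation(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the base at position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]

def both_steps_lose_info(seq: str, pos: int, base: str,
                         info: Callable[[str], float]) -> bool:
    """True only if the mutation AND its back mutation both strictly lower info."""
    mutated = point_mutation(seq, pos, base)
    restored = point_mutation(mutated, pos, seq[pos])  # the back mutation
    assert restored == seq  # we are back at the original sequence
    return info(mutated) < info(seq) and info(restored) < info(mutated)

# Hypothetical stand-in measures, chosen only to show that the outcome
# does not depend on how "information" is scored.
measures = {
    "length": len,
    "count of T": lambda s: s.count("T"),
    "zlib size": lambda s: len(zlib.compress(s.encode())),
}

for name, info in measures.items():
    # Prints False for every measure.
    print(name, both_steps_lose_info("GATTACA", 3, "C", info))
```

Swapping in any other deterministic numeric measure gives the same answer, because a double loss would require the original score to be smaller than itself.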

#### 13 comments

1. 1
##### wholething

Any infidelity in a signal is a loss of Shannon information. When a star emits light, any absorption lines are a loss of Shannon information. Creationists have the idea that the world was created perfect and began to deteriorate when Adam and Eve ate the fruit. So Shannon’s definition fits their ideas.

The absorption lines give us information about the elements in the star, and their Doppler shifts tell us the relative velocity of the star to us. They can tell us about any clouds of gas between us and the star the same way.

For Creationists, mutations cause a change in the original DNA and, according to Shannon, that is a loss of information. An enzyme that becomes more general due to a mutation is a loss. An enzyme that becomes more specific is a loss of information. A duplicated gene is a loss of information in that sense. A mutation to the duplicate that makes a second enzyme that is either more or less specific is a loss of information.

Creationists apply the wrong definition of information to reality. This is not news.

2. 2
##### The Lorax

If I heard that argument, I wouldn’t attack it from the position of information theory, I’d attack it from the position of biological facts. There are numerous types of genetic mutations, and some of them add “data” in the form of base pairs to the genome. So if you have a genome with 10 base pairs (I use low numbers for less confusion), it’s possible that a genetic mutation can cause it to have 12 base pairs. And so on and so forth, up to billions. This is not a theory, or hypothesis, this is something that has actually been observed; this is a solid, inarguable fact.

So the idea that a mutation can’t increase the amount of data in a genome according to information theory is nothing but an attempt to use treknobabble to confuse people. Information theory has nothing to do with it. Biology alone says it’s wrong, case closed.

3. 3
##### philipelliott

So unless I misunderstand, you set out to falsify the statement “a mutation can never increase genetic information”, but in the end, falsify a different statement, “All mutations cause a loss of genetic info”. The two statements are not equivalent.

4. 4
##### aziraphale

I think what makes the creationist argument plausible is the feeling that a random mutation is not likely to add useful information. I like to counter this in the following way.

Suppose a simple organism has a stretch of DNA which codes for a protein that lets it digest a particular food. Then suppose these things happen:

Step 1. A copying error produces a second copy of just that stretch of DNA. The organism gets slightly better at digesting that food, so the new genome spreads through the population.

Step 2. A mutation changes one copy of that stretch of DNA so that it codes for a different protein, which lets it digest a different food. If both foods are available that’s also an advantage, so this new genome also spreads.

It’s clear that the final genome contains more useful genetic information than the one we started with. Also Step 2 doesn’t have to happen shortly after Step 1, and neither is intrinsically unlikely.

5. 5
##### Stephen "DarkSyde" Andrew

Philip, they are plenty close enough.

6. 6
##### Kevin

Of course, since a lot of genetic change which results in a change in function involves a gene duplication event, this argument is just so much nonsense anyway.

Even if the duplicated gene does nothing but what the original gene does initially, it’s still an increase in information. To be precise, a doubling of the information that was originally there.

Transpositions, translocations, and deletions are also increases in information. In fact, every change other than a complete 100% true transcription of the original increases information. And since every creature is a unique set of DNA, that’s new information.

This is why DNA testing works. You are a unique sequence of DNA. Those sequences are new information that can identify you to a certainty that will most definitely either convict you or free you in a court of law. Without new information being transmitted, there would be no way to distinguish you from any other person on the planet.

What’s happening is that creationists conflate “information” with “fitness” or “utility”. Your DNA is made up of lots of sequences that pretty much do nothing. That’s “information”, but it’s not “utility”. And “utility” doesn’t equate with “fitness” either.

The argument is worse than wrong. It’s nonsensical.

7. 7
##### Deen

@The Lorax in #2: unfortunately, you’re going to have to go into information theory a little bit deeper than just pointing out that the amount of data can increase, as the amount of data is a bad metric of the amount of information. For example, most people will agree that two copies of the same book contain twice the amount of data, but not twice the amount of information.

Note, however, that two copies still contain more information than one book, under pretty much any metric you can think of. For example, if I had to explain to you what is in the two books on my table, after telling you what is in the first book, I would still need a non-zero amount of information to tell you that the second book is identical to the first.
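A rough way to see both halves of this in Python is sketched below. It treats zlib's compressed size as a crude proxy for information content, which is an assumption of the example and not a proper metric. The "book" is just seeded random words from an invented vocabulary, and `compressed_size` is a hypothetical helper.

```python
import random
import zlib

def compressed_size(text: str) -> int:
    """Bytes needed by zlib (highest level) to store the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

# A stand-in "book": 1,500 words drawn from a small made-up vocabulary.
rng = random.Random(0)
vocabulary = ["gene", "copy", "signal", "noise", "book", "table",
              "code", "mutation", "enzyme", "star", "light", "message"]
book = " ".join(rng.choice(vocabulary) for _ in range(1500))

one_copy = compressed_size(book)
two_copies = compressed_size(book + book)

print("one copy:  ", len(book), "bytes of data ->", one_copy, "compressed")
print("two copies:", len(book + book), "bytes of data ->", two_copies, "compressed")
```

The data length doubles exactly, while the compressed size should grow only a little: the compressor still has to spend a few bytes saying "and now the same thing again", which is the non-zero extra described above.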

8. 8
##### aaronz

One thing to note, what you are calling a metric space or metric set is usually called an ordered set. A metric space is just a set with a distance function, or metric (this is in your link to wikipedia). Otherwise, nice post. As a mathematician I like seeing mathematics treated as something that can be explained and not as something verging on magic.

9. 9
##### Deen

And the same goes for Kevin in #6 – you double the data length, but not necessarily the information content. After all, the duplicate in itself doesn’t give you any new information that you didn’t already have – except for the new knowledge that there is a duplicate. Which is still a non-zero information increase. And, in genetics, can still have a non-zero biological effect (for example, a duplicate gene for a particular enzyme can result in an increased production of that enzyme).

10. 10
##### philipelliott

Stephen,
While I agree with the overall point of this post, I think your argument would be a little stronger if you hadn’t moved the goalposts ever so slightly.

11. 11
##### jaranath

philipelliott:

How so? I’m not sure I understand your argument.

I suppose part of this is due to the differing definitions of “information”, but the way I’m imagining this, your creationist argues “this point mutation is a deviation from the original stretch of DNA, therefore it’s a loss of information”. But then, we can put it back the way it was with a subsequent reversal of the mutation. Which shows the creationist’s definition of “information” is wrong or misapplied in this case. We’ve either shown information can increase by his definition, or that his definition is wrong.

I’ve been told in discussions that back mutation isn’t a refutation because it’s just restoring what was lost…that it doesn’t represent a creation of NEW information, and that the original state can’t be expanded upon. When I try getting a coherent description of the mechanism for this barrier, they usually go into full obfuscation and misdirection mode. A few have directly stated that it is, essentially, magic, which leads me to ask why were we talking about scientific laws in the first place. In pointing out clear biological examples such as the duplication mutations with advantageous variations mentioned above, I get assertions that those sorts of mutations always have greater costs in some vaguely defined elsewhere (and suddenly, information = benefit). I’ve even had one or two try to claim that these were very MINOR increases in information, so they shouldn’t count.

12. 12
##### Stephen "DarkSyde" Andrew

Thanks Aaron, I am probably biased with too much topology. I can’t remember if I elected extra topo or if they really crammed that stuff onto math majors back in my day. I took three courses in it compared to a couple of general set theory and number theory courses, and one group theory course, if memory serves.

13. 13
##### kagerato

@Deen:

Right; there is a distinction between data and information. The standard implication is that information always composes a message of some sort, whereas data could be completely meaningless. Both consist of a series of symbols, however.

As to the duplicated books example, yeah, you can argue that the information has increased just barely. This assumes that the meta-information about how many copies exist is of some kind of relevance or importance. If it’s not, then the information has not increased because the copies can be literally discarded without losing any portion of the message — in this case, the content of the book.

The part the Creationists really stumble over is how new information emerges in living things. This is mainly because they fail to correctly define information in this context. Information (the message) here is any part of the genetic code which produces a survival benefit to the species whether immediately or at any point in the future.

Several processes create random changes in the genetic code, including copying (replication) errors, electromagnetic radiation, viruses, certain chemicals, and sexual recombination. Those changes which are harmful to the organism will tend to cause the individual to fail to reproduce or reproduce less, culling the modifications over time. Those which are neutral simply accumulate, which allows for redundant gene copies and modest structural changes to develop unchecked. Finally, those which are useful to survival in the environment will present advantages and become more widespread over generations. This is the simple overview of natural selection.

Thus, the “message” of a species’s survival capacities grows over time through the disparate selection effects driven by the environment on the population.

Note that this is a much more specific understanding of information than what is used in information theory. In the most general sense, information doesn’t need any particular purpose. It need not be man-made, and it could be completely arbitrary. Information is in practice described by its entropy, which is a measure of the uncertainty in a random variable. This means the example at #1 of light emitted by a star is actually information, even though the star cannot literally be said to be communicating with us. It’s the fact that the star’s output is not predictable and that the light it produces is measurable within the bounds of the communications channel that makes it information in the mathematical sense.
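As a small illustration of entropy as a measure of uncertainty, here is a sketch in Python. It estimates entropy from the symbol frequencies of a single string, which is a simplifying assumption (a proper treatment needs a probability model for the source). The function name and the sample strings are invented for the example.

```python
import math
from collections import Counter

def shannon_entropy(symbols: str) -> float:
    """Entropy in bits per symbol, estimated from observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    # Sum of p * log2(1/p) over every distinct symbol.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(shannon_entropy("AAAAAAAA"))  # 0.0: the next symbol is never a surprise
print(shannon_entropy("ACGTACGT"))  # 2.0: four equally likely symbols
print(shannon_entropy("AAAAAAAT"))  # in between: mostly predictable
```

A perfectly predictable source scores zero, and the score rises as the output gets harder to guess, with no reference to whether anyone intended the output as a message.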

:)