Junk is what junk does

Randy Stimpson is someone a few here may recall: he was a particularly repetitious and dishonest creationist who earned himself a spot in the dungeon. One of the hallmarks of his obtuse way of ‘thinking’ is that he is a computer programmer, and so he was constantly making the category error of assuming the genome was a computer program, and therefore the product of intelligent design (never noticing that he himself is an example of how programming a computer requires relatively little intelligence). Now he is objecting to the notion of junk DNA over on the Panda’s Thumb, and I just have to tear apart his nonsensical assertions there.

I don’t think we should rush to conclude that highly repetitive DNA is junk. I know it would be a mistake to think that about software. If you look at software executables (like .exe and .dll files on Windows computers) they are full of repeated sequences. You may have written a program yourself. If so, you would certainly be familiar with the concept of a subroutine or a method. At the assembly level, whenever a subroutine is called registers are pushed on the stack, when one returns they are popped off the stack. The code to push and pop registers is automatically generated by the compiler and is therefore not apparent at the source code level. This translates into a massive amount of simplistic repetition at the binary level. These kinds of repetitive sequences would probably be classified as SINES by geneticists trying to understand the binary code. While this kind of code doesn’t map to any kind of a program function it is essential.

You may also know that most software developers these days work with object oriented languages where inheritance and polymorphism are used to develop hierarchies of classes. At the source code level inheritance enables developers to reuse source code without retyping it. However, when source code is compiled into binary form the result is a massive amount of repetition, but of a more sophisticated nature than that of just pushing and popping registers. These kinds of repetitive sequences would probably be classified as LINES.

I am familiar with software on far more intimate terms than most: I used to write code in assembly language, and could even read simple machine code on old 8-bit processors. I hacked together a p-code disassembler once upon a time. So yeah, I know what raw code looks like, as I’ll assume Stimpson does, as well.

I also know what DNA sequences look like. I can tell that Stimpson doesn’t have the slightest clue. No, the code to push and pop registers in a routine looks nothing like SINES, not in its distribution or in its pattern. No, standard library link codes look nothing like LINES in distribution or pattern, either. Since he mentions them, I can also explain that we know exactly what LINES and SINES do — he seems to assume that biologists must be idiots who haven’t bothered to look at the function of sequences. It’s a lovely example of projection, since it is obvious that Stimpson has never bothered to look at what these sequences are.

A LINE is a Long Interspersed Nuclear Element. Some LINEs are actually a sort of functional gene that can be transcribed and translated; they are about 6500 base pairs long and encode a couple of proteins that do something very specific: they assemble into a complex that (usually) includes a strand of their own RNA, migrate into the nucleus, nick the DNA there, and insert a DNA copy of the RNA sequence into the genome. That’s all they do, over and over. They’re a kind of self-contained Xerox machine that spews more copies of themselves, which can make more copies of themselves, which can make more copies of themselves. They are not typically associated with any of your useful genes.

How many copies do they make? Your genome contains approximately 868,000 copies of various LINE genes. Over 20% of your genome is nothing but this parasitic self-copier; it’s like spam all over the place. Don’t panic, though: almost all of the copies are nonfunctional, which is another indicator of their status as useless junk. Some were sloppily inserted and are broken, others have accumulated destructive mutations (there is no harm to the reproductive capacity of the organism bearing them if a LINE acquires a stop codon), and cells actively repress these parasites by, for instance, methylation and inactivation of stretches of DNA saturated with LINEs. Out of that huge number of copies, only 20-50 are estimated to retain any activity.
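Since we are talking to a programmer, here is a toy sketch of that copy-and-paste life cycle in Python. It is only an illustration of the logic described above, not a model of real biology: the function name and every probability in it are invented for the example, and it leaves out cellular repression entirely. Intact copies sometimes copy themselves, most new insertions arrive broken, and intact copies occasionally pick up a disabling mutation. Nothing ever cleans up the dead copies, because nothing selects against them.

```python
import random


def simulate_line_spread(generations, start_active=10, p_copy=0.5,
                         p_broken_insert=0.9, p_mutate=0.02, seed=1):
    """Toy copy-and-paste model of a LINE-like element.

    Returns a list of (active, broken) copy counts, one entry per generation.
    Every probability here is an invented illustrative value, not a measured rate.
    """
    rng = random.Random(seed)
    active, broken = start_active, 0
    history = [(active, broken)]
    for _ in range(generations):
        new_active = new_broken = newly_dead = 0
        for _ in range(active):          # only intact copies can copy themselves
            if rng.random() < p_copy:
                if rng.random() < p_broken_insert:
                    new_broken += 1      # sloppy insertion: dead on arrival
                else:
                    new_active += 1      # intact copy that can copy again
            if rng.random() < p_mutate:
                newly_dead += 1          # an intact copy picks up a disabling mutation
        active += new_active - newly_dead
        broken += new_broken + newly_dead
        history.append((active, broken))
    return history


if __name__ == "__main__":
    for gen, (active, broken) in enumerate(simulate_line_spread(150)):
        if gen % 50 == 0:
            print(f"generation {gen}: {active} active, {broken} broken")
```

With these made-up settings the broken copies pile up much faster than the intact ones, which is the qualitative point: a self-copier with a sloppy insertion step fills its host with wreckage.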

If Mr Stimpson wants to consider computer analogies, I ask: what do we call a code sequence that has only one function, the repeated duplication of copies of itself in the operating system? Do we consider that a functional and useful part of the computer, or do we try to get rid of them?

SINEs, or Short Interspersed Nuclear Elements, are even more common — your genome contains 1.6 million copies of various SINEs, taking up 13% of the genome (a lower percentage because even though there are more of them, they are shorter than LINEs). And remember, you only contain about 20,000 genes total, or about 1% of the number of SINEs. A SINE is basically a truncated LINE, or any short sequence that contains regions preferentially recognized by the LINE transcriptase, so that it is carried into the nucleus and repeatedly inserted.

That’s right. A SINE is a parasite of a parasite.
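The numbers in this post can be put side by side with a little arithmetic. The sketch below uses only the figures quoted here, plus one assumption that is not from the post: a haploid human genome of roughly 3.2 billion base pairs. The variable and function names are just for illustration.

```python
# Figures quoted in this post
LINE_COPIES = 868_000
LINE_GENOME_FRACTION = 0.20      # the post says "over 20%", so this is a lower bound
SINE_COPIES = 1_600_000
SINE_GENOME_FRACTION = 0.13
GENE_COUNT = 20_000
ACTIVE_LINES = (20, 50)          # estimated range of LINEs that retain activity
FULL_LENGTH_LINE_BP = 6_500

# ASSUMPTION (not from the post): haploid human genome of roughly 3.2 billion bp
GENOME_BP = 3.2e9


def mean_copy_length(genome_fraction, copies, genome_bp=GENOME_BP):
    """Average base pairs per copy implied by a genome fraction and a copy count."""
    return genome_fraction * genome_bp / copies


line_mean_bp = mean_copy_length(LINE_GENOME_FRACTION, LINE_COPIES)
sine_mean_bp = mean_copy_length(SINE_GENOME_FRACTION, SINE_COPIES)
genes_per_sine = GENE_COUNT / SINE_COPIES
active_share = tuple(n / LINE_COPIES for n in ACTIVE_LINES)

print(f"mean LINE copy: ~{line_mean_bp:.0f} bp (full length: {FULL_LENGTH_LINE_BP} bp)")
print(f"mean SINE copy: ~{sine_mean_bp:.0f} bp")
print(f"genes per SINE copy: {genes_per_sine:.2%}")
print(f"share of LINE copies still active: {active_share[0]:.4%} to {active_share[1]:.4%}")
```

Under that assumed genome size, the average LINE copy comes out at around 740 base pairs, far short of the 6500 of an intact element, which is what you would expect if most copies are broken fragments. The average SINE comes out at around 260 base pairs, and the 20-50 LINEs that still work are well under a hundredth of a percent of the total.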

Other repetitive elements include endogenous retroviruses (ERVs): relics of past viral infections. These viruses insert copies of themselves into the host DNA, and in ERVs we don’t just find genes for transcriptase enzymes; we also find genes for viral coat proteins. These are sequences that also have a known function, as sites for the synthesis of infectious disease particles. So, sure, you could say they do something. It’s just not for our benefit.

Could these repetitive sequences do anything useful? Yes, to a small degree, and we even have examples of it…unfortunately, every time someone finds a rare example of a functional piece of repetitive DNA, the ignoramuses rhapsodize about how this demonstrates it could all be useful. No, it doesn’t.

For example, one role of some junk could be in position effects. We know that if a useful gene is located next to a chunk of inactivated DNA, its expression may be downregulated to some degree — it’s a kind of spillover of a passive effect of living next to a junkyard.

Since some of these junk DNA sequences are retrotransposons that insert themselves arbitrarily into the genome, they can also be a source of mutations; some may even find portions of their sequence incorporated into the product of a functional gene. An evolutionary biologist can see this as a possible, if rarely fruitful, contribution to genetic diversity, but it should give no comfort to creationists, who don’t much care for chance insertions and random variation.

There are other uses for some junk. There are structural regions of the chromosome, such as the area around the centromere, that are devoid of genes but just contain many repeats of short, untranscribed sequences. These are a kind of generic handle for proteins to glom onto, and contribute in a general way to how the chromosome works in the cell. There is also a general property of cell growth, that one of the triggers for cell division is the ratio of nuclear to cytoplasmic volume, so puffing up the genome with lots of extraneous nucleotides can lead to larger cells. Both of these functions, though, are not very sequence dependent — so sure, you could say they have a rough, general role: they are the plastic boxes and styrofoam packing peanuts of the functional elements of the genome. They may do something, but it’s not specific, and it’s not particularly dependent on the code.

Junk DNA isn’t merely stuff that we don’t understand. It’s stuff that we know something about, and know how it fits into the ecosystem of the cell, and that we call junk because we know what it does — it mainly sits up in the attic, garage, and basement, gathering dust and taking up space.

Mr Stimpson: go read a decent molecular biology and genetics book, and stop relying on your irrelevant software manuals and the dishonest and ignorant pratings of your fellow creationists.