ENCODE has its defenders!


You know I was really pissed off at the crap ENCODE was promoting, that the genome was at least 80% functional and that there was no such thing as junk DNA. And there have been a number of better qualified scientists (like W. Ford Doolittle and Dan Graur and many others) who have stood up and registered their vehement disagreement with that nonsense. But there are some who agree that the genome must be largely functional, like John Mattick. Larry Moran reminds me that Mattick is the author of this infamous chart, however, which is best known as the original Dog’s Ass Plot.

Worst evolution diagram ever

That is so misleadingly dishonest it takes my breath away — Mattick cherry-picked genome sizes to fit his curve. One of my cell biology labs involves teaching students how to properly construct a simple graph, and I think I’m going to include this figure as a bad example.

Well, Mattick has done it again. He has published a paper (how do these things get through peer review?) disputing the existence of large quantities of non-functional DNA, which is largely an attempted rebuttal of Graur’s paper. It’s a short paper, but painful in its contortions and extraordinarily poor arguments. Larry Moran has done an excellent job of tearing it apart — I think he needs to polish it up and get that published.

The worst part of the paper, though, is the concluding paragraph — you know, where most of us try to put the most important message of the work.

There may also be another factor motivating the Graur et al. and related articles (van Bakel et al. 2010; Scanlan 2012), which is suggested by the sources and selection of quotations used at the beginning of the article, as well as in the use of the phrase “evolution-free gospel” in its title (Graur et al. 2013): the argument of a largely non-functional genome is invoked by some evolutionary theorists in the debate against the proposition of intelligent design of life on earth, particularly with respect to the origin of humanity. In essence, the argument posits that the presence of non-protein-coding or so-called ‘junk DNA’ that comprises >90% of the human genome is evidence for the accumulation of evolutionary debris by blind Darwinian evolution, and argues against intelligent design, as an intelligent designer would presumably not fill the human genetic instruction set with meaningless information (Dawkins 1986; Collins 2006). This argument is threatened in the face of growing functional indices of noncoding regions of the genome, with the latter reciprocally used in support of the notion of intelligent design and to challenge the conception that natural selection accounts for the existence of complex organisms (Behe 2003; Wells 2011).

I’m sure the Discovery Institute staff are dancing in pirouettes of joy at getting a neutral or possibly favorable mention in a legitimate journal. It’s not clear exactly what Mattick is trying to do here (lack of clarity is also a sin in science writing, let me remind you): either he’s trying to pre-emptively slander his critics by imputing an ideological motive to them, or he’s granting credence to Intelligent Design creationism. I’m inclined to think it’s both; he’s clearly trying to argue with the motives of Graur and others, but also, he’s claiming, as the creationists do, that evidence of function for the highly variable component of our genome is a de facto argument for a purpose for that variation, and that evolutionary theory does not support the idea of a functional purpose for variation in the sequence of most satellite DNA, for instance.

But I would not argue that ubiquitous functionality is unlikely because it has consequences for our theories; it’s wrong because of all the evidence that has been marshaled that most DNA is not there to serve a specific, selectable purpose for us humans.

Comments

  1. Amphiox says

    When it comes to ENCODE’s definition of “functional” sequence, a fundamental bit of science, aka a control null hypothesis, was missing. And now it has been done:

    http://thefinchandpea.com/2013/07/17/using-a-null-hypothesis-to-find-function-in-the-genome/

    The majority of wholly random artificially generated DNA sequences had regulatory “function” per the ENCODE definition of the term (reproducible regulatory effects on a reporter gene).

    Aside from utterly demolishing the ENCODE concept of “functional”, this may also have evolutionary significance. It may mean that a majority of random mutations in noncoding DNA will have some kind of biochemical activity that can potentially have reproducible regulatory effects on nearby genes. Such regulatory effects should be potentially visible to natural selection, and would make the evolution of DNA regulation that much easier.

  2. Matt G says

    If much or most of the genome is non-functional, that’s bad for ID claims. In the unlikely event that much of it IS shown to be functional, it still doesn’t help them because they still have no testable claims. They are trying to portray teleology as science.

  3. says

    I guess I need more clarity on this.

    It is my understanding that only a small percentage of human DNA codes for proteins. However, there are stretches of non-coding DNA that are extremely conserved and appear more stable than even coding stretches. Presumably, these are critically functional, to an extent where pretty much any mutation makes them — and thus the whole organism — non-viable. I remember reading that some have been identified as having regulatory functionality, seeming to turn genes on and off through an as-yet unidentified means rather than by using protein.

    There does certainly appear to be a lot of junk DNA — retro-viral remnants, sections made obsolete by mutations elsewhere, traits that went dormant long ago and started to “drift”, that sort of thing — so I have no trouble dismissing Mattick’s conclusions. But it seems to me that non-coding is not the same as non-functional.

    I would be interested in learning more about this, if anyone has links good for a reasonably well-educated non-biologist.

  4. alwayscurious says

    How appalling! The dog’s weight slants the top of its bar, but the single-celled organisms shouldn’t be able to slant the graph to the same extent! Clearly the eukaryote’s bar needs to be a U shape to accommodate the roundness of the cell, and the prokaryote should be resting on its side on a flat or gently sloped bar.

  5. johnharshman says

    Gregory #4:

    But it seems to me that non-coding is not the same as non-functional.

    You are quite correct. But oddly, the only people who make this mistaken equation are the science journalists and ENCODE fans who want to erect a convenient strawman to argue against. We have known for at least 50 years that at least some non-coding DNA has regulatory functions. I’m thinking that upstream promoters were the first to be discovered. To repeat: the only people who ever equated non-coding DNA with junk were the people who wanted to present their discovery of functional non-coding DNA (true or not) as revolutionary and important. Everyone else knows that there is functional non-coding DNA, but it adds up to only a few percent of all the non-coding DNA. Check out all the evidence that Mattick dismisses in his paper: neutral evolutionary rate, genome size variations among close relatives, mutational load.

  6. dukeofomnium says

    On that chart, why does the fungi/plants icon look like a frog with an umbrella? And why would a frog want an umbrella, anyhow? They spend a lot of their time in water, so they wouldn’t mind getting wet. And that eukaryote looks like a single sunny-side-up egg. Shouldn’t it be called a eukaryolk?

    These are the questions that intelligent design can answer. What does evolution have to offer? Hmmmmm?

  7. Amphiox says

    However, there are stretches of non-coding DNA that are extremely conserved and appear more stable than even coding stretches. Presumably, these are critically functional, to an extent where pretty much any mutation makes them — and thus the whole organism — non-viable. I remember reading that some have been identified as having regulatory functionality, seeming to turn genes on and off through an as-yet unidentified means rather than by using protein.

    The gene coding portion of the human genome is about 1% of the total. The known regulatory sequences constitute about 5% of the total.

    Over 50% of the human genome is already known to consist of retroviral remnants, LINEs and SINEs, and pseudogenes. This portion is usually included in the “junk” when talking about junk DNA.

    However, per the ENCODE definitions of functional, many, if not all, of these kinds of sequences would turn up positive for functional activity. Retroviral remnants may still possess the sequences needed to bind at least some transcription proteins, which would have been necessary for active retroviral replication. Same with pseudogenes. Some common “breaking” mutations that turn a gene into a pseudogene are early stop codons and frameshift mutations. Neither of these prevents the pseudogene from actually being transcribed into a “gibberish” mRNA which may even be translated into a “gibberish” protein. Both of these activities would have turned up as a positive “function” per the ENCODE definition of functionality. Similarly, LINEs and SINEs jump around the genome and replicate themselves selfishly. Both of these activities require the binding of either transcription proteins or proof-reading proteins. Even decayed sequences may still possess some affinity and bind such proteins at least partially. All of that also would come up positive per ENCODE’s definition of “functional”.

    ENCODE’s definition of “functional” is so broad that it is only one step up from including “dissolving in water” as an essential biological function.

  8. says

    But I thought everyone knew that “junk” DNA was actually the entire text of the Encyclopædia Galactica (or just possibly the Hitchhiker’s Guide to the Galaxy) encoded in Yyyy%#XXy (the Galactic Lingua Franca), and that this paper obviously supports that.

  9. jose says

    It would be so funny if Jerry Coyne started writing posts supporting Encode. Blessed Andraste, make it happen.

  10. george gonzalez says

    I’m sorry, I know nearly nothing about the “junk DNA” debate, but just as an onlooker, a lot of possibilities come to mind:

    “Junk DNA” might be good due to:

    (1) The “junk” spaces out the good bits, so a random poke from a gamma ray or some other intrusion has a lesser chance of damaging anything. Or if it damages something, it’s less likely to damage another good spot in the same event and cell.

    (2) DNA splits and recombines right? “Junk DNA” increases the chances that a split will occur in a junky bit. I think.

    (3) “Junk DNA”, are some parts of it some simple repeating sequence? Are there not mechanical engineering benefits to repeating sequences, you know, like in the strength of polymers and crystals? Might there be some benefit at some point in the life cycle to having slightly stronger and stiffer regions?

    I have the feeling that random yahoos like me have a better grasp of the possibles than the guy with the doggie bar chart. How much better would an actual DNA expert be!

  11. RFW says

    P-zed, noting your mention of teaching your students how to devise a good graph, I remind you of Edward Tufte’s seminal book “The Visual Display of Quantitative Information” as the motherlode of sound advice on the subject.

    You are probably already well aware of Tufte’s book (and its sequels) but I can’t tell – hence this just-in-case heads-up.

    That the example graph is a steaming pile of nonsense goes without saying, but let’s try to be explicit about its defects:

    1. Humans are vertebrates and vertebrates are chordates. Hence the rightmost three bars do not represent distinct categories.

    2. The old, discredited concept of “man as the highest form of life” is implicit.

    3. Done properly, each bar would show for each category a measure of central tendency and a measure of dispersion. Usually these would be the arithmetic mean and the standard deviation, but not necessarily. If the data distribution represented by each bar is significantly non-normal, then the median and “coefficient of dispersion” may be more suitable. [Tip: graphing software often offers a specialized format for stock prices that is ideal for such graphs.]

    4. The colors are meaningless, as are the cute figures.

    5. Some bars have flat tops, others sloped ones. This is reminiscent of a sin that Fowler inveighed against in his “Modern English Usage”: the “elegant variation”.

    6. There should be a legend stating how many taxa contribute to each bar.

    7. “Fungi/plants” is a paraphyletic grouping without evolutionary significance.
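
    Points 3 and 6 can be sketched in a few lines of Python. This is only an illustration of what each bar ought to summarize: the function name `summarize` is made up for this example, and the numbers are placeholders, not real genome measurements.

```python
import statistics

def summarize(groups):
    """Return, per category, the sample count, mean, sample SD, and median."""
    summary = {}
    for name, values in groups.items():
        summary[name] = {
            "n": len(values),                       # how many taxa feed the bar
            "mean": statistics.mean(values),        # central tendency
            # Sample SD needs at least two values; report 0.0 otherwise.
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
            "median": statistics.median(values),    # safer for skewed data
        }
    return summary

# Placeholder values for illustration only; NOT real measurements.
example = {"group A": [1.0, 2.0, 3.0], "group B": [10.0, 10.0, 40.0]}
for name, stats in summarize(example).items():
    print(name, stats)
```

    A bar drawn from this summary would carry an error bar and a sample count, which is exactly what the chart under discussion leaves out.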

    The thing is a mess. Whether this is due to deliberate obfuscation or merely incompetence is hard to say with certainty, but given the love of the anti-science, anti-everything crowd for lying through their teeth, I’d put my money on deliberate obfuscation.

    Tufte would love it for his rogues’ galleries of bad graphs.

  12. says

    #12: The error rate is per nucleotide — including junk in the genome does not diminish the probability of an error in a coding region in the slightest.

    Your third part is already accounted for. There are stretches of DNA that have a structural function, such as centromeres and telomeres, and long regions of repeats can have a suppressive effect on nearby coding genes as a side effect of silencing mechanisms. The point, however, is that their sequence does not carry information.
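
    To make the per-nucleotide point concrete, here is a minimal sketch in Python. The mutation rate and sequence lengths are assumed round numbers chosen for illustration, not measured values, and the function name is hypothetical.

```python
import math

def expected_coding_hits(mu_per_site, coding_bp, junk_bp):
    """Expected number of mutations landing in coding sequence per replication.

    mu_per_site is an assumed per-nucleotide mutation rate. junk_bp is taken
    as an argument only to show that it cancels out of the result.
    """
    genome_bp = coding_bp + junk_bp
    total_hits = mu_per_site * genome_bp      # more DNA means more mutations overall
    coding_fraction = coding_bp / genome_bp   # but a smaller share of them hit coding DNA
    return total_hits * coding_fraction       # the two effects cancel: mu * coding_bp

# Assumed illustrative numbers: same coding content, with and without padding.
compact = expected_coding_hits(1e-8, 30_000_000, 0)
padded = expected_coding_hits(1e-8, 30_000_000, 3_000_000_000)
print(math.isclose(compact, padded))
```

    Under these assumptions the padded genome takes many more hits in total, but the expected number falling in the coding portion does not change.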

  13. sawells says

    Hang on, I think I have a new counterargument. Since the entire genome is replicated when the cell replicates, 100% of the genome must interact with protein or RNA at some point, regardless of function or not. Ergo, not only does the ENCODE definition not define function, but also ENCODE failed to correctly identify the interactions of the remaining 20% of the genome.

  14. gillt says

    PZ

    You know I was really pissed off at the crap ENCODE was promoting, that the genome was at least 80% functional and that there was no such thing as junk DNA.

    Promoting in the general press but not representative of the consortium or the claims of the papers therein?

    For instance, does anyone see a problem with this definition of function from “A User’s Guide to the Encyclopedia of DNA Elements (ENCODE)”?

    For the purposes of this article, the term “functional element” is used to denote a discrete region of the genome that encodes a defined product (e.g., protein) or a reproducible biochemical signature, such as transcription or a specific chromatin structure. It is now widely appreciated that such signatures, either alone or in combinations, mark genomic sequences with important functions, including exons, sites of RNA processing, and transcriptional regulatory elements such as promoters, enhancers, silencers, and insulators. However, it is also important to recognize that while certain biochemical signatures may be associated with specific functions, our present state of knowledge may not yet permit definitive declaration of the ultimate biological role(s), function(s), or mechanism(s) of action of any given genomic element.

    At present, the proportion of the human genome that encodes functional elements is unknown. Estimates based on comparative genomic analyses suggest that 3%–8% of the base pairs in the human genome are under purifying (or negative) selection [4]–[7]. However, this likely underestimates the prevalence of functional features, as current comparative methods may not account for lineage-specific evolutionary innovations, functional elements that are very small or fragmented [8], elements that are rapidly evolving or subject to nearly neutral evolutionary processes, or elements that lie in repetitive regions of the genome.

  15. Francisco Bacopa says

    I never understood why creationists worried about junk DNA. Couldn’t they just wave their hands and say that the original DNA in Adam and Eve where our junk DNA is now is what allowed them to be immortal when activated by an enzyme in the fruit of the Tree of Life? Couldn’t they say that our resurrected bodies will have this DNA restored to its original function? And that in the meantime God has allowed it to become non-functional?

    Come on dudes, don’t keep fighting science. Y’all always lose. Just make shit up like you usually do. You’ll be better off for it. And I was even kind enough to give you an example of how to do it. Plagiarize me. I don’t care.

  16. says

    I’ve always been suspicious of the idea that “junk” DNA is actually just junk. And I was just about to comment that I’m finally convinced that non-coding DNA is also non-functional; PZ has convinced me. Until I read this posting @ #2, above:

    Aside from utterly demolishing the ENCODE concept of “functional”, this may also have evolutionary significance.

    If I may be so bold, I’d like to suggest that perhaps “evolutionary significance” is what they mean by “functional”, so this statement seems self-contradicting.

  17. says

    “encodes a defined product (e.g., protein) or a reproducible biochemical signature, such as transcription…”

    I think pretty much everyone has a problem with this definition. They’re effectively saying anything that is chemically active is a functional element, or that anything detectable by them is therefore a functional element.
    Even if we rule that out as obviously stupid, they then say if something is transcribed, it’s a functional element, which means that they’re saying there’s no such thing as erroneous transcription.
    Larry Moran addressed this particular issue (a few times) on his blog, including here:
    http://sandwalk.blogspot.com/2013/03/on-meaning-of-word-function.html

    george gonzalez:

    The arguments that you bring up have apparently been brought up by other biochemists, and ENCODE apparently wouldn’t recognize those ‘functions’ you list as actual functions anyway. Also, the major problem with these possible explanations for Junk DNA that you’ve listed is the so-called “Onion Test”. The onion test boils down to ‘If that explains having a lot of Junk DNA, how do you explain having very little?’ Not only is there a difference in the amount of Junk DNA between widely separated organisms (which could be allowed, since perhaps we don’t understand the subtleties of the benefits of ‘junk’), but there’s a huge difference in some instances between species that are IN THE SAME GENUS!

    Gregory in Seattle:
    “I guess I need more clarity on this.

    It is my understanding that only a small percentage of human DNA codes for proteins. However, there are stretches of non-coding DNA that are extremely conserved and appear more stable than even coding stretches. ”

    Yes, and these are non-coding functional pieces of DNA; they’ve been, as you point out, known for a very long time. Even when you include non-coding but still functional DNA, the vast majority of the human genome is still junk (with junk being non-coding, non-functional garbage).

  18. chrislawson says

    @maxdevlin:

    There is coding DNA which gets turned into protein sequences. There is non-coding DNA which does not get turned into proteins but which may still have functional properties. Among the non-coding DNA there are “junk” sequences which neither get turned into protein nor have any regulatory function. This latter point has been comprehensively demonstrated through numerous lines of research. There really is junk DNA in the true sense of DNA that serves absolutely no biological function and which can be deleted from organisms’ genomes without the slightest adverse effect.

    The only real debate is about how much of the non-coding DNA is junk, since we can’t be sure that we won’t discover some previously unknown functions in non-coding DNA. But to my mind there is no doubt that LINEs, SINEs, and broken viral genes are pure junk, and so at the bare minimum 52% of our genome is non-functional (aka “junk”).

  19. David Marjanović says

    “Junk DNA” might be good due to:

    You won’t be surprised to read that all of these possibilities have been suggested before…

    (1) The “junk” spaces out the good bits, so a random poke from a gamma ray or some other intrusion has a lesser chance of damaging anything. Or if it damages something, it’s less likely to damage another good spot in the same event and cell.

    PZ just said it, but it can’t be said often enough: mutations are statistical things that happen per number of nucleotides, not per number of genomes or genes. It’s not something you can dilute.

    (2) DNA splits and recombines right? “Junk DNA” increases the chances that a split will occur in a junky bit. I think.

    Uh, what’s bad about such a “split”? Do you mean crossing-over?

    (3) “Junk DNA”, are some parts of it some simple repeating sequence? Are there not mechanical engineering benefits to repeating sequences, you know, like in the strength of polymers and crystals? Might there be some benefit at some point in the life cycle to having slightly stronger and stiffer regions?

    Different species sometimes have different “GC content”, because the bonds between G and C are stronger than those between A and T: organisms living in very hot environments will have high GC content so the DNA doesn’t fall apart, transcription starts in a place called the TATA box, and so on. However, the genetic code is so redundant that a lot of this variation in GC content can be done in fully functional genes. Indeed, all the extremes occur in prokaryotes with practically no junk DNA at all.

    Promoting in the general press but not representative of the consortium or the claims of the papers therein?

    Unfortunately fully representative of the consortium and the claims in its papers. Go read them and rageflail.

    For instance, does anyone see a problem with this definition of function from “A User’s Guide to the Encyclopedia of DNA Elements (ENCODE)”?

    For the purposes of this article, the term “functional element” is used to denote a discrete region of the genome that encodes a defined product (e.g., protein) or a reproducible biochemical signature, such as transcription

    “Cells awash in useless RNA”

    As hinted at above, the initiation of transcription isn’t very specific; it’s not tied to the presence of a functional gene. We produce lots and lots and lots of RNA that doesn’t code for a protein and is destroyed soon after it’s made; of the RNA that does code for a protein, much – likely most – codes for useless fragments encoded by pseudogenes and is destroyed soon after it’s made, too, with the protein fragments ending up in the proteasome. Transcription is not an efficient process, as the consortium clearly assumes. It’s not intelligently designed. It’s a mess.

    And if you had read Graur’s open-access paper, you’d know all of that already.

  20. Amphiox says

    If I may be so bold, I’d like to suggest that perhaps “evolutionary significance” is what they mean by “functional”, so this statement seems self-contradicting.

    No. What I meant in that post by “evolutionary significance” is definitely not what they meant by “functional”, per what they have written in their own papers.

    What I meant by “evolutionary significance” in that post is theoretical significance for evolutionary theory in general with nothing at all to do with any actual functions of any specific sequences or groups of sequences within the biology of any particular organism.

  21. gillt says

    @ roberschenck #21 & David#23:

    I interpreted “biochemical function” to be in line with ENCODE’s goal of annotating all biochemical function, which has always been described as a starting point, a database, so the definition appears expansive rather than unfalsifiable, laying the groundwork for *future* discovery of important associations among candidate regulatory elements, future insights into gene transcription, and general genomic variation. The genome is full of transcriptional noise and newly identified elements no doubt, most of which are not validated one way or the other. The organization at the biochemical level of our genomes is the spirit I think in which ENCODE operates. Granted, genomics is not my area of expertise.

  22. sonderval says

    Perhaps the final paragraph was meant in a different way (and just badly phrased):
    “Please do not use the amount of non-coding DNA as an argument against creationism, because if I am right, it might be a bad argument and this would give credence to creationism.”
    In that case, Mattick would still be a supporter of evolutionary theory.

  23. David Marjanović says

    The organization at the biochemical level of our genomes is the spirit I think in which ENCODE operates.

    Their papers conflate all these kinds of “functions”. Read Graur’s paper already.

  24. gillt says

    Read Graur’s paper already.

    I’ll say the same thing I did at Gregory’s blog: Graur’s piece was cynical, snarky (in the way only a tenured professor can be) and mostly his opinion on the proper place of big science. Doolittle’s paper was far better.

    the publicity surrounding ENCODE reveals the extent to which these understandings [of biological function] have been eroded. However, theoretical expansion in other directions, reconceptualizing junk, might be advisable.

    That’s the beginning of an objective criticism, unlike this grandstanding from Graur:

    The entire edifice of ENCODE was based on the assumption that whatever happens in cancer cell lines (some very very old) is relevant to human cells. Well, guess what? They aren’t and with the exception of the hype, nothing will be left of ENCODE.

    Bloviating and false. Consult table 2 for a list of cell lines used, or click on the 2nd link
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3079585/
    http://encodeproject.org/ENCODE/cellTypes.html

    and this:

    Pay particular attention to their figure 3 (the karyotype of the cell line). Do you think we can learn anything from this cell on humans. Do humans, in your opinion, have 64 chromosomes (including 5 copies each of chromosome 1 and 5)?

    http://www.g3journal.org/content/early/2013/03/14/g3.113.005777.full.pdf+html

    An incredulity easily dismissed as only Dan Graur not knowing the benefits and limitations of one of the most microarrayed cell lines on the planet.