Let’s slap ENCODE around some more


Since we still have someone arguing poorly for the virtues of the ENCODE project, I thought it might be worthwhile to go straight to the source and cite an ENCODE project paper, Defining functional DNA elements in the human genome. It’s a bizarre paper: it actually lays out the case for rejecting the idea of high degrees of functionality, which is a good approach, since it demonstrates that the authors have at least seen the arguments against them. But then it sails blithely past those objections to basically declare that we should just ignore the evolutionary evidence.

Here’s the paragraph where they discuss the idea that most of the genome is non-functional.

Case for Abundant Junk DNA. The possibility that much of a complex genome could be nonfunctional was raised decades ago. The C-value paradox refers to the observation that genome size does not correlate with perceived organismal complexity and that even closely related species can have vastly different genome sizes. The estimated mutation rate in protein-coding genes suggested that only up to ∼20% of the nucleotides in the human genome can be selectively maintained, as the mutational burden would be otherwise too large. The term “junk DNA” was coined to refer to the majority of the rest of the genome, which represent segments of neutrally evolving DNA. More recent work in population genetics has further developed this idea by emphasizing how the low effective population size of large-bodied eukaryotes leads to less efficient natural selection, permitting proliferation of transposable elements and other neutrally evolving DNA. If repetitive DNA elements could be equated with nonfunctional DNA, then one would surmise that the human genome contains vast nonfunctional regions because nearly 50% of nucleotides in the human genome are readily recognizable as repeat elements, often of high degeneracy. Moreover, comparative genomics studies have found that only 5% of mammalian genomes are under strong evolutionary constraint across multiple species (e.g., human, mouse, and dog).

Yes, that’s part of it: it is theoretically extremely difficult to justify high levels of function in the genome — the genetic load would simply be too high. We also see that much of the genome is not conserved, suggesting that it isn’t maintained by selection. Not mentioned, though, are other observations, such as the extreme variability in genome size between closely related species, which does not seem to be correlated with complexity or function at all, or the fact that much “junk” DNA can be deleted without any apparent phenotypic effect. It’s very clear to anyone with an appreciation of evolutionary constraints that the genome is largely non-functional, on both theoretical and empirical grounds.
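
To make the load argument concrete, here’s a minimal back-of-the-envelope sketch in Python. The numbers are illustrative assumptions, not figures from the paper: a rough per-site mutation rate and an assumed share of harmful hits.

```python
# Genetic-load arithmetic with illustrative round numbers: the more of the
# genome that is functional, the more new deleterious mutations each
# offspring carries, and the harder selection must work to purge them.
GENOME_BP = 3.2e9          # haploid human genome size
MU = 1.2e-8                # mutations per site per generation (rough estimate)
DELETERIOUS_SHARE = 0.4    # assumed fraction of hits to functional DNA that hurt

new_mutations = GENOME_BP * MU   # ~38 new mutations per haploid genome
for functional_fraction in (0.05, 0.20, 0.80):
    harmful = new_mutations * functional_fraction * DELETERIOUS_SHARE
    print(f"functional = {functional_fraction:.0%} -> "
          f"~{harmful:.1f} new deleterious mutations per generation")
```

At ~5% functionality the load is manageable; at 80% every offspring carries a dozen or so new harmful mutations, far more than selection can plausibly purge.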

Their next paragraph summarizes their argument for nearly universal function. It’s strange because it is so orthogonal to the previous paragraph: I’d expect at least some token effort to be made to address the constraints imposed by the evolutionary perspective, but no…the authors make no effort at all to reconcile what evolutionary biologists have said with what they claim to have discovered.

That’s just weird.

Here’s their argument: most of the genome gets biochemically modified to some degree, some of the time.

Case for Abundant Functional Genomic Elements. Genome-wide biochemical studies, including recent reports from ENCODE, have revealed pervasive activity over an unexpectedly large fraction of the genome, including noncoding and nonconserved regions and repeat elements. Such results greatly increase upper bound estimates of candidate functional sequences. Many human genomic regions previously assumed to be nonfunctional have recently been found to be teeming with biochemical activity, including portions of repeat elements, which can be bound by transcription factors and transcribed, and are thought to sometimes be exapted into novel regulatory regions. Outside the 1.5% of the genome covered by protein-coding sequence, 11% of the genome is associated with motifs in transcription factor-bound regions or high-resolution DNase footprints in one or more cell types, indicative of direct contact by regulatory proteins. Transcription factor occupancy and nucleosome-resolution DNase hypersensitivity maps overlap greatly and each cover approximately 15% of the genome. In aggregate, histone modifications associated with promoters or enhancers mark ∼20% of the genome, whereas a third of the genome is marked by modifications associated with transcriptional elongation. Over half of the genome has at least one repressive histone mark. In agreement with prior findings of pervasive transcription, ENCODE maps of polyadenylated and total RNA cover in total more than 75% of the genome. These already large fractions may be underestimates, as only a subset of cell states have been assayed. However, for multiple reasons discussed below, it remains unclear what proportion of these biochemically annotated regions serve specific functions.

That’s fine. Chunks of DNA get transcriptionally shut down by enzymatic modification; we’ve known that for a long time, but it’s generally regarded as evidence that that bit of DNA does not have a useful function. But to ENCODE, silencing counts as a function. Footprint studies find that lots of bits of DNA get weakly or transiently bound by transcription factors; no surprise, it’s what you’d expect of the stochastic processes of biochemistry. Basically, they’re describing as functional behavior that is more reasonably described as noise in the system, and declaring that it trumps all the evolutionary, genetic, developmental, and phylogenetic observations of the genome.
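
For a sense of the scale of that noise, here’s a minimal sketch of how often a single transcription factor’s recognition site shows up by chance, assuming one exact 8-bp motif and uniform base composition (real motifs are degenerate, which only inflates the count):

```python
# Expected chance occurrences of one exact 8-bp motif in the human genome,
# assuming uniform base composition.
GENOME_BP = 3.2e9
MOTIF_LEN = 8

per_strand = GENOME_BP / 4 ** MOTIF_LEN   # one exact match every 4^8 = 65,536 bp
both_strands = 2 * per_strand             # the motif can occur on either strand
print(f"~{both_strands:,.0f} chance matches genome-wide")
```

That’s on the order of a hundred thousand spurious sites for a single factor, which is exactly the kind of “pervasive activity” you’d expect from chemistry alone.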

No, I’m being too charitable. They aren’t even trying to explain how that counters all the other evidence — they’re just plopping out their observations and hoping we don’t notice that they are failing to account for everything else.

I rather like Dan Graur’s dismissal of their logic.

Actually, ENCODE should have included “DNA replication” in its list of “functions,” and turn the human genome into a perfect 100% functional machine. Then, any functional element would have had a 100% chance of being in the ENCODE list.

Comments

  1. says

    Yeah! But whatever you say I still think that ‘junk DNA’ is the text of the Encyclopaedia Galactica, encoded there by aliens.
    Checkmate Science (so called) Hah!!

  2. Pierce R. Butler says

    … nearly 50% of nucleotides in the human genome are readily recognizable as repeat elements, often of high degeneracy.

    Sounds like they took too many samples from my family, friends, ‘n’ neighbors.

  3. david says

    Say some deleterious sequence gets inserted, possibly by duplication and then a few mutations. Either the organism’s descendants somehow escape from the negative effects of the new gene, or they die out. They can “escape” by deleting the new DNA or by silencing it. The cellular mechanisms for silencing DNA are well-established. The cost of carrying “junk DNA” must be low, as can be seen by the large number of non-functional ERV sequences we carry without ill effects. So, the presence of DNA which is both silenced and not conserved is actually an argument against ENCODE.

  4. Nerd of Redhead, Dances OM Trolls says

    One must never forget that the test they use to describe functionality is actually a proxy test. It doesn’t measure whether genes are actually activated and functional, but rather their potential to be functionally active.

    But like any subroutine in a large piece of bloated software, if that subroutine is never called, it is junk code, just as any gene never expressed in vivo is junk DNA (a toy version of the analogy is sketched below).

    ENCODE uses an idiosyncratic definition of function, which is out of touch with the rest of science. They should be embarrassed by such chicanery.
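
    To make the analogy concrete, here’s a toy sketch (hypothetical code, not anyone’s real software): a subroutine that is defined and faithfully copied along with the program, but never called, is dead code.

    ```python
    def expressed_gene():
        return "transcribed and translated"

    def silent_gene():                # defined and replicated with every copy
        return "never expressed"      # of the program, but never invoked

    if __name__ == "__main__":
        print(expressed_gene())       # silent_gene() is carried along as dead code
    ```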

  5. AlexanderZ says

    So…
    Does ENCODE have any value at all, or is it completely useless? Can their work be “salvaged” for future research, or is it money down the drain?

  6. gillt says

    Basically, they’re describing as functional behavior that is more reasonably described as noise in the system, and declaring that it trumps all the evolutionary, genetic, developmental, and phylogenetic observations of the genome.

    Contradicts your previous sentiments regarding the too easy dismissal of variation as “background noise” in developmental biology.

    Chunks of DNA get transcriptionally shut down by enzymatic modification; we’ve known that for a long time

    It’s trivializing to treat gross observations gathered a decade ago on a few genes as equivalent to recent fine-grained genome-wide analyses across cell types, dismissing the latter as something ‘we’ve known a long time.’ It’s like confusing genetics with genomics.

    The paper argues for a working description of function that is complex and context dependent, and for degrees (see figure 1) of interesting biochemical signal that are worth comprehensively cataloging (ENCODE is a mapping project!) in a giant, publicly accessible database for future work, one that will continually be added to and refined as more experiments are done. Why this substantial bit of good news is always ignored by the cranky critics of ENCODE is beyond me.

  7. yubal says

    much “junk” DNA can be deleted without any apparent phenotypic effect.

    I’d be interested in literature on that question.

    There are so many “mice without phenotype” observed, I mean, after deleting something conserved with established function…How do they actually show that deleted “junk” DNA doesn’t do anything, when so many people who delete functional genes expecting a phenotype don’t see one?

  8. yoav says

    @ AlexanderZ #5
    ENCODE has a lot of value, as gillt has already pointed out. There is good reason to criticize some overreaching statements made in the project’s summary paper, but the large number of publicly available, comprehensive datasets is definitely a good thing.

  9. sugarfrosted says

    @OP(PZM).

    Not mentioned, though, are other observations, such as the extreme variability in genome size between closely related species that does not seem to be correlated with complexity or function at all, or that much “junk” DNA can be deleted without any apparent phenotypic effect.

    Has any work been done on the viability of offspring with deleted “junk” DNA? I seem to recall the “function” of “junk”-DNA is that it absorbs mutations, in that it lowers the probability that a harmful mutation will take place. So would the removal of “junk” DNA affect this? (I’m trying to say something I don’t really have any expertise whatsoever about. I had biology in HS and some category filler at my community college and never touched it again, other than popsci type stuff like yours.)

  10. David Marjanović says

    I don’t understand why the ENCODE people find it so hard to imagine that the transcription machinery doesn’t bind to promoters 100 % of the time and to junk 0 % of the time.

    I don’t understand how that manuscript got through peer review.

    I seem to recall the “function” of “junk”-DNA is that it absorbs mutations, in that it lowers the probability that a harmful mutation will take place.

    It does no such thing. Mutations don’t happen at a fixed number per genome; they happen at a rate per nucleotide. If you have more nucleotides in your genome, you’ll have more mutations; a larger proportion of those will be harmless because they happen in junk, but the number of mutations in useful stuff will not change.
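
    A quick sketch with assumed round numbers makes the point: hold the functional content fixed, and adding junk raises the total mutation count but not the count in functional DNA.

    ```python
    MU = 1.2e-8            # mutations per site per generation (rough estimate)
    FUNCTIONAL_BP = 1.6e8  # functional DNA held constant (~5% of 3.2 Gb)

    for junk_bp in (0.0, 3.04e9):          # no junk vs. ~3 Gb of junk
        genome_bp = FUNCTIONAL_BP + junk_bp
        total = MU * genome_bp             # expected new mutations, whole genome
        in_functional = MU * FUNCTIONAL_BP # expected hits to functional DNA
        print(f"{genome_bp / 1e9:.2f} Gb genome: ~{total:.1f} mutations total, "
              f"~{in_functional:.1f} in functional DNA")
    ```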

  11. gillt says

    If you have more nucleotides in your genome, you’ll have more mutations; a larger proportion of those will be harmless because they happen in junk, but the number of mutations in useful stuff will not change.

    More nucleotides do not simply equal more mutations, because mutation rates differ among classes of nucleotides (between noncoding and coding DNA, and between coding and conserved DNA). You can even predict a protein-coding stretch of DNA based on the frequency of indels and synonymous mutations. Also, we’ve long known that the shedding of DNA from a genome (the deletion rate) varies across lineages, which is a popular hypothesis for explaining the giant genomes seen in some salamanders.

  12. David Marjanović says

    mutation rates differ among classes of nucleotides (between noncoding and coding DNA, and between coding and conserved DNA)

    No, that’s mutation rates after selection, after all lethal mutations have removed their bearers from the gene pool. “Conserved” doesn’t mean fewer mutations happen, it means they’re less likely to survive across generations.

    Also, we’ve long known that the shedding of DNA from a genome (the deletion rate) varies across lineages

    Well, yeah: the selection for small cells isn’t equally strong in all lineages. Cell size correlates surprisingly strongly with genome size.

    some salamanders

    All of them. :-)

  13. gillt says

    Your original post seemed to be referring to accumulated mutations per generation.

    I don’t think it’s been settled that selection for small cells is the driving force for genome size. Do you have a citation?

    Not all salamanders have giant genomes, and in plethodontids you see great variation in genome size (15-80 Gb); that’s why I used them as an example.