Since we still have someone arguing poorly for the virtues of the ENCODE project, I thought it might be worthwhile to go straight to the source and cite an ENCODE project paper, Defining functional DNA elements in the human genome. It is a bizarre thing that actually makes the case for rejecting the idea of high degrees of functionality, which is a good approach, since it demonstrates that they’ve at least seen the arguments against them. But then it sails blithely past those objections to basically declare that we should just ignore the evolutionary evidence.
Here’s the paragraph where they discuss the idea that most of the genome is non-functional.
Case for Abundant Junk DNA. The possibility that much of a complex genome could be nonfunctional was raised decades ago. The C-value paradox refers to the observation that genome size does not correlate with perceived organismal complexity and that even closely related species can have vastly different genome sizes. The estimated mutation rate in protein-coding genes suggested that only up to ∼20% of the nucleotides in the human genome can be selectively maintained, as the mutational burden would be otherwise too large. The term “junk DNA” was coined to refer to the majority of the rest of the genome, which represent segments of neutrally evolving DNA. More recent work in population genetics has further developed this idea by emphasizing how the low effective population size of large-bodied eukaryotes leads to less efficient natural selection, permitting proliferation of transposable elements and other neutrally evolving DNA. If repetitive DNA elements could be equated with nonfunctional DNA, then one would surmise that the human genome contains vast nonfunctional regions because nearly 50% of nucleotides in the human genome are readily recognizable as repeat elements, often of high degeneracy. Moreover, comparative genomics studies have found that only 5% of mammalian genomes are under strong evolutionary constraint across multiple species (e.g., human, mouse, and dog).
Yes, that’s part of it: it is theoretically extremely difficult to justify high levels of function in the genome, because the genetic load would simply be too high. We also see that much of the genome is not conserved, suggesting that it isn’t maintained by selection. Mentioned only in passing is the extreme variability in genome size between closely related species, which does not seem to be correlated with complexity or function at all, and not mentioned at all is that much “junk” DNA can be deleted without any apparent phenotypic effect. It’s very clear to anyone with any appreciation of evolutionary constraints that the genome is largely non-functional, on both theoretical and empirical grounds.
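To put rough numbers on the genetic load argument, here is a back-of-the-envelope sketch. It assumes the standard simplification that new deleterious mutations arrive independently, so that mean fitness falls to about e^(-U) when each birth carries U new deleterious mutations on average. The two parameter values (`NEW_MUTATIONS_PER_BIRTH` and `DELETERIOUS_SHARE`) and the function names are illustrative assumptions, not figures from the ENCODE paper or from this post.

```python
import math

# Illustrative assumptions only, not values from the ENCODE paper or this post.
NEW_MUTATIONS_PER_BIRTH = 100.0  # assumed new mutations per newborn
DELETERIOUS_SHARE = 0.4          # assumed share of hits in functional DNA that are harmful


def deleterious_per_birth(functional_fraction,
                          new_mutations=NEW_MUTATIONS_PER_BIRTH,
                          deleterious_share=DELETERIOUS_SHARE):
    """Expected new deleterious mutations per birth (U).

    Only mutations that land in functional DNA can be harmful, so U scales
    linearly with the functional fraction of the genome.
    """
    if not 0.0 <= functional_fraction <= 1.0:
        raise ValueError("functional_fraction must be between 0 and 1")
    return functional_fraction * new_mutations * deleterious_share


def offspring_needed_per_couple(functional_fraction, **kwargs):
    """Offspring each couple must produce so that two survive on average.

    Uses the simplified load model in which mean fitness is exp(-U).
    """
    u = deleterious_per_birth(functional_fraction, **kwargs)
    return 2.0 * math.exp(u)


if __name__ == "__main__":
    for fraction in (0.05, 0.20, 0.80):
        u = deleterious_per_birth(fraction)
        needed = offspring_needed_per_couple(fraction)
        print(f"functional fraction {fraction:.2f}: U = {u:.1f}, "
              f"offspring needed per couple = {needed:.3g}")
```

The exact numbers depend entirely on the assumed parameters, but the shape of the result does not: the required number of offspring grows exponentially with the functional fraction, which is why claims of very high functionality run into the load problem.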
Their next paragraph summarizes their argument for nearly universal function. It’s strange because it is so orthogonal to the previous paragraph: I’d expect at least some token effort would be made to address the constraints imposed by the evolutionary perspective, but no…the authors make no effort at all to reconcile what evolutionary biologists have said with what they claim to have discovered.
That’s just weird.
Here’s their argument: most of the genome gets biochemically modified to some degree and for some of the time.
Case for Abundant Functional Genomic Elements. Genome-wide biochemical studies, including recent reports from ENCODE, have revealed pervasive activity over an unexpectedly large fraction of the genome, including noncoding and nonconserved regions and repeat elements. Such results greatly increase upper bound estimates of candidate functional sequences. Many human genomic regions previously assumed to be nonfunctional have recently been found to be teeming with biochemical activity, including portions of repeat elements, which can be bound by transcription factors and transcribed, and are thought to sometimes be exapted into novel regulatory regions. Outside the 1.5% of the genome covered by protein-coding sequence, 11% of the genome is associated with motifs in transcription factor-bound regions or high-resolution DNase footprints in one or more cell types, indicative of direct contact by regulatory proteins. Transcription factor occupancy and nucleosome-resolution DNase hypersensitivity maps overlap greatly and each cover approximately 15% of the genome. In aggregate, histone modifications associated with promoters or enhancers mark ∼20% of the genome, whereas a third of the genome is marked by modifications associated with transcriptional elongation. Over half of the genome has at least one repressive histone mark. In agreement with prior findings of pervasive transcription, ENCODE maps of polyadenylated and total RNA cover in total more than 75% of the genome. These already large fractions may be underestimates, as only a subset of cell states have been assayed. However, for multiple reasons discussed below, it remains unclear what proportion of these biochemically annotated regions serve specific functions.
That’s fine. Chunks of DNA get shut down to transcription by enzymatic modification; we’ve known that for a long time, but it’s generally regarded as evidence that that bit of DNA does not have a useful function. But to ENCODE, being silenced counts as a function. Footprint studies find that lots of bits of DNA get weakly or transiently bound by transcription factors; no surprise, it’s what you’d expect of the stochastic processes of biochemistry. Basically, they’re describing as functional behavior that is more reasonably described as noise in the system, and declaring that it trumps all the evolutionary and genetic and developmental and phylogenetic observations of the genome.
No, I’m being too charitable. They aren’t even trying to explain how that counters all the other evidence — they’re just plopping out their observations and hoping we don’t notice that they are failing to account for everything else.
I rather like Dan Graur’s dismissal of their logic.
Actually, ENCODE should have included “DNA replication” in its list of “functions,” and turn the human genome into a perfect 100% functional machine. Then, any functional element would have had a 100% of being in the ENCODE list.