ENCODE gets a public reaming

I rarely laugh out loud when reading science papers, but sometimes one comes along that triggers the response automatically. Although, in this case, it wasn’t so much a belly laugh as an evil chortle, and an occasional grim snicker. Dan Graur and his colleagues have written a rebuttal to the claims of the ENCODE research consortium — the group that claimed to have identified function in 80% of the genome, but actually discovered that a formula of 80% hype gets you the attention of the world press. It was a sad event: a huge amount of work on analyzing the genome by hundreds of labs got sidetracked by a few clueless statements made up front in the primary paper, making it look like they were led by ignoramuses who had no conception of the biology behind their project.

Now Graur and friends haven’t just poked a hole in the balloon, they’ve set it on fire (the humanity!), pissed on the ashes, and dumped them in a cesspit. At times it feels a bit…excessive, you know, but still, they make some very strong arguments. And look, you can read the whole article, On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE, for free — it’s open source. So I’ll just mention a few of the highlights.

I’d originally criticized it because the ENCODE argument was patently ridiculous. Their claim to have assigned ‘function’ to 80% (and Ewan Birney even expected it to converge on 100%) of the genome boiled down to this:

The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type.

So if ever a transcription factor ever, in any cell, bound however briefly to a stretch of DNA, they declared it to be functional. That’s nonsense. The activity of the cell is biochemical: it’s stochastic. Individual proteins will adhere to any isolated stretch of DNA that might have a sequence that matches a binding pocket, but that doesn’t necessarily mean that the constellation of enhancers and promoters are present and that the whole weight of the transcriptional machinery will regularly operate there. This is a noisy system.

The Graur paper rips into the ENCODE interpretations on many other grounds, however. Here’s the abstract to give you a summary of the violations of logic and evidence that ENCODE made, and also to give you a taste of the snark level in the rest of the paper.

A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

You may be wondering about the curious title of the paper and its reference to immortal televisions. That comes from (1): that function has to be defined in a context, and that the only reasonable context for a gene sequence is to identify its contribution to evolutionary fitness.

Page 1 of 3 | Next page