You are a mutant, and your genome is full of junk. What’s the problem?


These kinds of calculations are always handy. Larry Moran estimates the number of novel mutations you carry: the textbooks say about 300, he calculates something over 120. So next time a creationist tells you all mutations are deleterious, just tell him he’s a mutant himself with somewhere around a few hundred random nucleotide changes from either of his parents. What Larry doesn’t mention in this estimate, but I know he’s familiar with the idea, is that most of those mutations will be neutral: about 95% will fall into junk DNA, many won’t affect the amino acid sequence of any proteins, others may cause slight changes in the protein sequence that don’t detectably affect the phenotype.

In the category of utterly baffling pronouncements from scientists, Larry also chastises John Greally for misrepresenting junk DNA in an interview with Ira Flatow. I could scarcely believe it myself, but I listened to the interview, and Greally actually seems to be conflating regulatory sequences with junk, and Flatow introduces the story as suggesting that junk DNA may all have a function. He also claims that if you have a mutation in a gene, the “gene is dead” and will have no function. None of this is correct. It’s bizarre—I think Larry and I are fairly familiar with the genetics literature, and there’s nothing to support these contentions and quite a bit to contradict them.

Comments

  1. says

    To paraphrase my former lecturer and popular-science author, Dr Leroi, “We are all mutants, but some of us are more mutant than others”.

    Creationists are definitely more mutant than others :p

  2. Dianne says

    I love the headline.

    Concerning junk DNA and function, correct me if I’m wrong, but I seem to think that there is at least one instance of (probable) junk DNA having a function. IIRC, the hemoglobin-beta gene doesn’t work correctly if you leave out the “junk” sitting between the promoter and the gene, but that the actual sequence doesn’t matter. It basically has to have some nucleotides there to take up space and allow the gene to configure correctly in three-dimensional space, but the exact nucleotides don’t matter. For whatever that means to anyone’s belief system.

  3. Darby says

    It seems like a lot of scientific arguments – you get two sides with somewhat absolute positions, and when the dust all settles, the truth turns out to be somewhere in the middle.

    It always strikes me as arrogant to assume that a majority of the DNA that is diligently copied and passed on is useless, when small savings in synthetic energy (such as stopping the making of coenzymes when they are regularly available in the diet) seem to be commonplace in evolutionary pathways.

    When we were kids, tonsils were routinely removed because their function wasn’t understood, and if we can’t figure out what they’re doing (this very day), they can’t be doing anything… This is a variation of the rationale that leads folks to ID.

  4. Chase says

    At least in the current version, Larry does mention that many of the mutations are neutral.

  5. micheyd says

    That’s a nice example, Dianne. I don’t like grouping such diverse stuff into “junk DNA” much, but I guess it’s a convenient catch-all for now, with less meaning over time as functions for noncoding areas are discovered (as is happening now). But really, does it have to be so derogatory? :) This is my precious genome we’re talking about!

  6. says

    It would be more precise to refer to it as “non-coding” DNA. If you don’t start doing so, Casey Luskin will prove to the world the evilutionists were wrong all along about genetics and Forrest Mims was correct in his successful prediction of the past from the future.

  7. says

    most of those mutations will be neutral: about 95% will fall into junk DNA, many won’t affect the amino acid sequence of any proteins, others may cause slight changes in the protein sequence that don’t detectably affect the phenotype.

    Or they could alter the pattern of gene expression…

    At university I was in the department of genetics, biochemistry and molecular biology. Us geneticists always wanted to call the student society MutantSoc or The Wild Types, but we were always voted down. A pity, we ended up with the BAGAMs (Biochemistry And Genetics And Molecular biology). Lame.

  8. Former PZ Student says

    From Larry Moran’s site :

    “The mutation rate due to errors made by the DNA polymerase III replisome is one error for every one hundred million bases (nucleotides) that are incorporated into DNA. This is an error rate of 1/100,000,000, commonly written as 10^-8 in exponential notation. Technically, these aren’t mutations; they count as DNA damage until the problem with mismatched bases in the double-stranded DNA has been resolved. The DNA repair mechanism fixes 99% of this damage but 1% escapes repair and becomes a mutation. The error rate of repair is 10^-2 so the overall error rate during DNA replication is 10^-10 nucleotides per replication (10^-8 × 10^-2) (Tago et al., 2005).”

    Are there locations on the DNA strand of E. coli, which are more prone to mutation? I mean to say if a lab were to grow 100 cultures of E. coli, would the second generation have identical mutations amongst different populations?
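
    The arithmetic in the quoted passage can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the genome size and replication count in the usage line are placeholder values, not figures from Larry Moran's post. Exact fractions are used so the powers of ten multiply without floating-point noise.

```python
from fractions import Fraction

# Figures taken from the quoted passage.
POLYMERASE_ERROR = Fraction(1, 10**8)   # one misincorporation per 10^8 bases
REPAIR_ESCAPE = Fraction(1, 10**2)      # 1% of the damage escapes repair


def overall_error_rate(polymerase_error, repair_escape):
    """Mutations per nucleotide per replication (hypothetical helper)."""
    return polymerase_error * repair_escape


def expected_mutations(rate, genome_size_bp, replications):
    """Expected new mutations, assuming every site and every replication
    behaves independently at the same rate (a simplifying assumption)."""
    return rate * genome_size_bp * replications


rate = overall_error_rate(POLYMERASE_ERROR, REPAIR_ESCAPE)
print(rate)  # 1/10000000000

# Placeholder inputs only: a 10^10 bp genome copied once.
print(expected_mutations(rate, 10**10, 1))  # 1
```

    Plugging in a real genome size and a real number of cell divisions per generation is what turns the per-base rate into a per-person estimate; those inputs have to come from the literature, not from this sketch.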

  9. Pattanowski says

    Just think of all the misunderstanding that could have been avoided if the term “junk DNA” had been avoided. I don’t know of any molecule that I would refer to as “junk”.

  10. ken says

    Darby…the experiments have been done. You delete massive portions of junk DNA, and nothing happens. Some small % of it, undoubtedly, will turn out to be functional, but IDers would have you believe it’s all functional, which is provably wrong.

    A large % of the junk is indeed “functional”…the hitch is that it’s not functional from the point of view of the organism that carries it. A good chunk of it is “selfish” transposons and retroviruses. Parasitic DNA.

    Bear in mind that, to simplify, DNA is a 2D string in a 3D cell. The average cell doesn’t divide particularly frequently, so a few billion base pairs of junk isn’t necessarily a huge metabolic hassle.

  11. raven says

    netpets.org
    Genetic Load by John Armstrong

    One of the most fundamental misconceptions is that most of the individuals in a normal population do not carry genes for genetic diseases. DELETED FOR LENGTH

    The truth is that it is virtually impossible to avoid genetic disease. Geneticists believe that most species carry a “genetic load” of 3-5 recessive lethal genes. The difference between purebred dogs and a human is that the latter have something in excess of 2500 genetic diseases, but most of them are extremely rare and thus seldom come from both parents to produce an affected child, whereas many dog breeds have a relatively small number of very common genetic diseases. It is the frequency of these problems, rather than the number of different ones, that is the true indicator of genetic health in a population.

    We are all indeed mutants. The average human carries between 3-5 recessive genetic lethals. That is why the common advice to, “go f*** yourself” is a bad idea. It is also one reason why inbred populations such as creos have problems.

  12. NatureSelectedMe says

    I can’t believe anyone is calling any part of DNA junk. Can’t we just say it doesn’t seem to code for genes? Wouldn’t that be better? I think ‘Junk’ implies that we’ve finished studying DNA. We know everything there is to know because we can define ‘genes’. Is life only genes? Isn’t that just the current theory? I can’t help but think that there is much more to know. ‘Junk’ DNA is a stupid name that I think PZ is promoting just to generate comments.

  13. Darby says

    Ken –

    You can remove a lot of structures from an organism and not get detectable effects (especially when you don’t know for sure what the effects might be). I’m not sure that with a wide cohort study we’d find a significant difference between adults with their tonsils and those without. That fails to provide a lot of evidence that those structures have no use; I’d lean toward it providing evidence that we haven’t asked the right questions about it.

    Can you tell that I’m not a fan of even the concept of vestigial structures-?

    I tend toward believing that conservation of features is itself a strong suggestion of use. And for something like DNA, I sometimes get the feeling that the researchers are flying on much fuzzier assumptions than they think. Heck, anyone who even shallowly follows the field should realize that this year’s interpretation won’t last very long.

  14. Tiskel says

    Um. Re #17 / vestigial structures: How would we (as part of a large population) get rid of these structures any way other than having them slowly recede and eventually (very slowly, due to our 20 – 30 year generations), either reduce to nothing or have the remaining structure co-opted for some other purpose?

    I can understand the desire to apply some function to all of our structures, but since all organisms are just a single instance of a very long time-line, and since we are also all mutating independently as well as accumulating the non-(pre reproduction)fatal mutations of our ancestors, of course we will have vestigial structures.

  15. says

    As far as neutrality of new mutations, you are perhaps not correct, as far as the coding fraction goes. One of my labmates published a paper on the subject, here:
    http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=17357078&ordinalpos=2&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum
    which says that something like 70% of novel missense mutations are probably at least mildly deleterious. So, most amino-acid altering mutations should have some functional effect and should not be neutral. This is probably why proteins don’t really evolve very much. Also, I can’t be arsed to do the math, but given the degeneracy of the genetic code I would guess most mutations falling in coding regions are amino-acid changing.
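
    The guess at the end of this comment can be checked with a short count. The sketch below enumerates every single-nucleotide substitution in the 61 sense codons of the standard genetic code and sorts each one by its effect. It assumes all substitutions are equally likely and all codons equally used, which real genomes do not do, so treat the result as a rough baseline rather than a measured rate. The function name is made up for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutations():
    """Count single-base changes in sense codons by their effect."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only mutations that start from a sense codon
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


counts = classify_point_mutations()
total = sum(counts.values())
changing = counts["missense"] + counts["nonsense"]
print(counts)                      # {'synonymous': 134, 'missense': 392, 'nonsense': 23}
print(round(changing / total, 3))  # 0.756
```

    Under those assumptions roughly three quarters of coding-region point mutations change the protein, which is in line with the guess above.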

  16. says

    That junk DNA has a name: “introns”, as opposed to “exons”.

    And since when do scientists HAVE TO believe it’s just junk to distinguish themselves from ID followers? Just because we haven’t yet figured out all functions of the genes doesn’t mean we have to condemn them, do we?

  17. ken says

    Can you tell that I’m not a fan of even the concept of vestigial structures-?

    If you’re a creationist, that’s par for the course.

    As I said before, a large % of that stuff is transposons and retroviruses. It’s useful to them, not to you. It repeats and repeats, and the amount of this “junk” varies hugely from one organism to the next. If you’ve got a hypothesis that’s more specific than “God doesn’t make waste”, let’s hear it.

  18. ken says

    Come to think of it, certain portions of your DNA are not merely neutral junk…they’re harmful to YOU ! Examples include transposons, homing endonucleases, certain imprinted genes, B chromosomes, autosomal killers, and more. Apparently God is not just wasteful…he’s malevolent.

  19. tony says

    Should never read two posts so quickly…

    I just read hemoglobin as hobgoblin, in a post above…. and figured we finally had a rational explanation for creos!

    Just been reading Don’t debate Creationists … so ‘lower orders of human’ have been on my mind!

  20. ken says

    Oh, I forgot to mention chromosome diminution. Some critters actually “shed” huge amounts (95%) of DNA in somatic cells. It’s retained only in the germline. Lesson: it has no function outside the germline. It’s probably useless in the germline as well…a lot of this DNA is just 5-10 base repeats.

  21. David Marjanović says

    I can’t believe anyone is calling any part of DNA junk.

    A large part of it is known to be junk, that’s why. Firstly, we see what it is: badly excised transposons, pseudogenes derived from duplications of our genes, pseudogenes derived from retrovirus genomes and transposons, huge numbers of nonsensical repeats of short sequences. Secondly, the experiments have been done: change its amount, and nothing happens, except that the time necessary to replicate the genome changes: the more you have, the longer it takes, obviously.

    Why do we still have it? Because deletions are not so easy as you seem to believe, and because we are not under much selective pressure to speed up our cell divisions. Birds have smaller cells and less junk. Most bacteria have almost no junk at all — that’s how you get cells that can divide every 20 minutes.

    And no, introns are a rather small part of junk DNA, not all of it.

    Two charts of the human nuclear genome from the “Molecular Pathology” lecture I recently did the exam on:

    ——-

    Highly conserved (coding): 1.5 %
    Highly conserved (other): ~ 3 %
    Transposon-based repeats: ~ 45 %
    Heterochromatin (“mostly consists of assemblages of short repeated sequences”): ~ 6.6 %
    Other non-conserved: ~ 44 %

    ——-

    Repeats: 61 %
    – LINEs (long interspersed nuclear elements — code for reverse transcriptase): 21 % of the total
    – SINEs (short … — Alu repeats): 13 %
    – Retroviral-like elements: 8 %
    – DNA-only transposon “fossils”: 3 %
    (note that all the above together make up 45 %, the number the other chart gives for “transposon-based repeats”)
    – Segmental duplications: 3 %
    – Simple sequence repeats: 5 %
    – Heterochromatin (including centromeres): 8 % (I don’t know why the other chart says 6.6)
    Unique: 39 %
    – Not explained in the chart, although sequenced (presumably regulatory parts and pseudogenes together): 14 %
    – Genes: 26 %
    — Introns: 24.5 %
    — Exons: 1.5 %

    Junk, junk, junk, junk, junk. Eat that.
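
    For anyone who wants to check how the second chart adds up, here is a minimal Python sketch. The numbers are copied from the chart; the variable names are made up for this example, and nothing here goes beyond the chart itself.

```python
# Percentages of the human nuclear genome, copied from the second chart.
transposon_repeats = {
    "LINEs": 21,
    "SINEs": 13,
    "Retroviral-like elements": 8,
    "DNA-only transposon fossils": 3,
}
other_repeats = {
    "Segmental duplications": 3,
    "Simple sequence repeats": 5,
    "Heterochromatin": 8,
}
genes = {"Introns": 24.5, "Exons": 1.5}
unexplained_unique = 14

# Totals as stated in the chart.
STATED_REPEATS = 61
STATED_UNIQUE = 39

transposon_total = sum(transposon_repeats.values())
repeats_total = transposon_total + sum(other_repeats.values())
genes_total = sum(genes.values())
unique_total = unexplained_unique + genes_total

print(transposon_total)                # 45
print(repeats_total)                   # 61
print(genes_total)                     # 26.0
print(unique_total)                    # 40.0
print(STATED_REPEATS + STATED_UNIQUE)  # 100
```

    The transposon-derived classes sum to the 45 % given in the first chart, and all repeat classes together give the stated 61 %. The unique items sum to 40 % rather than the stated 39 %, presumably a rounding artifact in the chart.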

  23. David Marjanović says

    Oh, I forgot to mention chromosome diminution. Some critters actually “shed” huge amounts (95%) of DNA in somatic cells. It’s retained only in the germline.

    Oh yeah, the ciliate Tetrahymena is the textbook example for this. Its macronucleus consists of lots of copies of a very small part of the DNA of the micronucleus: the macronucleus (genes in lots of copies) is what is actually used, and the micronucleus (the complete genome) has no function outside of sexual reproduction.

  25. mikmik says

    Look, with all this gobble-de-gook in our DNA, I am amazed we make it out of bed in the morning. What you lookin’ at!? Yeah, I’m havin a bad day – why don’t you try to operate properly by searching through all this junk for the right instructions all the time, see how you make out!

    Okay, okay, back to bed.

  26. NatureSelectedMe says

    Junk, junk, junk, junk, junk. Eat that.

    No thank you, I’m on a diet. :)

    So you have figured it all out. My bad. I guess there’s nothing more to do.

    Except look at those percentages. What are you adding up in the second set of percentages? What do you mean “repeats 61%”?

    Also, remember you’re looking at this through the bias of your current theory. (I dare not call it anything otherwise I will be accused of supporting that other theory that PZ is so fond of writing about)

    I know that’s what scientists do but it makes you look so stubborn. Wait, I think I figured out something here. Academia have to teach so they are very strict to the theory. They have to grade. Non-teaching scientists would be more flexible. Maybe.

  27. ken says

    I didn’t miss the Nature paper. What in it do you find so compelling as an argument against “junk”? There’s nothing new about “leaky” transcription, by the way.

    I did miss the “Wade Schauer” stuff, though. This cat claims “Encode” drove the last nail in the “junk DNA coffin”, and that conserved sequences merely prove that the Designer is re-using the same motifs. I suppose if we found that sequences were less conserved, that would be taken as evidence of the Designer’s infinite creativity.

    I think a lot of scientists were initially surprised at the volume of junk DNA in many genomes. That’s not because of their religious views…it simply seemed metabolically wasteful. This notion that biologists should be astonished by any new finding that shows that a stretch of junk isn’t so junky…it’s just another pathetic ID strawman.

  28. David Marjanović says

    Look, with all this gobble-de-gook in our DNA, I am amazed we make it out of bed in the morning. What you lookin’ at!? Yeah, I’m havin a bad day – why don’t you try to operate properly by searching through all this junk for the right instructions all the time, see how you make out!

    That’s not how transcription works. RNA polymerase doesn’t attach to a telomere and slide forward till it somehow recognizes the gene it’s supposed to transcribe. That would almost literally take forever. Instead, transcription factors (which are proteins) stick (literally — it’s electrostatics) to a specific sequence, and they are what RNA polymerase binds to. This is high school biology.

    What are you adding up in the second set of percentages? What do you mean “repeats 61%”?

    61 % of the genome are repeats. To be precise, 21 % (of the total 100 %, not of the 61 % that are repeats) are LINEs, 13 % are SINEs, and so on; all those repetitive sequences add up to 61 % of the human genome.

    Also, remember you’re looking at this through the bias of your current theory. (I dare not call it anything otherwise I will be accused of supporting that other theory that PZ is so fond of writing about)

    What do you mean? Evolution? No, I’m not. Go ahead, tell me what a LINE is good for. Tell me what good it is to carry defunct retroviruses around. I wish you lots of fun.

    (Oh, and please learn the meaning of the technical term “theory”.)

    Academia have to teach so they are very strict to the theory. They have to grade. Non-teaching scientists would be more flexible. Maybe.

    I’m a graduate student. I don’t teach.

    ————–

    Thank you very much for that article on Drosophila heterochromatin (http://www.sciencedaily.com/releases/2007/06/070615091210.htm)! I quote:

    Repeating sequences are the hallmark of heterochromatin, and there are several distinct kinds. Simple, short repeats are called satellite DNAs, which tend to become more abundant near the centromeres, adding up to hundreds of thousands or even millions of bases in length. In these “seas” of satellite DNA there are “islands” of moderate-length repeats totaling only tens or hundreds of kilobases, made up of transposons or fragments of transposons.

    In other regions of heterochromatin, the transposons constitute the sea. Here the islands are single-copy genes, or lengths of DNA that code for RNAs other than the messenger RNA needed to make proteins, and other functional elements.

    Satellite DNA is junk*, and transposons are junk, too (for us, not for themselves).

    * Well. Telomeres and centromeres are satellite DNA, too, but have a function. Still, that function consists in just being there. The exact length and the exact sequence that is repeated again and again matter little; telomeres and centromeres are not transcribed.

    I agree: there are genes in heterochromatin (silenced like all heterochromatin, but still genes), so the numbers of 26 % for genes and 1.5 % for exons in the human genome are likely too small, and I was wrong to count the entire heterochromatin as repeats. The numbers are, however, likely not far off, because most of heterochromatin (in Drosophila) does consist of junk, as the article says in no uncertain terms.

    Likewise, the 14 % of unique ( = not repeated) DNA that is not explained in the chart must contain not only promoters, enhancers, silencers etc., and signals for transcription start and end, but also genes for regulatory RNA; sorry for forgetting that. Genes for regulatory RNA have been found in the heterochromatin of Drosophila, too, but, again, they don’t make up a lot of it, compared to all the junk there.

    —————–

    Thanks a lot for the link to the Nature paper! Having read most of it (but not understood all of it — there are just too many abbreviations, for example), I conclude that:
    – Lots of junk gets transcribed. The paper says at least 19 % of pseudogenes get transcribed. Being pseudogenes, they do not lead to a functioning protein; either the RNA is somehow destroyed after transcription, or translation begins (up to the first premature stop codon) and then the protein fragment is shredded in the proteasome. Not only are pseudogenes junk, they lead to waste. Stupid design…
    – If their 1 % sample is representative — and I have little reason to doubt that –, then 4.9 % of the human genome is acted upon by natural selection. This ought to include the exons, many of the regulatory sequences, and probably the RNA genes. The rest can vary almost freely, as expected from junk (even though not all of it is junk).
    – The paper mentions “ancient repeats” that have “inserted early” in the history of the “mammalian lineage”. These must be the 61 % of repeats mentioned by the second chart I cited. The many, many regulatory sequences and RNA genes the authors found must all be in the 14 % that the chart fails to explain.
    – It appears that transcription is a very messy affair, with huge untranslated regions flanking every gene in pre-mRNA. In their own words (from the conclusions):

    However, we also uncovered some surprises that challenge the current dogma on biological mechanisms. The generation of numerous intercalated transcripts spanning the majority of the genome has been repeatedly suggested […], but this phenomenon has been met with mixed opinions about the biological importance of these transcripts. Our analyses of numerous orthogonal data sets firmly establish the presence of these transcripts, and thus the simple view of the genome as having a defined set of isolated loci transcribed independently does not seem to be accurate. Perhaps the genome encodes a network of transcripts, many of which are linked to protein-coding transcripts and to the majority of which we cannot (yet) assign a biological role. Our perspective of transcription and genes may have to evolve and also poses some interesting mechanistic questions. For example, how are splicing signals coordinated and used when there are so many overlapping primary transcripts? Similarly, to what extent does this reflect neutral turnover of reproducible transcripts with no biological role?

    I gather that almost everything is transcribed, and then splicing happens, leaving very little of the original transcripts. Huge waste (remember that transcription requires ATP, while cutting RNA does not regenerate ATP), if I’m right. That would be stupid design again.
    – Nature seems to be undergoing a very welcome change to publishing much, much longer papers than it used to. :-)

    Then I read the News & Views about that article. It seems to say (without using numbers…) that most of the 14 % I keep mentioning consists of regulatory sequences and that little or nothing of it is junk. There is no mention of any function for the retrovirus-/transposon-derived junk and the satellite DNA.

    ——————-

    So, let me rephrase: more than 45 %, and almost certainly more than 53 %, of the human genome is junk, and these numbers don’t even include the introns (over 24.5 % of the human genome), which are mostly junk, too. :-) (I say “mostly” because at least some contain regulatory elements.)

  30. windy says

    I think ‘Junk’ implies that we’ve finished studying DNA.

    Just like “the garage is filled with junk” implies that we have done a complete inventory?

  31. windy says

    I know that’s what scientists do but it makes you look so stubborn. Wait, I think I figured out something here. Academia have to teach so they are very strict to the theory. They have to grade. Non-teaching scientists would be more flexible.

    Wrong. As David said, not all academia have to teach. And those who do, love to slip their own discoveries in their teaching and they read the latest articles. Teachers who don’t do research often have to stick to the textbooks.

  32. ken says

    I’m starting to like “junk” more and more, simply because it pisses off the IDers. If it were called “ncDNA”, the creationists would never have even noticed.

    There’s a bit of a biology/physics tension on scienceblogs.com. Us bio types should initiate a campaign against “charmed” quarks, which reek of animism and witchcraft, and thus direct the attention of the creos elsewhere.

  33. says

    This Wade fellow sees himself as some sort of junk DNA refuter or something, and, of course, it all points to creation/ID.

    It is all well and good that ENCODE found that so much of the genome encodes RNA. Apparently, many of these RNA transcripts are either themselves useless or are very tolerant of alternate sequences. The fact of the matter is that much noncoding DNA differs from person to person, not to mention species to species. To see how noncoding DNA differs among primate species, see this:

    http://www2.norwich.edu/spage/alignmentalb.htm