I recently posted a story about how de novo genes are made, and made a video about it too. I guess I was more timely than I expected, because The Scientist just posted an article on the same topic. It’s specifically about the work of Li Zhao, who is interested in the birth of new genes with novel functions, and is building on some other work done at UC Davis.
But around the same time Zhao began her research, new evidence challenged this longstanding view [that new genes don’t appear very often] with an alternative path. Population geneticist David Begun at the University of California, Davis (UC Davis) identified several de novo genes—genes originating from scratch, or non-coding DNA—in Drosophila melanogaster, the common fruit fly. Of the five genes, four occurred on the X chromosome and predominantly expressed in the testes, possibly under sexual selection pressures.
One other thing I should mention: my previous article focused on de novo genes in humans, who are terrible experimental subjects. Li Zhao is working on Drosophila, and there’s a reason flies are a premier model system for this kind of work — you can get multiple generations fast, you can do all kinds of genetic manipulations on them, and you can compare different lineages to evaluate the effects of the presence or absence of a specific gene. Or hundreds of genes, as she is finding.
By characterizing the transcriptomes of six previously sequenced D. melanogaster strains in the testes, Zhao and her colleagues uncovered potential de novo candidates. Of these, they identified 142 polymorphic (which segregated and evolved under selection) and 106 fixed (which remained consistent since the split from a common ancestor) de novo genes. Most of these candidates were regulated by cis elements, with expression driven by regulatory sequences just upstream of the new transcripts. The vast majority contained open reading frames (ORFs)—sections that could potentially produce proteins, marked by start and stop codons—of at least 150 base pairs. When comparing these sequences to ancestral genomes and non-expressing Drosophila strains, the same ORFs appeared, suggesting that the gene expression was driven primarily by regulatory changes.
Zhao and her colleagues proposed that these de novo genes may have undergone natural selection, as highly expressed genes were generally longer and more complex than those expressed at lower levels. However, whether these sequences were translated into proteins or served other functions remained unclear at the time. “Biology is more complex than what we imagine,” said Zhao.
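For anyone who wants to poke at the ORF criterion quoted above (a start codon, an in-frame stop codon, at least 150 base pairs), here is a minimal Python sketch of that kind of scan. To be clear, this is not Zhao’s pipeline: the function name `find_orfs`, the ATG-only start, the three standard stop codons, and the forward-strand-only search are all simplifying assumptions of mine.

```python
# Hypothetical helper, not taken from the published analysis.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_orfs(seq, min_len=150):
    """Return (start, end, frame) for forward-strand ORFs of at least min_len bp.

    An ORF here runs from an ATG to the first in-frame stop codon,
    with the stop codon included in the length. Coordinates are
    0-based and half-open, so seq[start:end] is the ORF itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None  # close this ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    # Toy sequence: 2 bp of padding, ATG, 50 alanine codons, a stop, 2 bp of padding.
    toy = "CC" + "ATG" + "GCT" * 50 + "TAA" + "GG"
    print(find_orfs(toy))
```

A real analysis would also scan the reverse complement and, more importantly, would need expression data to say whether any ORF it finds is actually transcribed.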
Cool. But I’m going to don my skeptical hat, and suggest that I’m not seeing evidence that these novel genes are significant. The mechanisms for generating them are so easy that we shouldn’t be surprised that new genes are bubbling up out of the mostly chaotic junk in the genome, but when you don’t know what role those genes are playing in the organism, it’s a reach to suggest that they are important. I’m also unconvinced by observations of tissue-specific regulation during development; it’s also not difficult for regulatory sequences to be attached to a gene. Is it significant that so many of these novel genes are expressed in the testes? Male patterns of gene expression in the gonads are a special case, and spurious expression could persist there because it has specific effects on sperm maturation that aren’t reflected in adult survival.
It’s still interesting stuff. I like the idea that entirely new genes trickle into populations and could contribute to variation in surprising ways.
How many of these genes are actually functional? They’re transcribed, and apparently at least some of them are translated. But how many of them actually do anything that’s of use to the organism?
You’re saying that evolution happens “right before our eyes?” How dare you!
Just so I understand (and I think I understand “translated” in this context), does this mean that proteins produced from these genes have been detected experimentally? Do they fold into predictable secondary structures?
One reason I ask is that I was briefly looking into computational methods of protein folding in the early 90s as a CS student, knowing little biology or biochemistry, and I always wondered if the predictable folding was true in general of peptide sequences or only true of the ones subject to natural selection. I remember asking biologists and never quite getting an answer I understood. I realize some sequences strongly predict alpha helices, etc., but I wondered what you’d expect from uniformly generated random sequences.
It seems plausible to me that a de novo gene that expresses some proteins would continue to be passed on as long as the proteins were not harmful, though there would be little reason to expect conservation. If they proved to have even a mild adaptive benefit after a few generations and were conserved, that could springboard the development of a new functional gene.
(Caveat: I am not a biologist, but I do know a little about all this. I’m happy to be corrected.)