Eric Lander—Genomics and Darwin in the 21st Century

Lander began by saying he wasn’t an evolutionist — an interestingly narrow definition of the term. He’s a fan of the research, but considers himself a biomedical geneticist, as if that was something different.

Having entire genomes of many species available for quantitative analysis is going to lead to a qualitative change in the science we can do.

He gave a pocket summary of the human genome project. The mouse genome followed, then rat and dog, and we now have sequence (to varying degrees of completeness) for 44 species, out of 4600 mammals. Within Homo, there’s the HapMap project and the 1000 Genomes Project, so at least in us we’re going for depth and breadth of coverage.

Sequencing technology is rapidly accelerating. Exponential growth in the number of nucleotides sequenced per year. Exponentially on a log scale! We’re developing a tremendous amount of data acquisition capability. We’ll be able to address mechanisms of physiology and evolution, and learn about the particulars of history.

Lander focuses on genome-wide studies. Evolutionary conservation is a guide to extracting information from the genome. Showed synteny diagrams of mouse and human, and discussed analyses that allow you to identify highly conserved pieces, bits that might have significant function.

Number of genes is low, 20,500. Early higher numbers he admitted were inflated a bit by prior expectations; when they had a good estimate of 30,000, they decided to waffle and call it 30-40,000.

If genes are counted by homology, how do we know there aren’t many more genes that don’t have homology? If that were the case, the number of genes in humans would still be close to the estimated numbers in chimp and macaque.

There are also well-conserved non-coding regions in DNA. 5% of the genome is under selection: coding 1.2%, non-coding 3.8%. Found 200 gene-poor regions that contain key developmental genes, and many of the conserved non-coding regions are associated with them.
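To put those percentages in absolute terms, here is a minimal sketch that converts them to base pairs. The genome size of roughly 3.1 billion base pairs is my assumption, not a figure from the talk, and the function name is purely illustrative.

```python
# Assumption (not from the talk): a haploid human genome of roughly 3.1 billion bp.
GENOME_BP = 3.1e9

# Fractions of the genome under selection, as quoted in the talk.
FRACTIONS = {"coding": 0.012, "non-coding": 0.038}


def selected_bases(genome_bp, fractions):
    """Return base pairs under selection for each category, plus their total."""
    sizes = {name: frac * genome_bp for name, frac in fractions.items()}
    sizes["total"] = sum(sizes.values())
    return sizes


if __name__ == "__main__":
    for name, bp in selected_bases(GENOME_BP, FRACTIONS).items():
        # Report in megabases for readability.
        print(f"{name:>10}: {bp / 1e6:.0f} Mb")
```

The point of the arithmetic is the ratio: by these numbers, conserved non-coding sequence outweighs conserved coding sequence by about three to one.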

Long intergenic non-coding DNA: pretty much all of the genome is transcribed, but the vast majority of this is simply noise. There are about a dozen regions known where transcription of non-coding DNA seems to be conserved evolutionarily and to have some function: they may be transcriptional repressors.

Mechanisms of evolutionary innovation in coding genes: examples of whole genome duplication, divergence, and loss, all of which can be demonstrated by comparison with an outgroup.

Mechanisms of innovation in non-coding regions: about 84% of conserved DNA is shared between marsupials and placentals, suggesting that about 16% of changes are novel. About 15% of placental-specific conserved non-coding elements (CNEs) are derived from transposons.

With 29 mammalian genomes compared, they have 4 substitutions per site, a detection limit of about 10 bp, and 2.8 million features detected. We have a lot of detail that can be extracted from the data sets.
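A rough way to see why total branch length matters for resolution: under a deliberately simplified neutral model (my assumption, not the method used in the work described), substitutions at each site follow a Poisson process, sites are independent, and "4 substitutions per site" is read as the total branch length of the tree. The chance that a short window is identical in every species by luck alone then shrinks exponentially with branch length. The function name and the 0.5 comparison value are hypothetical, chosen only for illustration.

```python
import math


def chance_conserved_windows(subs_per_site, window_bp, genome_bp):
    """Expected count of windows left unchanged purely by chance.

    Simplifying assumptions: neutral evolution, a Poisson number of
    substitutions per site over the whole tree, and independent sites.
    """
    p_site_unchanged = math.exp(-subs_per_site)        # P(no substitution at one site)
    p_window_unchanged = p_site_unchanged ** window_bp  # all sites in the window unchanged
    return genome_bp * p_window_unchanged               # roughly one window start per base


if __name__ == "__main__":
    genome_bp = 3.1e9  # assumed genome size, not a figure from the talk
    # 4.0 is the figure quoted in the talk; 0.5 is a hypothetical shallow comparison.
    for branch_length in (0.5, 4.0):
        expected = chance_conserved_windows(branch_length, 10, genome_bp)
        print(f"{branch_length} subs/site: {expected:.3g} chance 10-bp windows")
```

Under these assumptions a shallow comparison leaves millions of 10 bp windows unchanged by chance, while at 4 substitutions per site the expected number is effectively zero, so a perfectly conserved 10 bp stretch is unlikely to be noise. Real analyses score partial conservation and model rate variation, which this sketch ignores.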

We can find evidence of positive selection. Using chicken as an outgroup, we can identify genes that have undergone major changes in humans but not chimps. Comparison across 29 mammals shows even more. What we’re finding is that these evolutionarily significant genes are enriched for developmental genes.

Analysis within the human species shows that we are a young population that expanded rapidly from a small initial population of 10,000 individuals. We can now screen for associations between single-nucleotide polymorphisms and disease, testing 2 million polymorphisms in a single pass on a chip. 500 loci associated with common traits have now been identified. Most have very modest effects and only contribute to a small part of the heritability of the trait. Where is all the missing heritability? The candidates are missing loci, missing alleles, and non-additive effects of loci.
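The missing heritability puzzle is easy to see with a little arithmetic. Under the standard additive model, a biallelic locus with allele frequency p and per-allele effect beta (in trait standard deviations) contributes 2p(1-p)beta^2 to the trait variance. The sketch below applies that formula to made-up numbers: the 40 loci, their frequencies and effect sizes, and the heritability of 0.8 are all hypothetical inputs for illustration, not data from the talk.

```python
def variance_explained(allele_freq, effect_sd):
    """Additive variance from one biallelic locus: 2p(1-p) * beta^2.

    Assumes Hardy-Weinberg proportions and a trait scaled to unit variance.
    """
    return 2.0 * allele_freq * (1.0 - allele_freq) * effect_sd ** 2


def missing_heritability(loci, heritability):
    """Sum the variance explained by independent loci and return
    (explained, missing), where missing = heritability - explained."""
    explained = sum(variance_explained(p, beta) for p, beta in loci)
    return explained, heritability - explained


if __name__ == "__main__":
    # Hypothetical inputs: 40 loci, allele frequency 0.3, effect 0.05 SD each.
    hypothetical_loci = [(0.3, 0.05)] * 40
    explained, missing = missing_heritability(hypothetical_loci, heritability=0.8)
    print(f"explained: {explained:.3f}, missing: {missing:.3f}")
```

With effects this small, dozens of loci together account for only a few percent of the variance, which is the pattern described above. The sketch treats loci as independent and purely additive, so it says nothing about the non-additive explanation.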

Positive selection in human history: can use HapMap data to find 300 regions with outlier distributions that suggest they have been the target of selection. Combining statistical tests narrows each candidate region to a size roughly equal to a single gene, making it possible to identify specific genes with an interesting selective history (work in press by Pardis Sabeti). There are themes: many of these genes are involved in resisting infectious disease.

Genomics is experiencing an explosion of data that represents a huge opportunity for future discovery.