In developmental biology, and increasingly in evolutionary biology, one of the most important fields of study is deciphering the nature of regulatory networks of genes. Most people are familiar with the idea of a gene as a stretch of DNA that encodes a protein in a sequence of As, Ts, Gs, and Cs, and that’s still an important part of the story. Most people may also be comfortable with the idea that mutations are events that change the sequence of As, Ts, Gs, and Cs, which can lead to changes in the encoded protein, which in turn cause changes in the function of the protein. These are essential pieces in the story of evolution; we do accumulate variations in genes and gene products over time.
There’s more to evolution than just that relatively straightforward pattern of change, however. Consider humans and chimpanzees. We’re both made of mostly the same stuff: the keratin that makes up our hair and the organization of hair follicles are nearly identical, and our brains each contain the same structures. The differences are in regulation. We both have the same kinds of hair, but chimps have more of it turned on all over the place, while we’ve mostly down-regulated it everywhere except a few places. The differences in our brains may be mostly differences in timing: our brains are switched on to grow for longer periods of time in development, and there are almost certainly specific regions and patterns of connectivity that are tweaked by adjusting different levels of different gene products in different places at different times.
The really important bits of information that generate macroevolutionary differences are probably not the protein-encoding sequences of the genes, but the pieces of DNA that surround them (the regulatory elements that act as on/off switches for gene expression) and the molecular and cellular interactions that occur during development to change their status (the fancy word we use for that is epigenesis). Another way to think of it is that much of the history of this new science of molecular biology has focused on puzzling out the spelling and vocabulary of the genome. The next step is to work out the genomic grammar. Proteins and their sequences are the words of the language, while organisms are whole novels, and obviously the differences between a Shakespeare and a Bulwer-Lytton aren’t so much in the words available to them as in how they are arranged in a sentence, paragraph, and page.
You must understand that genes don’t stand alone: every gene is an actor in a complex regulatory network. Each gene has a set of other genes that can influence whether that gene is off or on (these are called upstream elements). A gene produces a protein product that affects multiple other genes (the downstream elements); this effect can be direct, if the gene is a regulatory gene itself, or indirect, if the gene product is part of a complex of cytoplasmic regulators. (I’m trying to keep this simple, or I’d go into greater detail on the fact that there are also multiple levels of regulatory interaction, that there is a great cloud of things called transcription factors that interact directly with DNA, and that there is another great cloud of signal transduction factors and regulatory proteins working away in the cytoplasm.)
That long-winded introduction brings me to this paper by Maslov et al. (2004). The premise is straightforward: let’s look at a lot of genes, and in addition to comparing sequence similarity, let’s also compare the sets of upstream elements and the sets of downstream elements and see how they differ. The authors are examining not just how the words of the language are spelled, but whether similar words are used in similar ways within sentences. In particular, they are looking at the results of gene duplication in evolutionary history. When a gene is duplicated, the copy will often carry all of its regulatory elements along with it, so it initially has all of the same upstream and downstream elements as the original. It can then diverge in its sequence, and also in what other genes regulate it and what downstream genes and proteins it affects. The question they are asking is, how fast does regulation change compared to sequence information? The answer is in the abstract:
Background: Gene duplication followed by the functional divergence of the resulting pair of paralogous proteins is a major force shaping molecular networks in living organisms. Recent species-wide data for protein-protein interactions and transcriptional regulations allow us to assess the effect of gene duplication on robustness and plasticity of these molecular networks.
Results: We demonstrate that the transcriptional regulation of duplicated genes in baker’s yeast Saccharomyces cerevisiae diverges fast so that on average they lose 3% of common transcription factors for every 1% divergence of their amino acid sequences. The set of protein-protein interaction partners of their protein products changes at a slower rate exhibiting a broad plateau for amino acid sequence similarity above 70%. The stability of functional roles of duplicated genes at such relatively low sequence similarity is further corroborated by their ability to substitute for each other in single gene knockout experiments in yeast and RNAi experiments in a nematode worm Caenorhabditis elegans. We also quantified the divergence rate of physical interaction neighborhoods of paralogous proteins in a bacterium Helicobacter pylori and a fly Drosophila melanogaster. However, in the absence of system-wide data on transcription factors’ binding in these organisms we could not compare this rate to that of transcriptional regulation of duplicated genes.
Conclusions: For all molecular networks studied in this work we found that even the most distantly related paralogous proteins with amino acid sequence identities around 20% on average have more similar positions within a network than a randomly selected pair of proteins. For yeast we also found that the upstream regulation of genes evolves more rapidly than downstream functions of their protein products. This is in accordance with a view which puts regulatory changes as one of the main driving forces of the evolution. In this context a very important open question is to what extent our results obtained for homologous genes within a single species (paralogs) carries over to homologous proteins in different species (orthologs).
The simple answer is that regulatory networks change relatively rapidly. The authors measured a parameter, Ω, or overlap, that describes how similar the upstream regulators of two genes are; in the accompanying cartoon, the two paralogous genes share two upstream regulators out of five, and so have an Ω value of 2/5, or 0.4. In the paper's graph we can see that the fraction of shared regulators changes faster than the sequence similarity does.
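To make the arithmetic behind that 2/5 concrete, here is a minimal Python sketch. It assumes that Ω is the number of shared regulators divided by the number of distinct regulators acting on either gene, which is one reading of "two out of five"; the paper's exact normalization may differ. The function name `regulator_overlap` and the transcription factor labels are hypothetical, made up purely for illustration.

```python
def regulator_overlap(regs_a, regs_b):
    """Fraction of regulators shared by two genes.

    Assumption: overlap = shared regulators / all distinct regulators
    of either gene. This is an illustrative reading of the cartoon,
    not necessarily the exact normalization used by Maslov et al.
    """
    a, b = set(regs_a), set(regs_b)
    union = a | b
    if not union:
        # Neither gene has any known regulator: define the overlap as 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical regulator sets for a pair of paralogous genes.
gene_a_regulators = {"TF1", "TF2", "TF3"}
gene_b_regulators = {"TF2", "TF3", "TF4", "TF5"}

# Two shared regulators (TF2, TF3) out of five distinct ones.
omega = regulator_overlap(gene_a_regulators, gene_b_regulators)
print(omega)  # 0.4
```

A freshly duplicated pair would start with identical regulator sets, giving an overlap of 1.0 under this definition, and the value would fall as each copy loses or gains regulators.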
What this means is that duplicated genes diverge rapidly at the level of the regulatory network, losing and adding different upstream regulators; this is the “upstream plasticity” referred to in the title of the paper. What about the downstream elements?
One slight flaw in the paper (acknowledged by the authors) is that their measures of upstream and downstream overlap aren’t entirely comparable—the upstream measure is of protein-DNA interactions, while downstream they are looking at protein-protein binding. Still, they see something interesting: while downstream regulation also changes rapidly, the proteins tend to retain a number of shared effects, even with significant changes in sequence. This maintained redundancy confers robustness on the network.
The work has promise for explaining some features of evolution, in particular how closely related species can diverge quickly. As the authors put it:
Our results also indicate that the genetic regulation of paralogous proteins changes faster than both their amino acid sequences and the set of their protein interactions partners. It is tempting to extend this observation to pairs of homologous proteins in different species (orthologs) that diverged from each other as a result of a speciation (as opposed to a gene duplication) event. This would help to explain how species with very similar gene contents can evolve novel properties on a relatively short timescale. However, such an inter-species comparison of molecular networks has to wait for the appearance of whole-genome data on molecular networks in closely related model organisms.
Maslov S, Sneppen K, Eriksen KA, Yan K-K (2004) Upstream plasticity and downstream robustness in evolution of molecular networks. BMC Evolutionary Biology 4:9-21.