Gene regulatory networks and conserved noncoding elements


We miss something important when we just look at the genome as a string of nucleotides with scattered bits that will get translated into proteins — we miss the fact that the genome is a dynamically modified and expressed sequence, with patterns of activity in the living cell that are not readily discerned in a simple series of As, Ts, Gs, and Cs. What we can’t see very well are gene regulatory networks (GRNs), the interlinked sets of genes that are regulated in a coordinated fashion in cells and tissues.

What this means is that if you look within a specific cell type at a specific gene, its state, whether off or on, will be correlated in a coherent way with the states of a set of other genes. Look in a developing muscle cell, for instance, and you’ll typically find that a gene called MyoD is switched on, along with other genes like Myf5 and myogenin. Look further, and you’ll find others, like c-Jun and cyclin-dependent kinase 4, that also have their activity modulated in predictable ways. And when we start poking around experimentally, we discover that the relationships are often directly causal, with certain gene products binding to and modifying the expression of other genes.

Imagine that, instead of scanning a dead series of nucleotides to find genes, we were able to go fishing in the living cell and pull up a gene of interest, and we then also pull up all the genes to which it is linked by regulatory processes — catch one, and you’d pull up a whole collection of genes from scattered places in the genome, the gene regulatory network. GRNs are particularly interesting in studies of questions about phenotype, because they, not single genes, are actually the fundamental unit of the cell and tissue type. What makes a liver isn’t one gene, but a whole suite of coordinated genes.
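If we did have a map of who regulates whom, the "fishing" described here would amount to a graph traversal: start at one gene and follow every regulatory link until nothing new turns up. Here is a minimal sketch of that idea. The gene names are borrowed from the muscle example, but the specific links in `REGULATES` are made up for illustration and are not curated biological data; `pull_up_network` is a hypothetical helper name.

```python
from collections import deque

# Hypothetical regulator -> targets map, for illustration only.
# The edges are assumptions, not real experimental results.
REGULATES = {
    "MyoD": ["myogenin", "CDK4"],
    "Myf5": ["MyoD"],
    "GeneX": ["GeneY"],  # an unrelated, made-up pair elsewhere in the genome
}

def pull_up_network(links, start):
    """Return every gene connected to `start` by regulatory links, in either direction."""
    # Build an undirected view so we follow links both upstream and downstream.
    neighbors = {}
    for regulator, targets in links.items():
        for target in targets:
            neighbors.setdefault(regulator, set()).add(target)
            neighbors.setdefault(target, set()).add(regulator)

    # Breadth-first search outward from the gene we "caught".
    seen = {start}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        for other in neighbors.get(gene, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen

print(sorted(pull_up_network(REGULATES, "MyoD")))
```

Catching MyoD pulls up Myf5, myogenin, and CDK4 along with it, while the unrelated GeneX/GeneY pair stays in the water. The hard part in real biology is not the traversal but building the map in the first place.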

How do we go fishing for GRNs? It isn’t trivial. One way is to focus on those single genes and do detailed studies of each, tracking levels of expression over time and space within the organism. Another is to use something like a microarray to sample a lot of genes in one tissue all at once, and ask which are up-regulated and which are down-regulated. Another is to look at epistasis directly, tinkering with one gene or its product genetically or biochemically, and asking what happens to the other genes in the genome. This is all a lot of work, and it’s ongoing…but we don’t have an automated shortcut to zip through a whole genome and identify all the connections between the genes.
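To make the "up-regulated or down-regulated" question concrete, here is a rough sketch of the simplest possible version: compare expression values between two conditions and label each gene by its log2 fold change. The numbers, the two-fold threshold, the pseudocount, and the function name `classify_expression` are all assumptions chosen for illustration. A real analysis would use replicates, normalization, and statistical testing rather than a bare ratio.

```python
import math

def classify_expression(control, treated, threshold=1.0, pseudocount=1.0):
    """Label each gene 'up', 'down', or 'unchanged' by log2 fold change.

    `threshold` is in log2 units, so 1.0 means a two-fold change.
    `pseudocount` avoids division by zero for unexpressed genes.
    """
    calls = {}
    for gene in control:
        ratio = (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        lfc = math.log2(ratio)
        if lfc >= threshold:
            label = "up"
        elif lfc <= -threshold:
            label = "down"
        else:
            label = "unchanged"
        calls[gene] = (lfc, label)
    return calls

# Made-up expression values (arbitrary units) for two conditions.
control = {"MyoD": 9, "myogenin": 19, "GeneX": 79, "GeneY": 50}
treated = {"MyoD": 79, "myogenin": 159, "GeneX": 9, "GeneY": 55}

for gene, (lfc, label) in classify_expression(control, treated).items():
    print(f"{gene}: log2 fold change {lfc:+.2f} ({label})")
```

Genes that move together across many such comparisons are candidates for membership in the same network, though correlation alone does not tell you which gene is regulating which.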

There are some hints, though — conserved spots in the genome that are not part of the coding sequence of a gene, but are part of the regulatory control region. Here’s an example of a plot of conserved sequences for one gene, SALL3. Each line is a comparison of the sequence between two species: H/F compares human to the fish, fugu; M/F compares mouse to fugu; and C/F compares chicken and fugu. The comparison starts 71,000 bases upstream of the SALL3 gene, and the vertical scale measures the degree of sequence conservation at each point; the red areas are non-coding regions, and blue is in the actual coding region of the SALL3 gene.

CNEs cluster in genomic regions surrounding developmental regulatory genes in vertebrate and nematode genomes. Here, we show sequence conservation in the SALL3/sem-4 locus. This gene codes for a zinc finger protein that is involved in embryonic development in both vertebrates and invertebrates. The diagram shows sequence conservation (minimum 50% identity in 100 bp sliding windows) along the SALL3 gene and part of its upstream sequence in human, mouse, chicken, and pufferfish (Fugu) genomes. Conservation is shown with respect to the Fugu sequence in the region spanning from 71 kb upstream of the SALL3 transcription start site to the end of the gene.

See all those pink peaks in the chart? Those ought to jump out at you. Those are highly conserved non-coding elements (CNEs), and the remarkable thing is that they are nearly identical in fish, mammals, and birds — these elements have been conserved for 450 million years. The evidence suggests that these CNEs have been under strong purifying selection, and that they are 300 times less likely to be lost than the nondescript junk in between them. These CNEs are all over the place, and are particularly strongly associated with regulatory genes, such as transcription factors.
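The figure caption says the peaks come from a sliding-window measure: at least 50% identity in 100 bp windows. Here is a minimal sketch of how such a measure could be computed. It assumes the two sequences have already been aligned to the same length, with '-' marking gaps, and it counts gapped columns as mismatches; both are simplifying assumptions of mine, and this is not a reconstruction of the tool the authors actually used. The function names and the toy sequences are hypothetical.

```python
def window_identity(seq_a, seq_b, window=100, step=1):
    """Percent identity in each sliding window over two pre-aligned sequences.

    Returns a list of (window_start, percent_identity) pairs.
    Assumes equal-length aligned inputs; gap columns ('-') count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper())]
    results = []
    for start in range(0, len(matches) - window + 1, step):
        identity = 100.0 * sum(matches[start:start + window]) / window
        results.append((start, identity))
    return results

def conserved_windows(seq_a, seq_b, window=100, min_identity=50.0):
    """Keep only the windows that reach the identity cutoff."""
    return [(start, pct)
            for start, pct in window_identity(seq_a, seq_b, window)
            if pct >= min_identity]

# Toy aligned sequences and a tiny window so the result is easy to check by eye.
fugu = "ACGTACGTTTTT"
human = "ACGTACGAGGGG"
for start, pct in conserved_windows(fugu, human, window=4):
    print(start, pct)
```

In the toy example the first seven bases match, so the early windows score 100% and the score falls off as the window slides into the diverged tail. A CNE shows up as a run of high-scoring windows sitting outside any coding sequence.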

Another interesting point: the figure above only shows comparisons within a single phylum, and we see CNEs all over. What if we look in other phyla? It turns out that CNEs are not unique to us chordates, and are found in invertebrates as well, but they are different. Compare homologous genes in a fugu and a fly, and there is little detectable conservation within these non-coding regions; compare a couple of different arthropods, or a couple of species of nematode, and there they are again…but they’re different CNEs, a unique set for arthropods, and a different set for chordates.

So they’re conserved within a broad group of organisms, and you might be wondering what they do. In the known cases, the CNEs seem to be transcriptional enhancers — they promote more consistent, robust expression of the associated gene in appropriate tissues. The mechanism isn’t known for sure, but likely involves these regions having sequences that promote the binding of transcription factors that strengthen the regulation of the gene. We know of cases of mutations in the CNEs (not the gene itself, but just in these little patches of non-coding DNA) that can lead to birth defects in human beings. Certain kinds of polydactyly and syndactyly, and a rather obscure heritable disease called Pierre Robin syndrome seem to be the result of changes entirely in CNEs.

These observations are all very suggestive. They suggest that the CNEs are part of genetic circuits that set up patterns of gene expression, the patterns that regulate organismal form. We don’t know the specific details of those interactions, but we see the evidence of their existence.

One way to think of it is to compare it to the functioning of a circuit board in a computer. You know that what is essential for the appropriate activity in the computer is that everything is wired up correctly; these gene regulatory networks are analogous to the circuit networks in electronics. In this case, we don’t know the details of the wiring (we’ve just begun to trace all the connections), but what we have identified are sets of printed circuit board connectors, small keyed blocks that hook up in a specific way to wires with matching plugs. We’re at the point where we’ve cataloged some interesting regularities: these sets of circuits use a 15-pin connector, that one over there has 3 pins on the board, so we need to find a cable with a 3-socket plug, etc. CNEs mark the spots in the DNA where connections are made between living genes.


What this all implies, because we see similar arrangements of these CNEs within vertebrates, is that the broad patterns of connectivity have been conserved for a very long time: since shortly after the Cambrian, at the very least. The fact that phyla that diverged before or during the Cambrian have different patterns of CNEs implies that, at the time of divergence, these regulatory elements were either less significant or more fluid. Together, these observations provoke the idea that maybe one of the factors that led to the emergence and stabilization of different body plans was the gradual reinforcement of fixed patterns of gene activity by the addition of these elements over time; CNEs contributed to the canalization of development.

An early, relatively simple set of genes with a small number of regulatory elements connecting them expanded by gene duplication…which also, of course, expanded the number of possible connections between them. This expansion was followed by a consolidation, where the connections between genes were selectively whittled down, but different lineages stabilized on different, smaller subsets of that huge combinatorial space. The interesting idea here is that diversification would proceed hand-in-hand with increasing stabilization, locking in different lineages to different body plans.
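It helps to put rough numbers on how fast that combinatorial space grows. The sketch below is a pure counting exercise under an assumption of my own, not a model from the paper: any gene could in principle regulate any gene, including itself, and each possible link is simply present or absent. The function names are hypothetical.

```python
def possible_links(n_genes, allow_self=True):
    """Number of possible directed regulator -> target pairs among n_genes."""
    if allow_self:
        return n_genes * n_genes
    return n_genes * (n_genes - 1)

def possible_wirings(n_genes):
    """Number of distinct wiring diagrams if each possible link is on or off."""
    return 2 ** possible_links(n_genes)

# Each round of whole-set duplication doubles the gene count.
for n in (4, 8, 16):
    print(f"{n} genes: {possible_links(n)} possible links, "
          f"{possible_wirings(n):.3e} possible wirings")
```

Doubling the number of genes quadruples the number of possible links, and the number of possible wiring diagrams grows far faster than that. There is plenty of room for different lineages to settle on different subsets.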

A model for the evolution of cis-regulatory elements involved in animal development. According to this model, duplication and rewiring of the regulatory toolkit of the common animal ancestor gave rise to a diverse set of complex regulatory elements that formed the core developmental programs of the major animal phyla. Since then, animal body plans have been largely conserved. This conservation may be reflected in a set of highly conserved cis-regulatory elements controlling the expression of developmental genes.

This leads to the situation we see today, where we see the same parts, the same genes in animals as different as fruit flies and humans, but how those parts are deployed and regulated and used in relationship to other genes may be very different and produce very different morphological outcomes. CNEs are a subtle part of the regulatory landscape that played an important role in marshaling patterns of core gene expression.

Vavouri T, Lehner B (2009) Conserved noncoding elements and the evolution of animal body plans. BioEssays 31:727-735.