Hox genesis


One of the hallmark characters of animals is the presence of a specific cluster of genes that are responsible for staking out the spatial domains of the body plan along the longitudinal axis. These are the Hox genes; they are recognizable by virtue of the presence of a 60 amino acid long DNA binding region called the homeodomain, by similarities in sequence, by their role as regulatory genes expressed early in development, by the restriction of their expression to bands of tissue, by their clustering in the genome to a single location, and by the remarkable collinearity of their organization on the chromosome with their pattern of expression: the order of the genes’ positions in the cluster matches the order of their regions of expression along the length of the animal. That order has been retained in most animals (there are interesting exceptions), and has been conserved for about a billion years.
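For the computationally inclined, collinearity can be phrased as a simple check: sort the genes by their position in the cluster, and their anterior limits of expression should come out in the same order. Here is a minimal Python sketch of that idea. The gene names are the eight Drosophila Hox genes in their conventional order, but the numbers are idealized ranks I made up for illustration, not measured coordinates, and `HoxGene` and `is_collinear` are hypothetical names of my own.

```python
# Toy illustration of spatial collinearity.
# Gene names are the Drosophila Hox genes; the numbers are idealized
# ranks (assumptions), not real coordinates or measured boundaries.
from typing import NamedTuple


class HoxGene(NamedTuple):
    name: str
    cluster_position: int   # rank along the chromosome
    anterior_boundary: int  # rank of anterior expression limit, head = 0


def is_collinear(genes: list[HoxGene]) -> bool:
    """True if ordering by cluster position also orders expression boundaries."""
    ordered = sorted(genes, key=lambda g: g.cluster_position)
    boundaries = [g.anterior_boundary for g in ordered]
    return all(a <= b for a, b in zip(boundaries, boundaries[1:]))


names = ["lab", "pb", "Dfd", "Scr", "Antp", "Ubx", "abd-A", "Abd-B"]
fly_like = [HoxGene(n, i, i) for i, n in enumerate(names)]

# Swap the expression boundaries of two genes to break collinearity.
scrambled = list(fly_like)
scrambled[1] = HoxGene("pb", 1, 5)
scrambled[5] = HoxGene("Ubx", 5, 1)

print(is_collinear(fly_like))   # True
print(is_collinear(scrambled))  # False
```

The check only looks at relative order, which is the point: collinearity is a statement about matching orders, not about distances on the chromosome or in the embryo.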

Think about that. While gene sequences have steadily changed, while chromosomes have been fractured and fused repeatedly, while differences accumulated to create forms as different as people and fruit flies and squid and sea urchins, while continents have ping-ponged about the globe and meteors have smashed into the earth and glaciers have advanced and retreated, these properties of this set of genes have remained constant. They are fundamental and crucial to basic elements of our body plan, so basic that we take them completely for granted. They determine that we can have different regions of our bodies with different organs and organization. Where did they come from and what forces constrain them to maintain their specific organization on the chromosome? Are there other genes that are comparably central to our organization?

Parts of the history of the Hox cluster have been reconstructed. The last common ancestor of insects and vertebrates would have had a bank of 7-9 genes. Later duplications in the protostome lineage would have expanded that to 8-9; vertebrates expanded the original cluster to about 14, and additionally duplicated the whole cluster multiple times. We mammals have 4 clusters, HoxA, HoxB, HoxC, and HoxD, each of which contains up to 14 genes (because we have 4 clusters, there is some redundancy, and individual genes within some of the clusters have been lost). If we try to look farther back in our history, the best evidence so far suggests that the last common ancestor of insects, vertebrates, and flatworms probably had 4 Hox genes. Even further back, our last common ancestor with cnidarians had at least 2 Hox genes. The Hox genes themselves fall into 4 groups, the Anterior, Group 3, Central, and Posterior groups, that reflect the pre-Cambrian arrangement in our distant animal ancestor.

During evolution, large macroevolutionary events markedly altered the metazoan body plan and gave rise to the morphological diversity and complexity of current phyla. The cladogram shows the main metazoan groups and the associated body-plan transitions (indicated by red circles). The closest unicellular relatives of metazoans were the choanoflagellates; the question marks indicate uncertainty about the Hox gene complement in these evolutionary positions. The first body-plan transition in metazoans was the origin of radial symmetry, which gave rise, in the first instance, to cnidarians. The origin of bilaterality involved the generation of two body axes (anteroposterior and dorsoventral), the endomesoderm, and a nervous system that was concentrated at the anterior. Acoelomorphs (acoel flatworms) are the simplest bilateral representatives. Higher bilaterians (Eubilateria) are protostomes (arthropods, nematodes, annelids, molluscs and platyhelminthes, among other phyla) and deuterostomes (hemichordates, echinoderms and chordates). These two groups arose from the last eubilaterian ancestor, which had coelomic cavities, was segmented, and had a through-gut, an excretory system and brain ganglia. Non-vertebrate chordates (urochordates and cephalochordates such as amphioxus) did not undergo the metazoan transition that generated the vertebrate lineage, which is characterized by having neural-crest-derived tissues, vertebrae and high brain complexity. (See main text for alternative views on the origin of bilaterality.)

The figure illustrates one plausible explanation for the current composition of the homeobox (Hox) cluster in each main metazoan group. Hox genes are divided into four distinct classes: Anterior (A, and the corresponding paralogous groups 1 and 2, in purple), Group 3 (3, in yellow), Central (C, corresponding to paralogous groups 4–8 in deuterostomes and groups 4, 5, Antennapedia (Antp) and Ultrabithorax/abdominal A (Ubx/abdA) in protostomes, in green), and Posterior (P, P1/2 in protostomes, and paralogous groups 9–12 or 9–14 in deuterostomes, in red) genes. Cluster duplications in vertebrates are not indicated for simplicity. Red arrows indicate the inferred Hox gene composition at relevant morphological transitions. The Hox cluster of the common ancestor of cnidarians and bilaterians would have contained a member of each gene class. According to classical views, cnidarians secondarily lost the Group 3 and Central genes, and the origin of bilaterality was not coincident with an increase in the complexity of the Hox cluster.
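If it helps to see the four classes as a lookup table, here is a small Python sketch for the deuterostome numbering, reading the legend's groups as 1-2 (Anterior), 3 (Group 3), 4-8 (Central) and 9-14 (Posterior). The names `HOX_CLASSES` and `hox_class` are my own, and the protostome genes (Antp, Ubx/abdA, P1/2) are left out because they are not numbered the same way.

```python
# Map a deuterostome Hox paralogous group number to its class,
# following the grouping given in the figure legend.
HOX_CLASSES = {
    "Anterior": range(1, 3),    # groups 1-2
    "Group 3": range(3, 4),     # group 3
    "Central": range(4, 9),     # groups 4-8
    "Posterior": range(9, 15),  # groups 9-14 (9-12 in some lineages)
}


def hox_class(paralog_group: int) -> str:
    """Return the class name for a paralogous group number (1 to 14)."""
    for name, groups in HOX_CLASSES.items():
        if paralog_group in groups:
            return name
    raise ValueError(f"no Hox paralogous group {paralog_group}")


print(hox_class(1))   # Anterior
print(hox_class(6))   # Central
print(hox_class(13))  # Posterior
```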

An individual Hox gene is not unusual in any way. They are transcription factors, genes whose product binds to DNA and controls the levels of activity of other genes. There are many transcription factors. The Hox genes are part of a class of transcription factors that use a homeodomain to bind DNA, but there are also other homeodomain-containing transcription factors that are not within a Hox cluster. There are also other homeobox genes that share other similarities in their sequence, putting them in what is called the ANTP-class of homeobox genes. In an interesting development, though, some of these other ANTP-class homeobox genes have also been found to cluster in some lineages.

For instance, a separate bank of homeobox genes was identified in Amphioxus, and a similar bank was found in all chordates. This particular cluster arose long ago by a duplication of the primitive Hox cluster, and its genes are phylogenetically related to those in the more familiar Hox genes. Because it is paralogous to the Hox cluster, it is called ParaHox. This cluster has also undergone segmental duplications that parallel those of the Hox genes, so we vertebrates have a ParaHoxA, ParaHoxB, ParaHoxC, and ParaHoxD cluster. The individual genes are also present, but not clustered, in invertebrates—the idea is that whatever factor keeps the genes clustered is present in chordates, but was lost in invertebrates, so that their ParaHox genes have been gradually scattered about within their genome.

A founder ProtoHox-like gene produced, through a series of cis-duplications, an ancestral Hox-like cluster that consisted of the ProtoHox cluster linked to the ancestor of even-skipped homeotic gene (Evx) and mesenchyme homeobox (Meox). Segmental tandem duplications generated a continuous array of primordial ParaHox, Meox, Hox and Evx genes, which was subsequently broken between the posterior ParaHox gene caudal-type homeobox (Cdx) and the Meox gene (red arrow). Further cis-duplications and evolution by expansion and genome doublings led to the current mammalian complement of extended Hox and ParaHox genes, consisting of four clusters of each cluster type (A–D). A further gene cluster, EHGbox (not shown), exists that consists of gastrulation brain homeobox (Gbx), motor neuron restricted (Mnx) and engrailed (En). This cluster was created by cis-duplication of a founder gene that was probably adjacent to the ProtoHox gene. Colour codes for the four paralogous groups are: Anterior, purple; Group 3, yellow; Central, green; Posterior, red. Gsh, genomic screened homeobox; Xlox, Xenopus laevis homeobox 8.

There is yet another group, the NK cluster. This is a bank of homeobox-containing genes that were first identified as clustering in Drosophila. Their situation is the mirror image of the ParaHox genes, though: their clustering has been conserved in insects, but whatever property conserves that arrangement has been lost in vertebrates. We have homologs of the NK genes, but they are scattered about within our genome.

Model for the genesis of the ANTP-class homeobox megacluster. Early in animal evolution, an ANTP-class homeobox gene ProtoANTP (step 1) was duplicated, giving rise to a ProtoHox-like gene and a ProtoNK gene (step 2). Through a series of tandem duplications, each family expanded to form an NK cluster (green) and a series of Hox-like genes (other colours; steps 3–6), leading eventually (step 7) to the generation of the Hox and ParaHox clusters, including Evx and Meox (grey), and the extended Hox genes (light blue). The last step (step 8) shows the predicted composition of the megacluster in the last common ancestor of protostomes and deuterostomes. The arrows indicate the chromosomal breakages that split the megacluster into at least three pieces, giving three main genomic blocks that are known as the ParaHox, Hox, and NK regions. The red horizontal bars indicate the limits of the ParaHox, Hox and NK clusters. Steps 1 to 3 occurred before the radiation of sponges; the temporal order of steps 4 to 6 is unclear; step 7 might represent the last common ancestor of cnidarians and bilaterians; the labels below step 8 indicate the preferential place of expression of ParaHox, Hox and NK cluster genes. Cdx, caudal-type homeobox; Dlx, Distal-less homeobox; Emx, empty-spiracles homologue; En, engrailed; Evx, even-skipped homeotic gene; Gbx, gastrulation brain homeobox; Gsh, genomic screened homeobox; Lbx, ladybird-related homeobox; Meox, mesenchyme homeobox; Mnx, motor neuron restricted homeobox; Msx, muscle-specific homeobox; Tlx, T-cell leukaemia homeobox; Xlox/Ipf1, Xenopus laevis homeobox 8/insulin promoter factor 1; 1–14, Hox paralogous groups 1 to 14.
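The logic of that model (duplicate, diverge, expand in tandem, then break) is simple enough to play with in a few lines of Python. This is only a cartoon under loud assumptions: a chromosome is a plain list of labels, the left-to-right order of the blocks and the copy numbers are placeholders I picked, and the function names are hypothetical. It leaves out Evx, Meox and the extended Hox genes entirely.

```python
# Cartoon of megacluster genesis: a chromosome is a list of gene labels.
# Block order and copy numbers are placeholders, not inferred gene content.

def duplicate_and_diverge(chromosome, index, left, right):
    """Replace the gene at `index` with two adjacent, diverged copies."""
    return chromosome[:index] + [left, right] + chromosome[index + 1:]


def expand(chromosome, label, copies):
    """Replace one gene with `copies` tandem copies, numbered in order."""
    i = chromosome.index(label)
    array = [f"{label}{n}" for n in range(1, copies + 1)]
    return chromosome[:i] + array + chromosome[i + 1:]


def break_at(chromosome, breakpoints):
    """Split a chromosome into blocks at the given indices."""
    edges = [0, *sorted(breakpoints), len(chromosome)]
    return [chromosome[a:b] for a, b in zip(edges, edges[1:])]


c = ["ProtoANTP"]                                      # step 1
c = duplicate_and_diverge(c, 0, "ProtoHox", "NK")      # step 2 (NK = ProtoNK)
c = duplicate_and_diverge(c, 0, "ParaHox", "Hox")      # ProtoHox splits in two
for label, copies in [("ParaHox", 3), ("Hox", 4), ("NK", 3)]:
    c = expand(c, label, copies)                       # tandem expansions

megacluster = c
blocks = break_at(megacluster, [3, 7])                 # two chromosomal breaks

print(megacluster)
for block in blocks:
    print(block)
```

Run it and the single contiguous list falls into three pieces, each still internally ordered, which is all the figure is really claiming about the ParaHox, Hox and NK regions.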

Before all this scattering took place, though, in our ancient ur-bilaterian ancestor, all of these genes must have been grouped together in a structure called the ANTP-class homeobox megacluster—the Hox genes, the ParaHox genes, and the NK genes, all laid out in one central zone full of ordered regulatory genes. What would these genes have been doing in that ancestor?

Here’s a provocative idea: 3 banks of patterning genes, and 3 primitive germ layers. Perhaps the duplication of the prototypical array of ANTP-class homeobox genes was an important step in the origin of triploblasty! The expression of the Hox genes is mainly associated with ectodermal derivatives, in particular the neurectoderm. ParaHox genes are expressed mainly in the endoderm, or gut derivatives. The NK genes, at least in Drosophila, are associated with mesodermal patterning. There are exceptions to all of these generalizations, which isn’t too surprising in genes with such a long history, but the overall pattern is incredibly suggestive.

This hypothesis makes diploblastic animals (those having only two germ layers) particularly interesting. The diploblastic cnidarian Nematostella has representatives of all 3 groups (Hox, ParaHox, and NK) in its genome, which would imply that it may have had a triploblastic ancestor. That idea isn’t so surprising anymore, since studies of the development of Nematostella have shown that it initially has bilateral symmetry and a staggered pattern of Hox gene expression, suggesting that cnidarians evolved from bilaterians, secondarily losing bilateral and gaining radial symmetry. If we want to get a picture of what happened at the boundary between diploblasty and triploblasty, we need to push back a little further in phylogeny…but the next simpler grade of animals are the sponges, and they’ve got some weird and surprising issues of their own.

Sponges might hold the answer. They are classically considered to be simple metazoans, with no obvious symmetry. The ANTP-class homeobox complement of sponges is currently limited to a few NK-like genes, whereas no Hox or ParaHox gene has been found. It is tempting to place the genesis of the megacluster at the sponge-cnidarian divergence, right at the origin of bilaterality and triploblasty (assuming that cnidarians are bilaterians and triploblastic). Still, if this reasoning holds true, why do sponges have several NK genes, and no Hox or ParaHox gene? The fact that no Hox or ParaHox gene has been identified could simply be due to the technical difficulty of finding old, divergent or embryonically expressed sequences, or to the choice of non-basal species. Nevertheless, the primitiveness of sponges is currently under debate. It is unclear whether sponges form a monophyletic sister group to the rest of the metazoans, or are paraphyletic. As with cnidarians, evolutionary insights are being gained from studies of embryonic development. Sponge larvae are architecturally closer than adult sponges to other metazoans. One possibility is that other metazoans (including cnidarians) evolved from a neotenous larva of ancient sponges, and that sponges were, in fact, the simplest bilaterian metazoans, having a single or only a few Hox-like genes and several NK genes. This would imply that no basal non-bilaterian animals currently exist. An intriguing exception might be the placozoans, an enigmatic group that have no symmetry and a Hox-like gene that might resemble an ancestral Hox or ParaHox gene.

Whoa. This would be a surprising and rather sad discovery—perhaps all extant animals are primitively triploblasts, with a few lineages that have secondarily lost bilaterality and germ layers, with only the forlorn placozoa as the very last holdout of the ancient diploblast line. It has me wondering if there wasn’t a widespread war in pre-Cambrian ages, gelatinous sheets of diploblastic slime (the Vendian biota?) against upstart triploblastic worms…and the triumph of the triploblasts was complete, or nearly so.

So we have to push back to even older forms than the sponges to find the boundary. The next step back, though, is to the choanoflagellates, single-celled organisms that are descendants of that last precursor to multicellularity. They have some of the signalling genes of the metazoa, but there’s one thing they completely lack: any trace of the ANTP-class homeobox genes. Even here, though, Garcia-Fernàndez raises the possibility that choanoflagellates are also degenerate sponges that have abandoned multicellularity and secondarily lost their ANTP genes.

It’s all a very tangled mangle of genes, but the information we have does let us tease out a little bit of metazoan history.

Early in metazoan evolution, the founder member of the ANTP class of homeobox genes underwent a series of cis-duplications, generating an extensive array of homeobox genes that included the extended Hox, ParaHox and NK clusters. The clusters followed distinct evolutionary pathways in different lineages, with splits, losses and dispersions around the genome, ending with compact Hox and ParaHox clusters in vertebrates and compact NK clusters in insects. The early steps for the genesis of the megacluster are uncertain, and most probably occurred early in metazoan evolution, certainly before the cnidarian-bilaterian split. It is also tempting to imagine that the origin, diversification and increasing complexity of the three germ layers coincided with the generation, and expansion, of the three ANTP clusters. The apparent paradox that the clusters appeared before the ‘radial’ cnidarians might be explained if, as new molecular data indicate, cnidarians were primitively bilateral and triploblastic.

One other complicated issue is why these blocks of genes have survived relatively intact for so long in our history. One suggestion is that the organization is a necessary prerequisite for their proper regulation; there are regulatory elements identified in at least one of the Hox clusters that control the timing of their expression. In addition to being spatially collinear, the genes are also temporally collinear, activating sequentially from front to back, and from early to late. Also, not all Hox clusters are well maintained: in Drosophila, for instance, the Hox cluster is broken into two sections, and in Oikopleura dioica, the genes are scattered all over the place. Those organisms with broken clusters also tend to be very rapidly developing and activate all of their Hox genes nearly simultaneously, and have evolved novel regulatory mechanisms that are independent of a whole-cluster timer. That frees them from the tyranny of the developmental clock, and removes the constraint that the genes must be localized together…so they are slowly breaking apart. Give them a few tens or hundreds of millions of years, and the tidy Hox neighborhood will have drifted apart in these organisms’ descendants.

…and there also lies a problem. If we’re interested in the evolutionary history of the Hox genes, the place where their relationships will have been most thoroughly preserved will be in slow developing organisms, ones that still use the old sequential pattern of gene activation. These are precisely the organisms that developmental biologists have always avoided—who wants to wait months to see the result of a developmental perturbation in one animal, when you can get the answer in hours if you use a rapidly developing animal? The irony here is that it’s the large, complicated, elaborate animals—like us—that are likely to have retained the most primitive arrangements and control elements of fundamental metazoan genes.

Garcia-Fernàndez J (2005) The genesis and evolution of homeobox gene clusters. Nature Reviews Genetics 6:881-892.


  1. Mesk says

    Nice post, PZ. Minor point: the link to your Oikopleura dioica post is broken.

  2. Titus Brown says

    It’s almost certainly not a billion years; dunno where you get that number! (The last place I saw it mentioned was the C. briggsae publication a few years ago.)

    1 bn is based on molecular evolution guessing. The fossil record supports a last common ancestor prior to the Cambrian, so ~540 mya.