A cautionary tale on reading phylogenetic trees


I have written before about the perils of naive interpretations of phylogenetic trees (“Extant taxa cannot be basal”). Others, notably Krell & Cranston and Crisp & Cook, have pointed out that this is not just a language issue; such misreadings can cause substantive problems in the way evolutionary history is understood.

A new paper in PLoS ONE, “A tree of life based on ninety-eight expressed genes conserved across diverse eukaryotic species,” contains several instructive examples. PLoS ONE is open access, so you can read the original paper without an institutional subscription. A tweet by Frederik Leliaert got this paper on my radar, and it piqued my interest because of the startling observation that the inferred phylogeny shows Chlamydomonas as sister to all other eukaryotes.

It made me frown, too.

The paper does make reference to Chlamydomonas’ phylogenetic position in the Results:

Two other protist species, P. infestans and T. gondii, are closer to C. reinhardtii, which itself is a unicellular green algae located as an outermost group in the Bayesian tree.

First of all, P. infestans and T. gondii are not closer to C. reinhardtii according to the inferred phylogeny (if you don’t agree with this, you need to read the tree-thinking paper, Omland et al. 2008, listed under the stable links). They diverged from Chlamydomonas at the same point everything else in the tree did, at the very first divergence.

More importantly, though, Chlamydomonas being sister to the other species is not a result at all; it’s an assumption:

The aligned composite gene sequences of the 49 species were analysed and C. reinhardtii, the most common ancestor of plant and animal species, was selected as the outgroup.

Whatever you choose as an outgroup in a phylogenetic analysis will always show up as sister to the other species; that’s what outgroup means. Why they chose Chlamydomonas as an outgroup is a mystery to me. I assume ‘most common ancestor’ means ‘most recent common ancestor’, but there is no sense in which Chlamydomonas reinhardtii is ancestral to either plants or animals, and certainly not both. No reference is given in support of choosing Chlamydomonas as an outgroup, and I’m not aware of any that would support it.
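A toy example makes the point concrete. The sketch below is a minimal illustration, not the authors' analysis: the four-taxon unrooted topology, the internal node names `n1` and `n2`, and the helper functions are all invented for this purpose. It shows that whichever tip you designate as the outgroup ends up alone on one side of the root, with every other taxon on the other side.

```python
# Toy unrooted tree as an adjacency map. The topology and the internal
# node names "n1" and "n2" are illustrative assumptions, not from the paper.
UNROOTED = {
    "n1": ["Chlamydomonas", "Oryza", "n2"],
    "n2": ["n1", "Homo", "Saccharomyces"],
    "Chlamydomonas": ["n1"],
    "Oryza": ["n1"],
    "Homo": ["n2"],
    "Saccharomyces": ["n2"],
}


def leaves_from(node, parent):
    """Collect the tip names reachable from node without stepping back to parent."""
    children = [n for n in UNROOTED[node] if n != parent]
    if not children:  # a tip has no neighbours other than its parent
        return {node}
    tips = set()
    for child in children:
        tips |= leaves_from(child, node)
    return tips


def root_on(outgroup):
    """Root the tree on the branch leading to outgroup; return both sides of the root."""
    (attachment,) = UNROOTED[outgroup]  # a tip has exactly one neighbour
    return {outgroup}, leaves_from(attachment, outgroup)


for choice in ("Chlamydomonas", "Homo"):
    out_side, in_side = root_on(choice)
    print(choice, "is sister to", sorted(in_side))
```

Rooting on Homo instead of Chlamydomonas makes Homo "sister to all other eukaryotes" in exactly the same way, which is why the position of the outgroup cannot be reported as a finding.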

Here is the inferred phylogeny:

Figure 7 from Jayaswal et al. 2017. Eukaryotic tree of life. A rooted eukaryotic phylogenetic tree based on concatenated sequences of 98 rice gene homologs conserved across 49 eukaryotic species using Bayesian approach (Mrbayes v 3.2). Bayesian posterior probability for each node is 1. Tree was rooted using Chlamydomonas reinhardtii (Green algae) sequence.

The first problem is in the way the authors chose to represent the tree. A phylogeny that includes a time scale is called a chronogram, and such trees are nearly always represented as ultrametric, i.e. all of the terminal branches end at the same time point. There is good reason for this convention: terminal branches represent living (extant) species, which, by definition, exist zero years ago. A literal reading of the tree above would be that every species except Dictyostelium discoideum has gone extinct. The Chlamydomonas reinhardtii branch, for example, is shown ending 1.3 billion years ago. I am confident that Chlamydomonas is not extinct.
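For readers who want to check a tree themselves, here is a minimal sketch of an ultrametricity test. The nested-tuple tree format, the function names, and the branch lengths (in millions of years) are all invented for illustration; nothing here is taken from the paper's data.

```python
# A node is (subtree, branch_length), where subtree is either a tip name
# or a list of child nodes. All branch lengths below are made up.
def tip_depths(node, depth=0.0):
    """Return {tip_name: distance from the root} for a (subtree, length) node."""
    subtree, length = node
    depth += length
    if isinstance(subtree, str):
        return {subtree: depth}
    depths = {}
    for child in subtree:
        depths.update(tip_depths(child, depth))
    return depths


def is_ultrametric(tree, tol=1e-9):
    """True if every tip is the same distance from the root, within tol."""
    depths = tip_depths(tree).values()
    return max(depths) - min(depths) <= tol


# Every tip ends 100 units from the root: readable as a chronogram.
chronogram = ([("A", 100.0), ([("B", 40.0), ("C", 40.0)], 60.0)], 0.0)

# Tips end at different distances: read against a time axis, A and C
# would appear to have stopped existing before the present.
not_chronogram = ([("A", 30.0), ([("B", 40.0), ("C", 25.0)], 60.0)], 0.0)

print(is_ultrametric(chronogram), is_ultrametric(not_chronogram))
```

A tree that fails this check can still be a perfectly good phylogram, with branch lengths in substitutions per site, but then it should not be drawn against a time axis.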

But let’s leave that aside. Maybe the only chronological information we’re intended to take from Figure 7 is divergence times. That doesn’t work, either. Chlamydomonas diverging from land plants 1.5 billion years ago would be somewhat surprising, but I wouldn’t bet my life that it’s wrong. Humans diverging from chimpanzees 500 million years ago, on the other hand, is not remotely plausible; that’s about twice the age of mammals. The nematode C. elegans is shown diverging from the remaining animals over a billion years ago, which is a couple of hundred million years before the origin of animals by most estimates.

The molecular clock Methods are hard to follow, but there are clearly problems with the fossil calibrations:

Table 3 from Jayaswal et al. 2017. Divergence times of 50 sampled pairs of species out of total 1,176 pairs of species analysed.

Look at the calibration times (second column) that involve C. reinhardtii: 968 million years ago (Ma) versus Oryza sativa (rice), 1500 Ma vs. Physcomitrella patens (moss), 1547 Ma vs. Dictyostelium discoideum (cellular slime mold), 1642 Ma vs. Phytophthora infestans (potato blight), 700 Ma vs. Anopheles gambiae (mosquito), 1547 Ma vs. Homo sapiens. Problem is, if Chlamydomonas is the outgroup, then all of these species diverged from Chlamydomonas at the same time, by definition. It is impossible that Chlamy diverged from humans 1547 Ma and mosquitos 700 Ma: both are animals and diverged from Chlamy at the same time.
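The inconsistency can be stated as a one-line check. The sketch below uses the six calibration ages quoted above; the variable names are mine, and the only assumption is the one the authors made themselves, namely that C. reinhardtii is the outgroup, so every one of these pairs shares the root as its most recent common ancestor.

```python
# Calibration ages (Ma) for C. reinhardtii versus each species, as quoted
# above from Table 3 of the paper.
calibrations_vs_outgroup = {
    "Oryza sativa": 968,
    "Physcomitrella patens": 1500,
    "Dictyostelium discoideum": 1547,
    "Phytophthora infestans": 1642,
    "Anopheles gambiae": 700,
    "Homo sapiens": 1547,
}

# With C. reinhardtii as the outgroup, all of these pairs split at the same
# node (the root), so a consistent calibration set has one distinct age.
distinct_ages = sorted(set(calibrations_vs_outgroup.values()))
consistent = len(distinct_ages) == 1
spread = distinct_ages[-1] - distinct_ages[0]

print("distinct ages for one node:", distinct_ages)
print("consistent:", consistent, "| spread (Myr):", spread)
```

Five different ages are assigned to what the tree says is a single branching event, and they span 942 million years.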

The inferred divergence times are problematic, too, to the point that they should have caused the authors and the reviewers to question the whole analysis. I’ll just hit a few of the highlights: For the divergence between Strongylocentrotus purpuratus (purple sea urchin) and Danio rerio (zebrafish), the inferred divergence is 124 million years ago (surprise, PZ: zebrafish is an echinoderm!). That is not remotely possible; we have fossils of both fishes and sea urchins over 400 million years old. The divergence between humans and cows was estimated at 32 million years ago, at least 25 million years after the earliest primate fossils. Unless cows are primates, they can’t have diverged from humans after primates diverged from other animals. Their estimated divergence between humans and gorillas is zero; I won’t belabor why we know that’s wrong.

Perhaps the most obviously silly inference is for the divergence between the crop plant Sorghum and the freshwater cnidarian Hydra: 110.43 million years. That’s right, a Cretaceous divergence between animals and plants. When the fossil record contradicts your estimates by half a billion years, you have done something wrong.

But even if we take the inferred tree as a given, the authors badly misread it. Again, I’ll limit myself to a few examples of the many available:

The combined log values suggest that the evolution time for the unicellular green alga C. reinhardtii is 1401 Ma, in the middle of the Proterozoic era (900–1600 Ma), which corresponds to the earliest known fossil records [33, 102].

First of all, saying that ‘the evolution time…is 1401 Ma’ is meaningless. The only kind of time that can be estimated using a phylogenetic molecular clock analysis is divergence times, so we have to ask, 1401 Ma represents Chlamy’s divergence from what? From the rest of the tree? That’s the only divergence time the tree allows us to infer that’s relevant to Chlamy. Their inferred divergence times between Chlamy and other species (which should all be the same) range from 139 to 1194 Ma. And none of those times, even if they were right, would tell us the age of Chlamydomonas reinhardtii.

Also, what ‘earliest known fossil records’ are they referring to? There are no known Chlamydomonas fossils.

The grouping of protists within diverse clades emphasises their broad distribution and close association with the 3 eukaryotic clades—plants, animals, and fungi. In particular, S. rosetta provided a link between fungal and animal species…

No, it doesn’t. S. rosetta is sister to the animals. It is exactly as distantly related to the fungi as are all the animals:

Jayaswal et al. 2017 Figure 7

Take a close look at that figure. The red circle represents the divergence between S. rosetta and the fungi. It also represents the divergence between humans and fungi, between cows and fungi, and between mosquitos and fungi. S. rosetta is exactly as closely related to fungi as animals are.

T. gondii provides a link between fungal and plant species…

Again, and for the same reasons, no. T. gondii diverged from plants at the same time the animals and fungi did, at the blue circle.

…and C. reinhardtii is the nearest to the plant clade.

Not according to that tree. Chlamydomonas is exactly as far from plants as it is from animals and fungi, according to Figure 7 (black circle). Note that I’m not saying they’re wrong about Chlamy being closely related to plants. I’m saying they’re misreading their own phylogenetic tree.

Among the 6 mammals, mice were closer to the base of the tree and are most closely related to cow, which in turn, is closer to primates than to mice.

No. Mice are not closer to the base of the tree, and they are not most closely related to cow. They are sister to (cow + primates), which means they are equally closely related to cows and to primates (green circle).

By contrast, among the 4 primates, humans are most closely related to chimpanzees.

No. Orange circle. As with Chlamy, I’m not saying the statement is wrong, I’m saying it contradicts their tree. Figure 7 says that humans are sister to the clade chimpanzee+gorilla+orangutan, i.e. that we are equally distant from all three.
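All of the corrections above come down to one rule: two taxa are as closely related as their most recent common ancestor (MRCA) is recent, and nothing else about where they sit in the drawing matters. Here is a minimal sketch of that rule. The tree is a heavily simplified stand-in for part of Figure 7, built only from the relationships described above; the internal node labels and the choice of representative taxa are my own assumptions.

```python
# Child -> parent map for a simplified rooted tree. Only a handful of taxa
# are included and the internal node labels are illustrative assumptions.
PARENT = {
    "fungi": "opisthokonts",
    "holozoa": "opisthokonts",
    "S. rosetta": "holozoa",
    "animals": "holozoa",
    "mosquito": "animals",
    "mammals": "animals",
    "mouse": "mammals",
    "cow+primates": "mammals",
    "cow": "cow+primates",
    "human": "cow+primates",
}


def ancestors(node):
    """Path from node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(a, b):
    """Most recent common ancestor: the first node on a's path that is also on b's."""
    b_path = set(ancestors(b))
    return next(n for n in ancestors(a) if n in b_path)


# S. rosetta and every animal share the same MRCA with fungi.
for taxon in ("S. rosetta", "human", "cow", "mosquito"):
    print(taxon, "vs fungi ->", mrca(taxon, "fungi"))

# Mouse shares the same MRCA with cow as with human, so it is equally
# related to both; cow and human share a more recent one.
print(mrca("mouse", "cow"), mrca("mouse", "human"), mrca("cow", "human"))
```

If two comparisons return the same node, the tree is saying the two pairs are equally related; a ‘link’ or ‘closer to the base’ reading requires information the tree does not contain.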

Some of these problems are just rhetorical, but some of them are substantive, and this is the real problem. A failure to understand that phylogenies represent sister group relationships has led to incorrect interpretations of evolutionary relationships, such as that the outgroup is more closely related to one ingroup clade than another, that the sister of one clade is a ‘link’ to another clade, and that a single branching event can have a bunch of different divergence times.

It’s unfortunate, because the idea motivating the analyses is a good one (if not a totally novel one). We know that single-gene phylogenies often mislead, and multi-gene phylogenies are much more likely to reflect true evolutionary relationships. As it stands, I’m not sure there’s anything useful here. Maybe the identification of conserved genes is useful, but the phylogeny and divergence time estimates certainly aren’t.

The bottom line is that the reviewers and/or the PLoS ONE editors failed the authors. This paper, while it represents a potentially useful approach, should, at a minimum, have been sent back for major revisions. The choice of Chlamydomonas as an outgroup is contrary to everything we know about eukaryote phylogeny, and the authors’ misinterpretations of their own trees should never have survived peer review.

 

Stable links:

Crisp MD, Cook LG. 2005. Do early branching lineages signify ancestral traits? Trends Ecol. Evol. 20:122–8. doi: 10.1016/j.tree.2004.11.010

Jayaswal PK, Dogra V, Shanker A, Sharma TR, Singh K. 2017. A tree of life based on ninety-eight expressed genes conserved across diverse eukaryotic species. PLoS ONE 12:e0184276. doi: 10.1371/journal.pone.0184276

Krell F-T, Cranston PS. 2004. Which side of the tree is more basal? Syst. Entomol. 29:279–81. doi: 10.1111/j.0307-6970.2004.00262.x

Omland KE, Cook LG, Crisp MD. 2008. Tree thinking for all biology: the problem with reading phylogenies as ladders of progress. BioEssays 30:854–67. doi: 10.1002/bies.20794

Comments

  1. Ellis says

    And as we all were taught in school there are only three Kingdoms: Plants, Animals and Fungi. Excavates? Never heard of them. Amoebae? Nah. SAR? We don’t need to bleedin’ SAR. Also there are only green archaeplastida, as we all know. They didn’t even start by trying to get a good survey across the tree of life.

  2. says

    Just a guess, but I wonder if the authors read the Chlamydomonas genome paper and extrapolated an erroneous conclusion about using Chlamydomonas as an outgroup from the title “The Chlamydomonas Genome Reveals the Evolution of Key Animal and Plant Functions”.

    One silver lining of having this PLoS ONE study published is that when combined with your blog it can provide a nice set of teaching material (i.e. mistakes to avoid) for students learning phylogenetics.

  3. Nagendra says

    There are always problems associated with the interpretation of the tree of life. We always get discrepancies no matter how well we optimise the parameters. We have reported the results in an unbiased way as we got it. Selection of C. reinhardtii is valid as also indicated in the original C. reinhardtii genome. However, we acknowledge some important points have been raised that will be addressed in due course of time, particularly we should have started all these extant species at time zero for ease of viewing.

    • Matthew Herron says

      Here’s what the Chlamy genome paper says about phylogenetic relationships:

      The Chlorophytes (green algae, including Chlamydomonas and Ostreococcus) diverged from the Streptophytes (land plants and their close relatives) over a billion years ago. These lineages are part of the green plant lineage (Viridiplantae), which previously diverged from opisthokonts (animals, fungi, and Choanozoa)

      This is incompatible with C. reinhardtii as an outgroup.

  4. Pawan Kumar Jayaswal says

    Chlamydomonas reinhardtii (Cri) is a single cell green algae which retain the common features of plant (chloroplast-based photosynthesis) and animal (eukaryotic flagella). Merchant et al. (2007) in Chlamydomonas genome sequencing project mentioned about the divergence of its lineage from land plants over billion of years ago. Similarly, Yoon et al. (2004) estimated the split of red and green algae occurred about 1500 Mya. Outgroup which we have selected based on the above information and selected species is distantly related with the in-group species. We have included twenty animal, seven fungi and four protista species in our sample data. Concatenated multigene based Bayesian phylogenetic tree showed the two protista species intermediate between the animal and fungi similar result based on limited number of genes (EF-1a, actin, b-tubulin, and HSP70, and/or a-tubulin) reported by Steenkamp et al. in 2005 [nucleariid a group of amoeba appears as the closest sister taxon to fungi and choanoflagellates group with sister lineage of animal (Hedges et al. 2004)]. Our 98 gene based phylogenetic tree analysis clearly aligned with the published report and at the same time we have not claimed in our paper as Chlamydomonas reinhardtii is the origin, on the basis of above literature we selected Cri as a outgroup. As far as the concern related with the extant taxa as a basal group, we agree with the Krell & Cranston (2004) and Crisp & Cook (2005) up to some extent, whose explanation is related to particular class of species, however Stacey Smith (http://for-the-love-of-trees.blogspot.in/2016/09/the-ancestors-are-not-among-us.html) clarifies in her Sep 20, 2016 blog comments about the interpretation of tree in which samples are from different kingdom and our explanation is consistent with that but we will definitely pay greater attention to this interpretation of the phylogenetic tree in futuristic work. 
    However, number of research paper has been reported who has considered the extant taxa as a outgroup and addressed the different question (Yin et al. 2015; Neuvéglise et al. 2002, de la Chaux and Wagner 2009).
    We have reported the divergence time for 1176 pair of species based on synonymous substitution values using the molecular clock of Muse and Weir (1992) in which we incorporated the fossil information. The evolutionary distance among the different species is based on the above method and it is not taken from the phylogenetic tree scale axis. However, our objective with phylogenetic tree was to show the evolutionary relationship among the species and not the divergence time. At this point I would like to say the presented tree is only a phylogram not a chronogram.
    As we have said in our earlier comment and would like to mention this again that no matter how well we optimise the parameters there is a chance of some discrepancies in the divergence time and the study has provided a broad perspective about the evolutionary time frame not the exact dates which are almost impossible to estimate. However, we acknowledge that some important points have been raised that could be addressed in due course of time, particularly we could have started in our phylogram all these extant species at time zero for ease of viewing to make our tree also a chronogram and just a phylogram.

    Merchant SS et al. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science. 2007 Oct 12;318(5848):245-50.
    Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. A molecular timeline for the origin of photosynthetic eukaryotes. Mol Biol Evol. 2004;21(5):809-18. doi: 10.1093/molbev/msh075. PubMed PMID: 14963099.
    Hedges SB, Blair JE, Venturi ML, Shoe JL. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol. 2004;4:2. doi: 10.1186/1471-2148-4-2. PubMed PMID: 15005799; PubMed Central PMCID: PMCPMC341452.
    Krell, F.-T. and Cranston, P. S. (2004), Which side of the tree is more basal?. Systematic Entomology, 29: 279–281. doi:10.1111/j.0307-6970.2004.00262.x
    Crisp MD, Cook LG. Do early branching lineages signify ancestral traits? Trends Ecol Evol. 2005 Mar;20(3):122-8. Epub 2004 Dec 13.
    Yin H, Du J et al. 2015. Genome-wide Annotation and Comparative Analysis of Long Terminal Repeat Retrotransposons between Pear Species of P. bretschneideri and P. Communis. Sci Rep. 2015 Dec 3;5:17644. doi: 10.1038/srep17644.

    Neuvéglise C, Feldmann H et al. 2002. Genomic evolution of the long terminal repeat retrotransposons in hemiascomycetous yeasts. Genome Res. 2002 Jun;12(6):930-43.

    de la Chaux N, Wagner A.2009. Evolutionary dynamics of the LTR retrotransposons roo and rooA inferred from twelve complete Drosophila genomes. BMC Evol Biol. 2009 Aug 18;9:205. doi: 10.1186/1471-2148-9-205.

    Muse SV, Weir BS. Testing for equality of evolutionary rates. Genetics. 1992;132(1):269-76. PubMed PMID: 1398060; PubMed Central PMCID: PMCPMC1205125.

