8  Visual Exploration of Phylogenetic Trees

library("ape")
library("ggplot2")
library("cowplot")
library("ggtree")
library("yulab.utils")
library("kableExtra")
expand <- ggtree::expand
rotate <- ggtree::rotate
flip <- ggtree::flip

The ggtree package (Yu et al., 2017) supports many ways of manipulating a tree visually, including viewing a selected clade to explore a large tree (Figure 8.1), clustering taxa (Figure 8.5), rotating a clade or a tree (Figure 8.6 (b) and Figure 8.8), and zooming out or collapsing clades (Figure 8.2 and Figure 8.3 (a)). The tree manipulation functions are summarized in Table 8.1.

Table 8.1: Tree manipulation functions.
Function      Description
collapse      Collapse a selected clade
expand        Expand a collapsed clade
flip          Exchange the positions of two clades that share a parent node
groupClade    Group clades
groupOTU      Group OTUs by tracing back to their most recent common ancestor
identify      Interactive tree manipulation
rotate        Rotate a selected clade by 180 degrees
rotate_tree   Rotate a circular layout tree by a specific angle
scaleClade    Zoom in or zoom out on a selected clade
open_tree     Convert a tree to fan layout with a specific open angle

8.1 Viewing Selected Clade

A clade is a monophyletic group that contains a single ancestor and all of its descendants. We can visualize a selected clade via the viewClade() function, as demonstrated in Figure 8.1 (b). Another solution is to extract the selected clade as a new tree object, as described in Section 2.5. Both approaches help users explore a large tree.

library(ggtree)
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
p <- ggtree(tree) + geom_tiplab()
viewClade(p, MRCA(p, "I", "L"))

Viewing a selected clade of a tree. An example tree used to demonstrate how ggtree supports exploring or manipulating a phylogenetic tree visually (A). ggtree supports visualizing a selected clade (B). A clade can be selected by specifying a node number or determined by the most recent common ancestor of selected tips.
(a) Example tree used to demonstrate visual exploration
(b) Visualizing a selected clade
Figure 8.1: Viewing a selected clade of a tree.
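
The alternative mentioned above, extracting the selected clade as a new tree object, can be sketched as follows. This is a minimal illustration that assumes a small hypothetical five-tip tree (tree_small) and uses ape::extract.clade(), which is not necessarily the approach described in Section 2.5.

```r
library(ape)

# Small hypothetical tree used only for illustration
tree_small <- read.tree(text = "((A:1,B:1):1,(C:1,(D:1,E:1):1):1);")

# Node number of the most recent common ancestor of tips C and E
clade_node <- getMRCA(tree_small, c("C", "E"))

# Extract that clade as a stand-alone phylo object
clade_tree <- extract.clade(tree_small, clade_node)
print(clade_tree$tip.label)
```

The extracted object is an ordinary phylo tree, so it can be passed to ggtree() like any other tree.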

Some of the functions, e.g., viewClade(), work on clades and accept an internal node number as a parameter. To get the internal node number, users can call the MRCA() function (as in Figure 8.1) with two taxa names. The function returns the node number of the most recent common ancestor (MRCA) of the input taxa. It works with both a tree object and a graphic object (i.e., the ggtree() output). The tidytree package also provides an MRCA() method to extract information from the MRCA node (see details in Section 2.1.3).
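
As a rough sketch of how a node number might be looked up, the example below calls MRCA() on a plot object and compares the result with ape::getMRCA() on the underlying tree. The five-tip tree and the object names (tree_small, p_small, p_labeled) are hypothetical and chosen only for illustration. Labeling the internal nodes with geom_text2() is one possible way to pick a clade by eye.

```r
library(ape)
library(ggplot2)
library(ggtree)

# Small hypothetical tree used only for illustration
tree_small <- read.tree(text = "((A:1,B:1):1,(C:1,(D:1,E:1):1):1);")
p_small <- ggtree(tree_small) + geom_tiplab()

# MRCA() on the plot object, with two tip labels as in Figure 8.1
node_plot <- MRCA(p_small, "D", "E")

# ape::getMRCA() on the phylo object should point at the same node
node_ape <- getMRCA(tree_small, c("D", "E"))

# Label internal nodes with their node numbers; print(p_labeled) draws the plot
p_labeled <- p_small +
  geom_text2(aes(subset = !isTip, label = node), hjust = -0.3)
```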

8.2 Scaling Selected Clade

The ggtree package provides another option to zoom out on (or compress) selected clades via the scaleClade() function. In this way, we retain the topology and branch lengths of the compressed clades. This saves space so that the clades of primary interest can be highlighted.

tree2 <- groupClade(tree, c(17, 21))
p <- ggtree(tree2, aes(color=group)) + theme(legend.position='none') +
  scale_color_manual(values=c("black", "firebrick", "steelblue"))
scaleClade(p, node=17, scale=.1) 
Figure 8.2: Scaling selected clade. Clades can be zoomed in (if scale > 1) to highlight or zoomed out to save space.

If users want to emphasize important clades, they can pass a numeric value larger than 1 to the scale parameter of scaleClade(), and the selected clade will be zoomed in. Users can also use the groupClade() function to assign different clade IDs to selected clades, which can then be used to color these clades differently, as shown in Figure 8.2.
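
The effect of the scale parameter can be checked on a small example. The sketch below assumes a hypothetical five-tip tree and a hypothetical helper, tip_span(), that measures the vertical space occupied by a set of tips in the plot data. It is meant as an illustration of zooming in and out, not as part of the ggtree API.

```r
library(ape)
library(ggtree)

# Small hypothetical tree used only for illustration
tree_small <- read.tree(text = "((A:1,B:1):1,(C:1,(D:1,E:1):1):1);")
p_small <- ggtree(tree_small) + geom_tiplab()
node_cde <- getMRCA(tree_small, c("C", "E"))

# scale > 1 zooms in on the clade, scale < 1 compresses it
p_zoom_in  <- scaleClade(p_small, node = node_cde, scale = 2)
p_zoom_out <- scaleClade(p_small, node = node_cde, scale = 0.5)

# Hypothetical helper: vertical span occupied by the given tips
tip_span <- function(plot, tips) {
  d <- plot$data
  diff(range(d$y[d$label %in% tips]))
}

clade_tips <- c("C", "D", "E")
spans <- c(zoom_out = tip_span(p_zoom_out, clade_tips),
           original = tip_span(p_small, clade_tips),
           zoom_in  = tip_span(p_zoom_in, clade_tips))
print(spans)
```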

8.3 Collapsing and Expanding Clade

It is common practice to prune or collapse clades so that certain aspects of a tree can be emphasized. The ggtree package supports collapsing selected clades using the collapse() function, as shown in Figure 8.3 (a).

p2 <- p %>% collapse(node=21) + 
  geom_point2(aes(subset=(node==21)), shape=21, size=5, fill='green')
p2 <- collapse(p2, node=23) + 
  geom_point2(aes(subset=(node==23)), shape=23, size=5, fill='red')
print(p2)
expand(p2, node=23) %>% expand(node=21)
(a) Collapsing selected clades
(b) Expanding collapsed clades back
Figure 8.3: Collapsing selected clades and expanding collapsed clades.

Here two clades were collapsed and marked with a green circle (shape 21) and a red diamond (shape 23), respectively. Collapsing is a common strategy for handling clades that are too large to display in full or are not of primary interest in the study. In ggtree, we can expand (i.e., uncollapse) the collapsed branches with the expand() function to show the details of species relationships, as demonstrated in Figure 8.3 (b).

Triangles are often used to represent collapsed clades, and ggtree supports this as well. The collapse() function provides a “mode” parameter, which defaults to “none”, in which case the selected clade is collapsed as a “tip”. Users can set the mode to “max”, which uses the farthest tip (Figure 8.4 (a)), “min”, which uses the closest tip (Figure 8.4 (b)), or “mixed”, which uses both (Figure 8.4 (c)).

p2 <- p + geom_tiplab()
node <- 21
collapse(p2, node, 'max') %>% expand(node)
collapse(p2, node, 'min') %>% expand(node)
collapse(p2, node, 'mixed') %>% expand(node)

We can pass additional parameters to set the color and transparency of the triangles (Figure 8.4 (d)).

collapse(p, 21, 'mixed', fill='steelblue', alpha=.4) %>% 
  collapse(23, 'mixed', fill='firebrick', color='blue')

We can combine scaleClade() with collapse() to zoom in or out of the triangles (Figure 8.4 (e)).

scaleClade(p, 23, .2) %>% collapse(23, 'min', fill="darkgreen")  
(a) Collapse mode max
(b) Collapse mode min
(c) Collapse mode mixed
(d) Set color, fill, and alpha
(e) Combine scaleClade() with collapse()
Figure 8.4: Collapse clade as a triangle.

8.4 Grouping Taxa

The groupClade() function assigns the branches and nodes under different clades to different groups. It accepts an internal node or a vector of internal nodes to group one or more clades.

Similarly, the groupOTU() function assigns branches and nodes to different groups based on user-specified groups of operational taxonomic units (OTUs) that are not necessarily within a clade but can be monophyletic (clade), polyphyletic, or paraphyletic. It accepts a vector of OTUs (taxa names) or a list of OTUs, traces back from the OTUs to their most recent common ancestor (MRCA), and clusters them together, as demonstrated in Figure 8.5.

A phylogenetic tree can be annotated by mapping different line types, sizes, colors, or shapes of the branches or nodes that have been assigned to different groups.

data(iris)
rn <- paste0(iris[,5], "_", 1:150)
rownames(iris) <- rn
d_iris <- dist(iris[,-5], method="manhattan")

tree_iris <- ape::bionj(d_iris)
grp <- list(setosa     = rn[1:50],
            versicolor = rn[51:100],
            virginica  = rn[101:150])

p_iris <- ggtree(tree_iris, layout = 'circular', branch.length='none')
groupOTU(p_iris, grp, 'Species') + aes(color=Species) +
  theme(legend.position="right")
Figure 8.5: Grouping OTUs. OTU clustering based on their relationships. Selected OTUs and their ancestors up to the MRCA will be clustered together.

We can also group taxa at the tree level. The following code produces a figure identical to Figure 8.5 (see Section 2.2.3 for more details).

tree_iris <- groupOTU(tree_iris, grp, "Species")
ggtree(tree_iris, aes(color=Species), layout = 'circular', 
        branch.length = 'none') + 
  theme(legend.position="right")
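
The grouping variable is not limited to color. As a small sketch, the example below maps it to both color and line type on a hypothetical five-tip tree. The tree, the group names (ab, de), and the object names are made up for illustration. Nodes that do not belong to any listed group are expected to fall into a separate default level.

```r
library(ape)
library(ggplot2)
library(ggtree)

# Small hypothetical tree and grouping used only for illustration
tree_small <- read.tree(text = "((A:1,B:1):1,(C:1,(D:1,E:1):1):1);")
grp_small <- list(ab = c("A", "B"), de = c("D", "E"))

# Without a name argument, the grouping variable is called "group"
tree_grouped <- groupOTU(tree_small, grp_small)

p_grouped <- ggtree(tree_grouped, aes(color = group, linetype = group)) +
  geom_tiplab() +
  theme(legend.position = "right")

# Inspect how nodes were assigned to groups
grp_tab <- table(as.character(p_grouped$data$group))
print(grp_tab)
```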

8.5 Exploring Tree Structure

To facilitate exploring the tree structure, ggtree supports rotating a selected clade by 180 degrees using the rotate() function (Figure 8.6 (b)). The positions of the immediate descendant clades of an internal node can be exchanged via the flip() function (Figure 8.6 (c)).

p1 <- p + geom_point2(aes(subset=node==16), color='darkgreen', size=5)
p2 <- rotate(p1, 16)
flip(p2, 17, 21)
(a) Original tree with selected clade
(b) Rotate selected clade by 180 degrees
(c) Flip immediate descendant clades
Figure 8.6: Exploring tree structure. A clade (indicated by a dark green circle) in a tree (A) can be rotated by 180° (B) and the positions of its immediate descendant clades (colored in blue and red) can be exchanged (C).

Most of the tree manipulation functions work on clades, but ggtree also provides functions to manipulate a whole tree, including open_tree(), which transforms a tree in either rectangular or circular layout to the fan layout, and rotate_tree(), which rotates a tree by a specific angle in either circular or fan layout, as demonstrated in Figure 8.7 and Figure 8.8.

p3 <- open_tree(p, 180) + geom_tiplab()
print(p3)

Figure 8.7: Transforming a tree to fan layout. A tree can be transformed to a fan layout by open_tree with a specific angle.
rotate_tree(p3, 180)
Figure 8.8: Rotating tree. A circular/fan layout tree can be rotated by any specific angle.

The following example rotates four selected clades (Figure 8.9). It is easy to traverse all the internal nodes and rotate them one by one.

set.seed(2016-05-29)  # arithmetic, not a date: evaluates to set.seed(1982)
x <- rtree(50)
p <- ggtree(x) + geom_tiplab()

## nn <- unique(reorder(x, 'postorder')$edge[,1]) 
## to traverse all the internal nodes

nn <- sample(unique(reorder(x, 'postorder')$edge[,1]), 4)

pp <- lapply(nn, function(n) {
    p <- rotate(p, n)
    p + geom_point2(aes(subset=(node == n)), color='red', size=3)
})

pp[[1]]
pp[[2]]
pp[[3]]
pp[[4]]
(a) Rotation 1
(b) Rotation 2
(c) Rotation 3
(d) Rotation 4
Figure 8.9: Rotate selected clades. Four clades were randomly selected to rotate (indicated by the red symbol).
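
The commented line in the code above hints at how to visit every internal node. One possible way to rotate all of them in turn, sketched here under the assumption of a hypothetical 20-tip random tree (x_demo), is to fold rotate() over the node vector with Reduce(). The function is called as ggtree::rotate() to avoid the name clash with ape::rotate().

```r
library(ape)
library(ggtree)

set.seed(42)
x_demo <- rtree(20)  # hypothetical random tree used only for illustration

# All internal nodes, obtained from the parent column of the edge matrix
internal_nodes <- unique(reorder(x_demo, "postorder")$edge[, 1])

# Apply rotate() to each internal node in turn, carrying the plot along
p_rotated <- Reduce(
  function(plot, n) ggtree::rotate(plot, n),
  internal_nodes,
  ggtree(x_demo) + geom_tiplab()
)
```

Unlike the lapply() call above, which rotates each clade on a fresh copy of the plot, this version accumulates the rotations in a single plot.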

Figure 8.10 demonstrates the usage of open_tree() with different open angles.

set.seed(123)
tr <- rtree(50)
p <- ggtree(tr, layout='circular') 
angles <- seq(0, 270, length.out=6)

pp <- lapply(angles, function(angle) {
  open_tree(p, angle=angle) + ggtitle(paste("open angle:", angle))
})

pp[[1]]
pp[[2]]
pp[[3]]
pp[[4]]
pp[[5]]
pp[[6]]
Figure 8.10: Open tree with different angles. Panels (a) to (f) correspond to pp[[1]] to pp[[6]].

Figure 8.11 illustrates rotating a tree by different angles.
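
The code for this figure is not shown in the text. A sketch along the lines of the open_tree() example might look like the following. It assumes a hypothetical 50-tip random tree opened to a 180-degree fan and reuses the same six angles. The object names (tr_demo, p_fan, pp_rot) are made up for illustration, and this is not necessarily how the figure was produced.

```r
library(ape)
library(ggplot2)
library(ggtree)

set.seed(123)
tr_demo <- rtree(50)                      # hypothetical random tree
p_fan <- open_tree(ggtree(tr_demo), 180)  # fan layout with a 180-degree opening
rot_angles <- seq(0, 270, length.out = 6)

# One plot per rotation angle; print(pp_rot[[i]]) draws the i-th panel
pp_rot <- lapply(rot_angles, function(angle) {
  rotate_tree(p_fan, angle) + ggtitle(paste("rotating angle:", angle))
})
```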

Figure 8.11: Rotate tree with different angles (panels (a) to (f)).

Interactive tree manipulation is also possible via the identify() method (see Chapter 12 for details).

8.6 Summary

A good visualization tool should help users not only to present data but also to explore it. Data exploration allows users to better understand the data and can help uncover hidden patterns. The ggtree package provides a set of functions for visually manipulating phylogenetic trees and exploring tree structures with associated data. Exploring data in an evolutionary context may help discover new systematic patterns and generate new hypotheses.

Yu, G., Smith, D. K., Zhu, H., Guan, Y., & Lam, T. T.-Y. (2017). ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution, 8(1), 28–36. https://doi.org/10.1111/2041-210X.12628