2 Manipulating Tree with Data

2.1 Manipulating tree data using tidy interface

All the tree data parsed/merged by treeio (Wang et al. 2020) can be converted to tidy data frame using the tidytree package. The tidytree package provides tidy interfaces to manipulate tree with associated data. For instances, external data can be linked to phylogeny or evolutionary data obtained from different sources can be merged using tidyverse verbs. After the tree data was manipulated, it can be converted back to treedata object and exported to a single tree file, further analyzed in R or visualized using ggtree (Yu et al. 2017).

2.1.1 The phylo object

The phylo class defined in the ape package (Paradis, Claude, and Strimmer 2004) is fundamental for phylogenetic analysis in R. Most of the R packages in this field rely extensively on the phylo object. The tidytree package provides as_tibble method to convert the phylo object to tidy data frame, a tbl_tree object.

library(ape)
set.seed(2017)
tree <- rtree(4)
tree
## 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
##   t4, t1, t3, t2
## 
## Rooted; includes branch lengths.
x <- as_tibble(tree)
x
## # A tibble: 7 x 4
##   parent  node branch.length label
##    <int> <int>         <dbl> <chr>
## 1      5     1       0.435   t4   
## 2      7     2       0.674   t1   
## 3      7     3       0.00202 t3   
## 4      6     4       0.0251  t2   
## 5      5     5      NA       <NA> 
## 6      5     6       0.472   <NA> 
## 7      6     7       0.274   <NA>

The tbl_tree object can be converted back to a phylo object using the as.phylo() method.

## 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
##   t4, t1, t3, t2
## 
## Rooted; includes branch lengths.

Using tbl_tree object makes tree and data manipulation more effective and easier (see also example in FAQ). For example, we can link evolutionary trait to phylogeny using the dplyr verbs full_join:

d <- tibble(label = paste0('t', 1:4),
            trait = rnorm(4))

y <- full_join(x, d, by = 'label')
y
## # A tibble: 7 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1       0.435   t4     0.943
## 2      7     2       0.674   t1    -0.171
## 3      7     3       0.00202 t3     0.570
## 4      6     4       0.0251  t2    -0.283
## 5      5     5      NA       <NA>  NA    
## 6      5     6       0.472   <NA>  NA    
## 7      6     7       0.274   <NA>  NA

2.1.2 The treedata object

The tidytree package defines treedata class to store phylogenetic tree with associated data. After mapping external data to the tree structure, the tbl_tree object can be converted to a treedata object.

as.treedata(y)
## 'treedata' S4 object'.
## 
## ...@ phylo: 
## Phylogenetic tree with 4 tips and 3 internal nodes.
## 
## Tip labels:
##   t4, t1, t3, t2
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'trait'.

The treedata class is also used in the treeio package (Wang et al. 2020) to store evolutionary evidences inferred by commonly used software (BEAST, EPA, HYPHY, MrBayes, PAML, PHYLDOG, pplacer, r8s, RAxML and RevBayes, etc.) (see details in Chapter 1).

The tidytree package also provides the as_tibble() method to convert a treedata object to a tidy data frame. The phylogenetic tree structure and the evolutionary inferences were stored in the tbl_tree object, making it consistent and easier for manipulating evolutionary statistics inferred by different software as well as linking external data to the same tree structure.

y %>% as.treedata %>% as_tibble
## # A tibble: 7 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1       0.435   t4     0.943
## 2      7     2       0.674   t1    -0.171
## 3      7     3       0.00202 t3     0.570
## 4      6     4       0.0251  t2    -0.283
## 5      5     5      NA       <NA>  NA    
## 6      5     6       0.472   <NA>  NA    
## 7      6     7       0.274   <NA>  NA

2.1.3 Access related nodes

The dplyr verbs can be applied to tbl_tree directly to manipulate tree data. In addition, tidytree provides several verbs to filter related nodes, including child, parent, offspring, ancestor, sibling and MRCA.

These verbs accept a tbl_tree and a selected node which can be node number or label.

child(y, 5)
## # A tibble: 2 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1         0.435 t4     0.943
## 2      5     6         0.472 <NA>  NA
parent(y, 2)
## # A tibble: 1 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      6     7         0.274 <NA>     NA
offspring(y, 5)
## # A tibble: 6 x 5
##   parent  node branch.length label  trait
##    <int> <int>         <dbl> <chr>  <dbl>
## 1      5     1       0.435   t4     0.943
## 2      7     2       0.674   t1    -0.171
## 3      7     3       0.00202 t3     0.570
## 4      6     4       0.0251  t2    -0.283
## 5      5     6       0.472   <NA>  NA    
## 6      6     7       0.274   <NA>  NA
ancestor(y, 2)
## # A tibble: 3 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      5     5        NA     <NA>     NA
## 2      5     6         0.472 <NA>     NA
## 3      6     7         0.274 <NA>     NA
sibling(y, 2)
## # A tibble: 1 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      7     3       0.00202 t3    0.570
MRCA(y, 2, 3)
## # A tibble: 1 x 5
##   parent  node branch.length label trait
##    <int> <int>         <dbl> <chr> <dbl>
## 1      6     7         0.274 <NA>     NA

All these methods also implemented in treeio for working with phylo and treedata objects. You can try accessing related nodes using the tree object. For instance, the following command will output child nodes of the selected internal node 5:

child(tree, 5)
## [1] 1 6

Beware that the methods work for tree objects output related node numbers, while the methods implemented for tbl_tree object output a tibble object that contains related information.

2.2 Data Integration

2.2.1 Combining tree data

The treeio package (Wang et al. 2020) serves as an infrastructure that enables various types of phylogenetic data inferred from common analysis programs to be imported and used in R. For instance dN/dS or ancestral sequences estimated by CODEML, and clade support values (posterior) inferred by BEAST/MrBayes. In addition, treeio supports linking external data to phylogeny. It brings these external phylogenetic data (either from software output or exteranl sources) to the R community and make it available for further analysis in R. Furthermore, treeio can combine multiple phylogenetic trees together into one with their node/branch-specific attribute data. Essentially, as a result, one such attribute (e.g., substitution rate) can be mapped to another attribute (e.g., dN/dS) of the same node/branch for comparison and further computations (Yu et al. 2017).

A previously published data set, seventy-six H3 hemagglutinin gene sequences of a lineage containing swine and human influenza A viruses (Liang et al. 2014), was here to demonstrate the utilities of comparing evolutionary statistics inferred by different software. The dataset was re-analyzed by BEAST for timescale estimation and CODEML for synonymous and non-synonymous substitution estimation. In this example, we first parsed the outputs from BEAST using the read.beast() function and from CODEML using the read.codeml() function into two treedata objects. Then these two objects containing separate sets of node/branch-specific data were merged via the merge_tree() function.

beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
rst_file <- system.file("examples/rst", package="ggtree")
mlc_file <- system.file("examples/mlc", package="ggtree")
beast_tree <- read.beast(beast_file)
codeml_tree <- read.codeml(rst_file, mlc_file)

merged_tree <- merge_tree(beast_tree, codeml_tree)
merged_tree
## 'treedata' S4 object that stored information of
##  '/home/ygc/R/library/ggtree/examples/MCC_FluA_H3.tree',
##  '/home/ygc/R/library/ggtree/examples/rst',
##  '/home/ygc/R/library/ggtree/examples/mlc'.
## 
## ...@ phylo: 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##   A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',
##  'height_range', 'length',   'length_0.95_HPD',
##  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range',   'subs',
##  'AA_subs',  't',    'N',    'S',    'dN_vs_dS', 'dN',   'dS',   'N_x_dN',
##  'S_x_dS'.

After merging the beast_tree and codeml_tree objects, all node/branch-specific data imported from BEAST and CODEML output files are all available in the merged_tree object. The tree object was converted to a tidy data frame using the tidytree package and visualized as hexbin scatterplot of dN/dS, dN and dS inferred by CODEML versus rate (substitution rate in unit of substitutions/site/year) inferred by BEAST on the same branches.

library(dplyr)
df <- merged_tree %>% 
  as_tibble() %>%
  select(dN_vs_dS, dN, dS, rate) %>%
  subset(dN_vs_dS >=0 & dN_vs_dS <= 1.5) %>%
  tidyr::gather(type, value, dN_vs_dS:dS)
df$type[df$type == 'dN_vs_dS'] <- 'dN/dS'
df$type <- factor(df$type, levels=c("dN/dS", "dN", "dS"))
ggplot(df, aes(rate, value)) + geom_hex() + 
  facet_wrap(~type, scale='free_y') 
Correlation of dN/dS, dN and dS versus substitution rate. After merging the BEAST and CodeML outputs, the branch-specific estimates (substitution rate, dN/dS , dN and dS) from the two analysis programs are compared on the same branch basis. The associations of dN/dS, dN and dS vs. rate are visualized in hexbin scatter plots.

Figure 2.1: Correlation of dN/dS, dN and dS versus substitution rate. After merging the BEAST and CodeML outputs, the branch-specific estimates (substitution rate, dN/dS , dN and dS) from the two analysis programs are compared on the same branch basis. The associations of dN/dS, dN and dS vs. rate are visualized in hexbin scatter plots.

The output is illustrated in Fig. 2.1. We can then test the association of these node/branch-specific data using Pearson correlation, which in this case showed that dN and dS, but not dN/dS are significantly (p-values) associated with rate.

Using the merge_tree() function, we are able to compare analysis results using identical model from different software packages or different models using different or identical software. It also allows users to integrate different analysis finding from different software packages. Merging tree data is not restricted to software findings, associating external data to analysis findings is also granted. The merge_tree() function is chainable and allows several tree objects to be merged into one.

phylo <- as.phylo(beast_tree)
N <- Nnode2(phylo)
d <- tibble(node = 1:N, fake_trait = rnorm(N), another_trait = runif(N))
fake_tree <- treedata(phylo = phylo, data = d)
triple_tree <- merge_tree(merged_tree, fake_tree)
triple_tree
## 'treedata' S4 object that stored information of
##  '/home/ygc/R/library/ggtree/examples/MCC_FluA_H3.tree',
##  '/home/ygc/R/library/ggtree/examples/rst',
##  '/home/ygc/R/library/ggtree/examples/mlc'.
## 
## ...@ phylo: 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##   A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',
##  'height_range', 'length',   'length_0.95_HPD',
##  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range',   'subs',
##  'AA_subs',  't',    'N',    'S',    'dN_vs_dS', 'dN',   'dS',   'N_x_dN',
##  'S_x_dS',   'fake_trait',   'another_trait'.

The triple_tree object showed above contains analysis results obtained from BEAST and CODEML, and evolutionary trait from external sources. All these information can be used to annotate the tree using ggtree (Yu et al. 2017).

2.2.3 Grouping taxa

The groupOTU() and groupClade() methods are designed for adding taxa grouping information to the input tree object. The methods were implemented in tidytree, treeio and ggtree respectively to support adding grouping information at the tbl_tree, phylo and treedata, and ggtree levels. These grouping information can be used directly in tree visualization (e.g. coloring tree based on grouping) with ggtree (Figure 6.5).

2.2.3.1 groupClade

The groupClade() method accepts an internal node or a vector of internal nodes to add grouping information of selected clade/clades.

nwk <- '(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);'
tree <- read.tree(text=nwk)

groupClade(as_tibble(tree), c(17, 21))
## # A tibble: 25 x 5
##    parent  node branch.length label group
##     <int> <int>         <dbl> <chr> <fct>
##  1     20     1             4 A     1    
##  2     20     2             4 B     1    
##  3     19     3             5 C     1    
##  4     18     4             6 D     1    
##  5     17     5            21 E     1    
##  6     22     6             4 F     2    
##  7     22     7            12 G     2    
##  8     21     8             8 H     2    
##  9     24     9             5 I     0    
## 10     24    10             2 J     0    
## # … with 15 more rows

2.2.3.2 groupOTU

set.seed(2017)
tr <- rtree(4)
x <- as_tibble(tr)
## the input nodes can be node ID or label
groupOTU(x, c('t1', 't4'), group_name = "fake_group")
## # A tibble: 7 x 5
##   parent  node branch.length label fake_group
##    <int> <int>         <dbl> <chr> <fct>     
## 1      5     1       0.435   t4    1         
## 2      7     2       0.674   t1    1         
## 3      7     3       0.00202 t3    0         
## 4      6     4       0.0251  t2    0         
## 5      5     5      NA       <NA>  1         
## 6      5     6       0.472   <NA>  1         
## 7      6     7       0.274   <NA>  1

Both groupClade() and groupOTU() work with the tbl_tree, phylo and treedata, and ggtree objects. Here is an example of using groupOTU() with a phylo tree object.

groupOTU(tr, c('t2', 't4'), group_name = "fake_group") %>%
  as_tibble
## # A tibble: 7 x 5
##   parent  node branch.length label fake_group
##    <int> <int>         <dbl> <chr> <fct>     
## 1      5     1       0.435   t4    1         
## 2      7     2       0.674   t1    0         
## 3      7     3       0.00202 t3    0         
## 4      6     4       0.0251  t2    1         
## 5      5     5      NA       <NA>  1         
## 6      5     6       0.472   <NA>  1         
## 7      6     7       0.274   <NA>  0

Another example of working with the ggtree object can be found in session 6.5.

The groupOTU will trace back from input nodes to most recent common ancestor. In this example, nodes 1, 4, 5 and 6 are grouping together (4 (t2) -> 6 -> 5 and 1 (t4) -> 5).

Related OTUs are grouping together and they are not necessarily within a clade. They can be monophyletic (clade), polyphyletic or paraphyletic.

cls <- list(c1=c("A", "B", "C", "D", "E"),
            c2=c("F", "G", "H"),
            c3=c("L", "K", "I", "J"),
            c4="M")

as_tibble(tree) %>% groupOTU(cls)
## # A tibble: 25 x 5
##    parent  node branch.length label group
##     <int> <int>         <dbl> <chr> <fct>
##  1     20     1             4 A     c1   
##  2     20     2             4 B     c1   
##  3     19     3             5 C     c1   
##  4     18     4             6 D     c1   
##  5     17     5            21 E     c1   
##  6     22     6             4 F     c2   
##  7     22     7            12 G     c2   
##  8     21     8             8 H     c2   
##  9     24     9             5 I     c3   
## 10     24    10             2 J     c3   
## # … with 15 more rows

If there are conflicts when tracing back to most recent common ancestor, user can set overlap parameter to “origin” (the first one counts), “overwrite” (default, the last one counts) or “abandon” (un-selected for grouping)7.

2.3 Rerooting tree

to be finished.

2.4 Rescaling Tree Branches

Phylogenetic data can be merged for joint analysis (Figure 2.1). They can be displayed on the same tree structure as more complex annotation to help visually inspection of their evolutionary patterns. All the numerical data stored in treedata object can be used to re-scale tree branches. For example, CodeML infers dN/dS, dN and dS, all these statistics can be used as branch lengths. All these values can also be used to color the tree (session 4.3.4) and can be project to vertical dimension to create two-dimensional tree or phenogram (session 4.2.2 and Figure 4.5 and 4.11). Instead of modifying branch lengths in the tree object, user can directly specifying a variable as branch length in ggtree() as demonstrated in session 4.3.6.

p1 <- ggtree(merged_tree) + theme_tree2()
p2 <- ggtree(rescale_tree(merged_tree, 'dN')) + theme_tree2()
p3 <- ggtree(rescale_tree(merged_tree, 'rate')) + theme_tree2()

cowplot::plot_grid(p1, p2, p3, ncol=3, labels = LETTERS[1:3])
Re-scaling tree branches. The tree with branches scaled in time (year from the root) (A). The tree was re-scaled using dN as branch lengths (B). The tree was re-scaled using substitution rates (C).

Figure 2.2: Re-scaling tree branches. The tree with branches scaled in time (year from the root) (A). The tree was re-scaled using dN as branch lengths (B). The tree was re-scaled using substitution rates (C).

2.5 Subsetting Tree with Data

2.5.1 Removing tips in a phylogenetic tree

Sometimes we want to remove selected tips from a phylogenetic tree. This is due to several reasons, including low sequence quality, errors in sequence assembly, an alignment error in part of the sequence and an error in phylogenetic inference etc.

Let’s say that we want to remove three tips (colored by red) from the tree (Figure 2.3A), the drop.tip() method removes specified tips and update the tree (Figure 2.3B). All associated data will be maintained in the updated tree.

f <- system.file("extdata/NHX", "phyldog.nhx", package="treeio")
nhx <- read.nhx(f)
to_drop <- c("Physonect_sp_@2066767",
            "Lychnagalma_utricularia@2253871",
            "Kephyes_ovata@2606431")
p1 <- ggtree(nhx) + geom_tiplab(aes(color = label %in% to_drop)) +
  scale_color_manual(values=c("black", "red")) + xlim(0, 0.8)

nhx_reduced <- drop.tip(nhx, to_drop)
p2 <- ggtree(nhx_reduced) + geom_tiplab() + xlim(0, 0.8)  
plot_grid(p1, p2, ncol=2, labels = c("A", "B"))
Removing tips from tree. Original tree with three tips (colored by red) to remove (A). Updated tree that removed selected tips (B).

Figure 2.3: Removing tips from tree. Original tree with three tips (colored by red) to remove (A). Updated tree that removed selected tips (B).

2.5.2 Subsetting tree by tip label

Tree can be large and difficult to look at only the portions of interest. The tree_subset() function was created in the treeio package (Wang et al. 2020) to extract a subset of the tree portion while still maintaining the structure of the tree portion. The beast_tree in Figure 2.4A is slightly crowded. Obviously, we can make the figure taller to allow more space for the labels (similar to use “Expansion” slider in FigTree) or we can make the text smaller. However, these solutions are not always applicable when you have a lot of tips (e.g. hundreds or thousands of tips). In particular, when you are only interested in the portion of the tree around a particular tip, you certainly don’t want to explore a large tree to find centain species you are interested in.

Let’s say you are interested in tip A/Swine/HK/168/2012 from the tree (Figure 2.4A) and you want to look at the immediate relatives of this tip.

The tree_subset() function allows you to look at the portions of the tree that are of interest. By default, the tree_subset() function will internally call the groupOTU() to assign group specified tip from the rest of other tips (Figure 2.4B). Additionally, the branch lengths and related associated data are maintained after subsetting (Figure 2.4C). The root of the tree is always anchored at zero for the subset tree by default and all the distances are relative to this root. If you want all the distances are relative to the original root, you can specify the root position (by root.position parameter) to the root edge of the subset tree, which is the sum of branch lengths from the original root to the root of the subset tree (Figure 2.4D and E).

beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)

p1 = ggtree(beast_tree) + 
  geom_tiplab() +  xlim(0, 40) + theme_tree2()

tree2 = tree_subset(beast_tree, "A/Swine/HK/168/2012", levels_back=4)  
p2 <- ggtree(tree2, aes(color=group)) +
  scale_color_manual(values = c("black", "red")) +
  geom_tiplab() +  xlim(0, 4) + theme_tree2() 

p3 <- ggtree(tree2, aes(color=group)) +
  geom_tiplab(hjust = -.1) + xlim(0, 5) + 
  geom_point(aes(fill = rate), shape = 21, size = 4) +
  scale_color_manual(values = c("black", "red"), guide = FALSE) +
  scale_fill_continuous(low = 'blue', high = 'red') +
  theme_tree2() + theme(legend.position = 'right')


p4 <- ggtree(tree2, aes(color=group), 
          root.position = as.phylo(tree2)$root.edge) +
  geom_tiplab() + xlim(18, 24) + 
  scale_color_manual(values = c("black", "red")) +
  theme_tree2()

p5 <- ggtree(tree2, aes(color=group), 
          root.position = as.phylo(tree2)$root.edge) +
  geom_rootedge() + geom_tiplab() + xlim(0, 40) + 
  scale_color_manual(values = c("black", "red")) +
  theme_tree2()

plot_grid(p2, p3, p4, p5, ncol=2, labels=LETTERS[2:5]) %>%
  plot_grid(p1, ., ncol=2, labels=c("A", ""), rel_widths=c(.5, 1))
Subsetting tree for specific tip. The original tree (A). The subset tree (B). Subset tree with data (C). Visualize the subset tree relative to original position, without root edge (D) and with root edge (E).

Figure 2.4: Subsetting tree for specific tip. The original tree (A). The subset tree (B). Subset tree with data (C). Visualize the subset tree relative to original position, without root edge (D) and with root edge (E).

2.5.3 Subsetting tree by internal node number

If you are interesting at certain clade, you can specify the input node as an internal node number. The tree_subset() function will take the clade as a whole and also trace back to particular levels to look at the immediate relatives of the clade (Figure 2.5A and B). We can use the tree_subset() function to zoom in selected portions and plot a whole tree with the portion of it, that is similar to the ape::zoom() function to explore very large tree (Figure 2.5C and D). Users can also use viewClade() function to restrict tree visualization at specific clade as demonstrated in session 6.1.

clade <- tree_subset(beast_tree, node=121, levels_back=0)
clade2 <- tree_subset(beast_tree, node=121, levels_back=2)
p1 <- ggtree(clade) + geom_tiplab() + xlim(0, 5)
p2 <- ggtree(clade2, aes(color=group)) + geom_tiplab() + 
  xlim(0, 8) + scale_color_manual(values=c("black", "red"))


library(ape)
library(tidytree)
library(treeio)

data(chiroptera)

nodes <- grep("Plecotus", chiroptera$tip.label)
chiroptera <- groupOTU(chiroptera, nodes)

clade <- MRCA(chiroptera, nodes)
x <- tree_subset(chiroptera, clade, levels_back = 0)

p3 <- ggtree(chiroptera, aes(colour = group)) + 
  scale_color_manual(values=c("black", "red")) +
  theme(legend.position = "none")
p4 <- ggtree(x) + geom_tiplab() + xlim(0, 5)
plot_grid(p1, p2, p3, p4, 
  ncol=2, labels=LETTERS[1:4])
Subsetting tree for specific clade. Extracting a clade (A). Extracting a clade and trace back to look at its immediate relatives (B). Viewing a very large tree (C) and a selected portion of it (D).

Figure 2.5: Subsetting tree for specific clade. Extracting a clade (A). Extracting a clade and trace back to look at its immediate relatives (B). Viewing a very large tree (C) and a selected portion of it (D).

2.6 Manipulating tree data for visualization

Tree visualization is supported by ggtree (Yu et al. 2017). Although ggtree implemented several methods for visual exploration of tree with data, you may want to do something that is not supported directly. In this case, you need to manipulate tree data with node coordination positions that used for visualization. This is quite easy with ggtree. User can use the fortify() method which internally call tidytree::as_tibble() to convert the tree to a tidy data frame and add columns of coordination positions (i.e. x, y, branch and angle) that are used to plot the tree. You can also access the data via ggtree(tree)$data.

Here is an example to plot two trees face to face that is similar to a ape::cophyloplot().

library(dplyr)
library(ggtree)

set.seed(1024)
x <- rtree(30)
y <- rtree(30)
p1 <- ggtree(x, layout='roundrect') + 
  geom_hilight(
         mapping=aes(subset = node %in% c(38, 48, 58, 36),
                     node = node,
                     fill = as.factor(node)
                     )
     ) +
    labs(fill = "clades for tree in left" )

p2 <- ggtree(y)

d1 <- p1$data
d2 <- p2$data

## reverse x-axis and 
## set offset to make the tree in the right hand side of the first tree
d2$x <- max(d2$x) - d2$x + max(d1$x) + 1

pp <- p1 + geom_tree(data=d2, layout='ellipse') +      
  ggnewscale::new_scale_fill() +
  geom_hilight(
         data = d2, 
         mapping = aes( 
            subset = node %in% c(38, 48, 58),
            node=node,
            fill=as.factor(node))
  ) +
  labs(fill = "clades for tree in right" ) 

dd <- bind_rows(d1, d2) %>% 
  filter(!is.na(label))

pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
    geom_tiplab(geom = 'shadowtext', bg.colour = alpha('firebrick', .5)) +
    geom_tiplab(data = d2, hjust=1, geom = 'shadowtext', bg.colour = alpha('firebrick', .5))
Plot two phylogenetic trees face to face. Plotting a tree using ggtree() (left hand side) and subsequently add another layer of tree by geom_tree() (right hand side). The relative positions of the plotted trees can be manual adjusted and adding layers to each of the tree (e.g. tip labels and highlighting clades) is independent.

Figure 2.6: Plot two phylogenetic trees face to face. Plotting a tree using ggtree() (left hand side) and subsequently add another layer of tree by geom_tree() (right hand side). The relative positions of the plotted trees can be manual adjusted and adding layers to each of the tree (e.g. tip labels and highlighting clades) is independent.

It is quite easy to plot multiple trees and connect taxa in one figure. For instance, plotting trees contructed from all internal gene segments of influenza virus and connecting equivalent strans across the trees (Venkatesh et al. 2018).

z <- rtree(30)
d2 <- fortify(y)
d3 <- fortify(z)
d2$x <- d2$x + max(d1$x) + 1
d3$x <- d3$x + max(d2$x) + 1

dd = bind_rows(d1, d2, d3) %>% 
  filter(!is.na(label))

p1 + geom_tree(data = d2) + geom_tree(data = d3) + geom_tiplab(data=d3) + 
  geom_line(aes(x, y, group=label, color=node < 15), data=dd, alpha=.3)
Plot multiple phylogenetic trees side by side. Plotting a tree using ggtree() and subsequently add multiple layers of trees by geom_tree().

Figure 2.7: Plot multiple phylogenetic trees side by side. Plotting a tree using ggtree() and subsequently add multiple layers of trees by geom_tree().

2.7 Summary

The treeio package allows us to import diverse phylogeny associated data into R. However, phylogenetic tree is stored in a way to facilitate computational processing which is not human fridenly and need expertise to manipulate and explore tree data. The tidytree package provides tidy interface for exploring tree data, while ggtree provides a set of utilitise to visualize and explore tree data using grammar of graphics. This full suit of packages make it easy for ordinary users to interact with tree data, and allow us to integrate phylogeny associated data from different sources (e.g. experimental result or analysis finding), which creates the possibility of comparative study.