12 ggtree utilities

12.1 Facet utilities

12.1.1 facet_widths

library(ggplot2)
library(ggstance)
library(ggtree)
library(reshape2)

set.seed(123)
tree <- rtree(30)

p <- ggtree(tree, branch.length = "none") + 
    geom_tiplab() + theme(legend.position='none')

a <- runif(30, 0,1)
b <- 1 - a
df <- data.frame(tree$tip.label, a, b)
df <- melt(df, id = "tree.tip.label")

p2 <- facet_plot(p + xlim_tree(8), panel = 'bar', data = df, geom = geom_barh, 
                 mapping = aes(x = value, fill = as.factor(variable)), 
                 width = 0.8, stat='identity') + xlim_tree(9)

facet_widths(p2, widths = c(1, 2))

The facet_widths() function also supports using a named vector to set the widths of specific panels. The following code displays a figure identical to Figure 12.1A.

facet_widths(p2, c(Tree = .5))

The facet_widths function also works with other ggplot objects, as demonstrated in Figure 12.1B.

p <- ggplot(iris, aes(Sepal.Width, Petal.Length)) + 
  geom_point() + facet_grid(.~Species)
facet_widths(p, c(setosa = .5))

Figure 12.1: Adjust relative widths of ggplot facets. The facet_widths function works with ggtree (A) as well as ggplot (B).

12.1.2 facet_labeller

The facet_labeller function was designed to re-label selected panels, and it currently only works with ggtree objects (i.e., facet_plot output).

facet_labeller(p2, c(Tree = "phylogeny", bar = "HELLO"))

If you want to combine facet_widths with facet_labeller, you need to call facet_labeller to re-label the panels before using facet_widths to set the relative widths of each panel. Otherwise it won't work, because the output of facet_widths is redrawn from a grid object and its panels can no longer be re-labelled.

library(magrittr)  # provides the %>% pipe
facet_labeller(p2, c(Tree = "phylogeny")) %>% facet_widths(c(Tree = .4))

Figure 12.2: Rename facet labels. Renaming multiple labels simultaneously (A) or only a specific one (B) is supported. facet_labeller can be combined with facet_widths to rename a facet label and then adjust relative widths (B).

12.2 Geometric layers

Subsetting is not supported in layers defined in ggplot2, although it is quite useful in phylogenetic annotation because it allows us to annotate specific node(s) (e.g., only label bootstrap values that are larger than 75).

In ggtree, we provide modified versions of ggplot2 layers that support a subset aesthetic mapping, including:

  • geom_segment2
  • geom_point2
  • geom_text2
  • geom_label2
library(ggplot2)
library(ggtree)
data(mpg)
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
   geom_point(mapping = aes(color = class)) + 
   geom_text2(aes(label=manufacturer, 
                  subset = hwy > 40 | displ > 6.5), 
                  nudge_y = 1) +
   coord_cartesian(clip = "off") +
   theme_light() +
   theme(legend.position = c(.85, .75))          

p2 <- ggtree(rtree(10)) + 
    geom_label2(aes(subset = node < 5, label = label))
cowplot::plot_grid(p, p2, ncol = 2, labels = c("A", "B"))

Figure 12.3: Geometric layers that support subsetting. These layers work with ggplot2 (A) and ggtree (B).
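The bootstrap case mentioned at the start of this section can be sketched with geom_text2(). This is a minimal sketch, assuming support values are stored as internal node labels; the values generated below are random placeholders, not real bootstrap results.

```r
library(ggplot2)
library(ggtree)

set.seed(42)
tree <- ape::rtree(15)
# Placeholder support values (NOT real bootstrap results), stored as node labels
tree$node.label <- as.character(round(runif(tree$Nnode, min = 40, max = 100)))

p <- ggtree(tree) +
  geom_tiplab() +
  # subset keeps only internal nodes whose support value is larger than 75;
  # suppressWarnings() silences the NA coercion of tip labels such as "t1"
  geom_text2(aes(subset = !isTip & suppressWarnings(as.numeric(label)) > 75,
                 label = label),
             hjust = -0.3, size = 3)
p
```

Because tips are excluded by `!isTip`, the numeric comparison only matters for internal nodes, so no tip label is drawn by this layer.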

12.3 Layout utilities

In section 4.2.2, we introduced several layouts supported by ggtree. The ggtree package also provides several layout functions that transform one layout into another. Note that not all layouts are supported (see Table 12.1).

Table 12.1: Layout layers.

Layout                    Description
layout_circular           transform rectangular layout to circular layout
layout_dendrogram         transform rectangular layout to dendrogram layout
layout_fan                transform rectangular/circular layout to fan layout
layout_rectangular        transform circular/fan layout to rectangular layout
layout_inward_circular    transform rectangular/circular layout to inward_circular layout

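set.seed(2019)
x <- rtree(20)
p <- ggtree(x)
p                                                         # default rectangular layout (A)
p + layout_dendrogram()                                   # rectangular to dendrogram (B)
ggtree(x, layout = "circular") + layout_rectangular()     # circular to rectangular (C)
p + layout_circular()                                     # rectangular to circular (D)
p + layout_fan(angle = 90)                                # rectangular to fan (E)
p + layout_inward_circular(xlim = 4) + geom_tiplab(hjust = 1)  # rectangular to inward circular (F)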

Figure 12.4: Layout layers for transforming among different layouts. Default rectangular layout (A); transform rectangular to dendrogram layout (B); transform circular to rectangular layout (C); transform rectangular to circular layout (D); transform rectangular to fan layout (E); transform rectangular to inward circular layout (F).

12.4 Scale utilities

The scale_x_range() function is documented in section 5.2.4.

12.4.1 Expand x limit for specific panel

Sometimes we need to set xlim for a specific panel (e.g., to allocate more space for long tip labels in the Tree panel). However, the ggplot2::xlim() function applies to all panels. ggtree provides xlim_expand() to adjust xlim for a user-specified panel. It accepts two parameters, xlim and panel, and can adjust each panel individually, as demonstrated in Figure 12.5A. If you only want to adjust the xlim of the Tree panel, you can use xlim_tree() as a shortcut.

set.seed(2019-05-02)
x <- rtree(30)
p <- ggtree(x) + geom_tiplab()
d <- data.frame(label = x$tip.label, 
                value = rnorm(30))
p2 <- facet_plot(p, panel = "Dot", data = d, 
            geom = geom_point, mapping = aes(x = value))
p2 + xlim_tree(6) + xlim_expand(c(-10, 10), 'Dot')

The xlim_expand() function also works with ggplot2::facet_grid(). As demonstrated in Figure 12.5B, only the xlim of the virginica panel is adjusted by xlim_expand().

g <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
    geom_point() + facet_grid(. ~ Species, scales = "free_x") 
g + xlim_expand(c(0, 15), 'virginica')

Figure 12.5: Setting xlim for a user-specified panel: xlim for ggtree::facet_plot (A, Tree and Dot panels) and ggplot2::facet_grid (B, virginica panel).
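Since xlim_tree() is described above as a shortcut for the Tree panel, it should request the same limit as calling xlim_expand() with panel = "Tree". The sketch below assumes that equivalence and rebuilds the Figure 12.5A example so it runs on its own; p_short and p_long are illustrative names only.

```r
library(ggplot2)
library(ggtree)

set.seed(2019-05-02)
x <- ape::rtree(30)
p <- ggtree(x) + geom_tiplab()
d <- data.frame(label = x$tip.label, value = rnorm(30))
p2 <- facet_plot(p, panel = "Dot", data = d,
                 geom = geom_point, mapping = aes(x = value))

# Shortcut form: extend only the Tree panel to x = 6
p_short <- p2 + xlim_tree(6) + xlim_expand(c(-10, 10), "Dot")
# Assumed long form: name the Tree panel explicitly
p_long  <- p2 + xlim_expand(6, "Tree") + xlim_expand(c(-10, 10), "Dot")
```

The long form is useful when the same call pattern is applied in a loop over several panel names.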

12.4.2 Expand plot limit by ratio of plot range

The ggplot2 package cannot automatically adjust plot limits to fit text, so long text is commonly truncated. Users need to adjust the x (y) limits manually via xlim() (ylim()) (see also FAQ: Tip label truncated).

xlim() (ylim()) is a good solution to this issue. However, we can make it simpler by expanding the plot panel by a ratio of the axis range, without needing to know the exact limit value.

We provide the hexpand() function to expand the x limit by a specified fraction of the x range. It works in both directions (direction = 1 for the right-hand side and direction = -1 for the left-hand side) (Figure 12.6). Similarly, vexpand() works for the y axis, and ggexpand() works for both x and y axes (Figure 11.2).

x$tip.label <- paste0('to make the label longer_', x$tip.label)
p1 <- ggtree(x) + geom_tiplab() + hexpand(.3)
p2 <- ggplot(iris, aes(Sepal.Width, Petal.Width)) + 
    geom_point() + 
    hexpand(.2, direction = -1) +
    vexpand(.2)
cowplot::plot_grid(p1, p2, labels = c("A", "B"), rel_widths = c(.6, .4))

Figure 12.6: Expanding plot limits by a fraction of the x or y range. The x limit is expanded on the right-hand side by default (A). The x limit is expanded on the left-hand side when direction = -1, and the y limit is expanded on the upper side (B).
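The arithmetic behind expanding by a ratio can be illustrated in a few lines: the axis is extended by ratio × (axis range) on one side. The helper expand_by_ratio() below is hypothetical and not part of ggtree; it only sketches the calculation, assuming the ratio is applied to the raw data range (ggplot2's default padding is ignored here).

```r
# Hypothetical helper (not a ggtree function): extend a numeric range by a
# fraction of its width, on the right/upper side (direction = 1) or the
# left/lower side (direction = -1)
expand_by_ratio <- function(range, ratio, direction = 1) {
  extra <- diff(range) * ratio
  if (direction == 1) {
    c(range[1], range[2] + extra)
  } else {
    c(range[1] - extra, range[2])
  }
}

# Sepal.Width in iris spans 2.0 to 4.4, a width of 2.4
xr <- range(iris$Sepal.Width)
expand_by_ratio(xr, 0.2, direction = -1)  # c(1.52, 4.40): left limit moves by 0.48
```

This is why a fixed ratio adapts to any data scale: the added space grows with the axis range instead of being an absolute value.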

12.5 Tree data utilities

12.5.1 Filter tree data

The ggtree package defines several geom layers that support subsetting tree data. However, many other geom layers, defined in ggplot2 and its extensions, do not provide this feature. To allow filtering tree data with these layers, ggtree provides a companion function, td_filter(), which returns a function that works similarly to dplyr::filter() and can be passed to the data parameter of a geom layer to filter the ggtree plot data, as demonstrated in Figure 12.7.

library(tidytree)

set.seed(1997)
tree <- rtree(50)
p <- ggtree(tree) 
selected_nodes <- offspring(p, 67)$node
p + geom_text(aes(label=label), 
            data=td_filter(isTip & 
                        node %in% selected_nodes), 
            hjust=0)

Figure 12.7: Filtering ggtree plot data in geom layers.
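Because td_filter() returns a function, that function can also be called directly on the plot data to inspect exactly which rows a geom layer would receive. A minimal sketch, reusing the tree from Figure 12.7; keep_tips and right_tips are illustrative names.

```r
library(ggtree)

set.seed(1997)
tree <- ape::rtree(50)
p <- ggtree(tree)

# td_filter() returns a function; calling it on p$data shows the rows
# that a geom layer would receive through its `data` argument
keep_tips <- td_filter(isTip)
tips <- keep_tips(p$data)
nrow(tips)  # 50: one row per tip

# Conditions are written as in dplyr::filter(); here, tips lying to the
# right of the median x position of all nodes
right_tips <- td_filter(isTip & x > median(x))(p$data)
```

Inspecting the filtered rows this way is a quick check before passing the same td_filter() call to `data =` in a layer.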

12.5.2 Flatten list-column tree data

The ggtree plot data is a tidy data frame in which each row represents a unique node. If multiple values are associated with a node, the data should be stored as nested data (i.e., in a list-column).

set.seed(1997)
tr <- rtree(5)
d <- data.frame(id=rep(tr$tip.label,2), 
                value=abs(rnorm(10, 6, 2)), 
                group=c(rep("A", 5),rep("B",5)))

library(tidyr)
d2 <- nest(d, value = value, group = group)
# d2 is nested data: value and group are list-columns
d2
## # A tibble: 5 x 3
##   id    value            group           
##   <chr> <list>           <list>          
## 1 t2    <tibble [2 × 1]> <tibble [2 × 1]>
## 2 t1    <tibble [2 × 1]> <tibble [2 × 1]>
## 3 t5    <tibble [2 × 1]> <tibble [2 × 1]>
## 4 t4    <tibble [2 × 1]> <tibble [2 × 1]>
## 5 t3    <tibble [2 × 1]> <tibble [2 × 1]>

Nested data is supported by the %<+% operator and can be mapped to the tree structure. If a geom layer cannot directly visualize nested data, we need to flatten the data before applying the geom layer. The ggtree package provides a function, td_unnest(), which returns a function that works similarly to tidyr::unnest() and can be used to flatten ggtree plot data, as demonstrated in Figure 12.8A.

All tree data utilities provide a .f parameter that accepts a function to pre-process the data. This makes it possible to combine different tree data utilities, as demonstrated in Figure 12.8B.

p <- ggtree(tr) %<+% d2
p2 <- p + 
    geom_point(aes(x, y, size= value, colour=group), 
            data = td_unnest(c(value, group)), alpha=.4) +
    scale_size(range=c(3,10), limits=c(3, 10))

p3 <- p + 
    geom_point(aes(x, y, size= value, colour=group), 
            data = td_unnest(c(value, group), 
                        .f = td_filter(isTip & node==4)), 
            alpha=.4) +
    scale_size(range=c(3,10), limits=c(3, 10))

cowplot::plot_grid(p2, p3, labels=LETTERS[1:2])            

Figure 12.8: Flattening ggtree plot data. (A) List-columns can be flattened by td_unnest(). (B) Different tree data utilities can be combined to work together (e.g., filter data with td_filter() and then flatten it with td_unnest()).
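The function returned by td_unnest() can also be applied directly to the nested table, which makes the flattening step visible: each tip row with two nested values becomes two rows. A minimal sketch rebuilding d2 from above; flatten and flat are illustrative names.

```r
library(ggtree)
library(tidyr)

set.seed(1997)
tr <- ape::rtree(5)
d <- data.frame(id = rep(tr$tip.label, 2),
                value = abs(rnorm(10, 6, 2)),
                group = c(rep("A", 5), rep("B", 5)))
d2 <- nest(d, value = value, group = group)  # one row per tip, two list-columns

# td_unnest() returns a function; applying it here shows the long format
# that geom_point() works with in Figure 12.8A
flatten <- td_unnest(c(value, group))
flat <- flatten(d2)
flat  # two rows per tip
```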

12.6 Tree utilities

12.6.1 Extract tip order

To create composite plots, users often need to re-order their data manually before creating tree-associated graphs, so that the order of the data is consistent with the tip order presented in the ggtree plot. For this purpose, we provide the get_taxa_name() function to extract an ordered vector of tips based on the tree structure plotted by ggtree.

set.seed(123)
tree <- rtree(10)
p <- ggtree(tree) + geom_tiplab() + 
    geom_hilight(node = 12, extendto = 2.5)
print(p)

Figure 12.9: An example tree for demonstrating the get_taxa_name() function.

The get_taxa_name() function returns a vector of tip labels ordered according to the tree structure displayed in Figure 12.9.

get_taxa_name(p)
##  [1] "t9"  "t8"  "t3"  "t2"  "t7"  "t10" "t1"  "t5"  "t6" 
## [10] "t4"

If the user specifies a node, get_taxa_name() extracts the ordered tips of the selected clade (i.e., the highlighted region in Figure 12.9).

get_taxa_name(p, node = 12)
## [1] "t5" "t6" "t4"
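The ordered vector is what makes manual re-ordering easy: matching it against the label column of a data frame puts the rows in plot order. A minimal sketch; the per-tip data frame dat is hypothetical and deliberately shuffled.

```r
library(ggtree)

set.seed(123)
tree <- ape::rtree(10)
p <- ggtree(tree) + geom_tiplab()

# Hypothetical per-tip data, supplied in a shuffled order
dat <- data.frame(label = sample(tree$tip.label), value = runif(10))

# Reorder rows to follow the tip order returned by get_taxa_name()
tip_order <- get_taxa_name(p)
dat_ordered <- dat[match(tip_order, dat$label), ]
dat_ordered$label
```

match() returns, for each label in tip_order, its row position in dat, so indexing with it reorders dat without sorting it alphabetically.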

12.6.2 Padding taxa labels

The label_pad() function adds padding characters (default is ·) to the left of taxa labels so that all labels have the same number of characters.

set.seed(2015-12-21)
tree <- rtree(5)
tree$tip.label[2] <- "long string for test"

d <- data.frame(label = tree$tip.label, 
                newlabel = label_pad(tree$tip.label),
                newlabel2 = label_pad(tree$tip.label, pad = " "))
print(d)
##                  label             newlabel
## 1                   t1 ··················t1
## 2 long string for test long string for test
## 3                   t2 ··················t2
## 4                   t4 ··················t4
## 5                   t3 ··················t3
##              newlabel2
## 1                   t1
## 2 long string for test
## 3                   t2
## 4                   t4
## 5                   t3

This feature is useful if we want to align tip labels to the end, as demonstrated in Figure 12.10. Note that in this case a monospace font should be used to ensure that the labels displayed in the plot have the same width.

p <- ggtree(tree) %<+% d + xlim(NA, 3)
p1 <- p + geom_tiplab(aes(label=newlabel), 
                    align=TRUE, family='mono',
                    linetype = "dotted", linesize = .7) 
p2 <- p + geom_tiplab(aes(label=newlabel2), 
                    align=TRUE, family='mono',
                    linetype = NULL, offset=-.5) + xlim(NA, 2)
cowplot::plot_grid(p1, p2, ncol=2, labels = c("A", "B"))                            

Figure 12.10: Align tip label to the end. With dotted line (A) and without dotted line (B).
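The alignment relies on every padded label having the same character count, which a quick check confirms. A minimal sketch using "-" as the padding character so the result reads the same in any locale; the labels are illustrative.

```r
library(ggtree)

labels <- c("t1", "long string for test", "t2")
padded <- label_pad(labels, pad = "-")
padded
# Every padded label has as many characters as the longest label, so a
# monospace font renders them with identical widths
nchar(padded)
```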

12.7 Interactive ggtree annotation

The ggtree package supports interactive tree annotation and manipulation by implementing an identify() method. Users can click on a node to highlight, label, or rotate the corresponding clade. Users can also use the plotly package to convert a ggtree plot into a plotly object to quickly create an interactive phylogenetic tree.


Figure 12.11: Interactive phylogenetic tree using identify() method. Highlighting, labelling and rotating clades are all supported.

Videos of using identify() to interactively manipulate a phylogenetic tree can be found on YouTube and Youku: