12 ggtree Utilities

12.1 Facet Utilities

12.1.1 facet_widths

Adjusting the relative widths of facet panels is a common requirement, especially when using geom_facet() to visualize a tree with associated data. However, ggplot2 does not support this. To address the issue, ggtree provides the facet_widths() function, which works with both ggtree and ggplot objects.

library(ggplot2)
library(ggtree)
library(reshape2)

set.seed(123)
tree <- rtree(30)

p <- ggtree(tree, branch.length = "none") + 
    geom_tiplab() + theme(legend.position='none')

a <- runif(30, 0,1)
b <- 1 - a
df <- data.frame(tree$tip.label, a, b)
df <- melt(df, id = "tree.tip.label")

p2 <- p + geom_facet(panel = 'bar', data = df, geom = geom_bar, 
                 mapping = aes(x = value, fill = as.factor(variable)), 
                 orientation = 'y', width = 0.8, stat='identity') + 
        xlim_tree(9)

facet_widths(p2, widths = c(1, 2))

It also accepts a named vector to set the widths of specific panels. The following code produces a figure identical to Figure 12.1A.

facet_widths(p2, c(Tree = .5))

The facet_widths() function also works with other ggplot objects as demonstrated in Figure 12.1B.

p <- ggplot(iris, aes(Sepal.Width, Petal.Length)) + 
  geom_point() + facet_grid(.~Species)
facet_widths(p, c(setosa = .5))

FIGURE 12.1: Adjust relative widths of ggplot facets. The facet_widths() function works with ggtree (A) as well as ggplot (B).

12.1.2 facet_labeller

The facet_labeller() function was designed to relabel selected panels (Figure 12.2), and it currently only works with ggtree objects (i.e., geom_facet() outputs). A more versatile version that works with both ggtree and ggplot objects is implemented in the ggfun package (i.e., the facet_set() function).

facet_labeller(p2, c(Tree = "phylogeny", bar = "HELLO"))

If you want to combine facet_widths() with facet_labeller(), call facet_labeller() to relabel the panels before calling facet_widths() to set their relative widths. The reverse order won’t work, because the output of facet_widths() is redrawn from a grid object.

facet_labeller(p2, c(Tree = "phylogeny")) %>% facet_widths(c(Tree = .4))

FIGURE 12.2: Rename facet labels. Renaming multiple labels simultaneously (A) or only a specific one (B) is supported. facet_labeller() can be combined with facet_widths() to rename a facet label and then adjust the relative widths (B).

12.2 Geometric Layers

Layers defined in ggplot2 do not support subsetting, yet subsetting is quite useful in phylogenetic annotation because it allows us to annotate specific node(s) (e.g., label only bootstrap values larger than 75).

In ggtree, we provide several modified versions of layers defined in ggplot2 that support the subset aesthetic mapping, including geom_segment2(), geom_point2(), geom_text2(), and geom_label2().

These layers work with both ggtree and ggplot2 (Figure 12.3).

library(ggplot2)
library(ggtree)
library(aplot)  # provides plot_list()
data(mpg)
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
   geom_point(mapping = aes(color = class)) + 
   geom_text2(aes(label=manufacturer, 
                  subset = hwy > 40 | displ > 6.5), 
                  nudge_y = 1) +
   coord_cartesian(clip = "off") +
   theme_light() +
   theme(legend.position = c(.85, .75))          

p2 <- ggtree(rtree(10)) + 
    geom_label2(aes(subset = node <5, label = label))

plot_list(p, p2, ncol=2, tag_levels='A')

FIGURE 12.3: Geometric layers that support subsetting. These layers work with ggplot2 (A) and ggtree (B).

12.3 Layout Utilities

In Section 4.2, we introduced several layouts supported by ggtree. The ggtree package also provides several layout functions that transform one layout into another. Note that not all transformations are supported (see Table 12.1 and Figure 12.4).

TABLE 12.1: Layout transformers.

Layout                   Description
layout_circular          transform rectangular layout to circular layout
layout_dendrogram        transform rectangular layout to dendrogram layout
layout_fan               transform rectangular/circular layout to fan layout
layout_rectangular       transform circular/fan layout to rectangular layout
layout_inward_circular   transform rectangular/circular layout to inward_circular layout

set.seed(2019)
x <- rtree(20)
p <- ggtree(x)
p + layout_dendrogram()
ggtree(x, layout = "circular") + layout_rectangular()
p + layout_circular()
p + layout_fan(angle=90)
p + layout_inward_circular(xlim=4) + geom_tiplab(hjust=1)

FIGURE 12.4: Layout functions for transforming among different layouts. Default rectangular layout (A); transform rectangular to dendrogram layout (B); transform circular to rectangular layout (C); transform rectangular to circular layout (D); transform rectangular to fan layout (E); transform rectangular to inward circular layout (F).

12.4 Scale Utilities

The ggtree package provides several scale functions to manipulate the x-axis, including scale_x_range() (documented in Section 5.2.4), xlim_tree(), xlim_expand(), ggexpand(), hexpand() and vexpand().

12.4.1 Expand x limit for a specific facet panel

Sometimes we need to set xlim for a specific facet panel (e.g., to allocate more space for long tip labels in the Tree panel). However, the ggplot2::xlim() function applies to all panels. The ggtree package provides xlim_expand() to adjust xlim for a user-specified facet panel. It accepts two parameters, xlim and panel, and can adjust each panel individually, as demonstrated in Figure 12.5A. If you only want to adjust the xlim of the Tree panel, you can use xlim_tree() as a shortcut.

set.seed(2019-05-02)
x <- rtree(30)
p <- ggtree(x) + geom_tiplab()
d <- data.frame(label = x$tip.label, 
                value = rnorm(30))
p2 <- p + geom_facet(panel = "Dot", data = d, 
            geom = geom_point, mapping = aes(x = value))
p2 + xlim_tree(6) + xlim_expand(c(-10, 10), 'Dot')

The xlim_expand() function also works with ggplot2::facet_grid(). As demonstrated in Figure 12.5B, only the xlim of virginica panel was adjusted by xlim_expand().

g <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
    geom_point() + facet_grid(. ~ Species, scales = "free_x") 
g + xlim_expand(c(0, 15), 'virginica')

FIGURE 12.5: Setting xlim for user-specified facet panels. xlim_tree() sets the Tree panel of the ggtree output (A), and xlim_expand() sets the Dot panel of the ggtree output (A) and the virginica panel of the ggplot output (B).

12.4.2 Expand plot limit by the ratio of plot range

The ggplot2 package cannot automatically adjust plot limits to fit text, so long labels are often truncated. Users need to adjust the x (y) limits manually via the xlim() (ylim()) command (see also FAQ: Tip label truncated).

Using xlim() (ylim()) is a good solution to this issue. However, we can make it simpler by expanding the plot panel by a ratio of the axis range, without needing to know the exact limit values.

We provide the hexpand() function to expand the x limit by a specified fraction of the x range. It works in both directions (direction=1 for the right-hand side and direction=-1 for the left-hand side) (Figure 12.6). A companion function, vexpand(), behaves similarly for the y-axis, and the ggexpand() function works for both the x- and y-axis (Figure 11.2).

x$tip.label <- paste0('to make the label longer_', x$tip.label)
p1 <- ggtree(x) + geom_tiplab() + hexpand(.4)
p2 <- ggplot(iris, aes(Sepal.Width, Petal.Width)) + 
    geom_point() + 
    hexpand(.2, direction = -1) +
    vexpand(.2)

plot_list(p1, p2, tag_levels="A", widths=c(.6, .4))

FIGURE 12.6: Expanding plot limits by a fraction of the x or y range. Expand x limit at right-hand side by default (A), and expand x limit for left-hand side when direction = -1 and expand y limit at the upper side (B).
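The arithmetic behind expanding a limit "by a fraction of the range" can be sketched in base R. The helper expand_range() below is hypothetical and written only for illustration; it is not part of ggtree and is not claimed to be how hexpand() is implemented. It assumes the extra space equals the ratio multiplied by the width of the current range.

```r
# Hypothetical helper, for illustration only; not a ggtree function.
# Assumption: the added space is ratio * (max - min) of the data range.
expand_range <- function(x, ratio, direction = 1) {
  lim <- range(x, na.rm = TRUE)
  extra <- diff(lim) * ratio
  if (direction == 1) {
    lim[2] <- lim[2] + extra   # grow towards the right (or upper) side
  } else {
    lim[1] <- lim[1] - extra   # grow towards the left (or lower) side
  }
  lim
}

right <- expand_range(c(0, 10), ratio = 0.4)                  # upper limit becomes 14
left  <- expand_range(c(0, 10), ratio = 0.2, direction = -1)  # lower limit becomes -2
```

The point of the ratio-based approach is that the same call keeps working when the tree or data changes, whereas a hard-coded xlim() value has to be re-tuned.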

12.5 Tree data utilities

12.5.1 Filter tree data

The ggtree package defines several geom layers that support subsetting tree data. However, many other geom layers defined in ggplot2 and its extensions do not provide this feature. To allow filtering tree data with these layers, ggtree provides an accompanying function, td_filter(), which returns a function that works similarly to dplyr::filter() and can be passed to the data parameter of a geom layer to filter ggtree plot data, as demonstrated in Figure 12.7.

library(tidytree)

set.seed(1997)
tree <- rtree(50)
p <- ggtree(tree) 
selected_nodes <- offspring(p, 67)$node
p + geom_text(aes(label=label), 
            data=td_filter(isTip & 
                        node %in% selected_nodes), 
            hjust=0) +
    geom_nodepoint(aes(subset = node ==67), 
                    size=5, color='blue')

FIGURE 12.7: Filtering ggtree plot data in geom layers. Only selected tips (offspring of the node indicated by the blue circle point) were labeled.
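Passing a function to the data parameter is a general ggplot2 mechanism: the layer calls the function on the plot's default data and draws whatever data frame it returns. The sketch below shows that mechanism with ggplot2 alone; keep_extremes() is a hypothetical hand-written filter standing in for the kind of function td_filter() returns, not ggtree code.

```r
library(ggplot2)

# Hypothetical filter function; ggplot2 calls it with the plot's default data.
keep_extremes <- function(d) d[d$hwy > 40 | d$displ > 6.5, , drop = FALSE]

p_fun <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(aes(label = manufacturer),
            data = keep_extremes,   # a function instead of a data frame
            nudge_y = 1)

# Data actually used by the text layer (layer 2)
labelled <- layer_data(p_fun, 2)
```

Only the rows returned by the function are labelled, while the point layer still uses the full data set.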

12.5.2 Flatten list-column tree data

The ggtree plot data is a tidy data frame where each row represents a unique node. If multiple values are associated with a node, the data can be stored as nested data (i.e., in a list-column).

set.seed(1997)
tr <- rtree(5)
d <- data.frame(id=rep(tr$tip.label,2), 
                value=abs(rnorm(10, 6, 2)), 
                group=c(rep("A", 5),rep("B",5)))

library(tidyr)
d2 <- nest(d, value = value, group = group)
## d2 is a nested data frame
d2
## # A tibble: 5 × 3
##   id    value            group           
##   <chr> <list>           <list>          
## 1 t2    <tibble [2 × 1]> <tibble [2 × 1]>
## 2 t1    <tibble [2 × 1]> <tibble [2 × 1]>
## 3 t5    <tibble [2 × 1]> <tibble [2 × 1]>
## 4 t4    <tibble [2 × 1]> <tibble [2 × 1]>
## 5 t3    <tibble [2 × 1]> <tibble [2 × 1]>

Nested data is supported by the %<+% operator and can be mapped to the tree structure. If a geom layer can’t directly visualize nested data, we need to flatten the data before applying the geom layer to display it. The ggtree package provides a function, td_unnest(), which returns a function that works similarly to tidyr::unnest() and can be used to flatten ggtree plot data, as demonstrated in Figure 12.8A.

All tree data utilities provide a .f parameter that accepts a function to preprocess the data. This makes it possible to combine different tree data utilities, as demonstrated in Figure 12.8B.

p <- ggtree(tr) %<+% d2
p2 <- p + 
    geom_point(aes(x, y, size= value, colour=group), 
            data = td_unnest(c(value, group)), alpha=.4) +
    scale_size(range=c(3,10), limits=c(3, 10))

p3 <- p + 
    geom_point(aes(x, y, size= value, colour=group), 
            data = td_unnest(c(value, group), 
                        .f = td_filter(isTip & node==4)), 
            alpha=.4) +
    scale_size(range=c(3,10), limits=c(3, 10))

plot_list(p2, p3, tag_levels = 'A')

FIGURE 12.8: Flattening ggtree plot data. List-columns can be flattened by td_unnest(), so that two circle points are displayed on each tip simultaneously (A). Different tree data utilities can be combined, e.g., filtering data by td_filter() and then flattening it by td_unnest() (B).
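The way a .f argument chains two utilities can be sketched with plain closures in base R. Everything below is hypothetical: make_filter() and make_unnest() are simplified stand-ins written for this illustration, and they are not the ggtree implementations of td_filter() and td_unnest(). Each returns a function of the data, and each applies .f to the data before doing its own work.

```r
# Hypothetical stand-ins for illustration; NOT the ggtree implementations.
make_filter <- function(keep, .f = identity) {
  function(data) {
    data <- .f(data)                   # preprocess first
    data[keep(data), , drop = FALSE]   # then keep the matching rows
  }
}

make_unnest <- function(col, .f = identity) {
  function(data) {
    data <- .f(data)                   # preprocess first
    n <- lengths(data[[col]])          # number of values stored per row
    out <- data[rep(seq_len(nrow(data)), n),
                setdiff(names(data), col), drop = FALSE]
    out[[col]] <- unlist(data[[col]])  # one row per stored value
    rownames(out) <- NULL
    out
  }
}

# Toy node table with a list-column
nodes <- data.frame(node = 1:3, isTip = c(TRUE, TRUE, FALSE))
nodes$value <- list(c(5, 7), c(4, 6), numeric(0))

# Filter to tip 2 first, then flatten its list-column
flatten_tip2 <- make_unnest(
  "value",
  .f = make_filter(function(d) d$isTip & d$node == 2)
)
flat <- flatten_tip2(nodes)
```

Because the result is itself a function of the data, it can be handed to a layer's data parameter in the same way as a single utility.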

12.6 Tree Utilities

12.6.1 Extract tip order

To create composite plots, users need to re-order their data manually before creating tree-associated graphs. The order of their data should be consistent with the tip order presented in the ggtree() plot. For this purpose, we provide the get_taxa_name() function to extract an ordered vector of tips based on the tree structure plotted by ggtree().

set.seed(123)
tree <- rtree(10)
p <- ggtree(tree) + geom_tiplab() + 
    geom_hilight(node = 12, extendto = 2.5)

x <- paste("Taxa order:", 
        paste0(get_taxa_name(p), collapse=', '))
p + labs(title=x)

FIGURE 12.9: An example tree for demonstrating get_taxa_name() function.

The get_taxa_name() function will return a vector of ordered tip labels according to the tree structure displayed in Figure 12.9.

##  [1] "t9"  "t8"  "t3"  "t2"  "t7"  "t10" "t1"  "t5" 
##  [9] "t6"  "t4"

If users specify a node, get_taxa_name() will extract the tip order of the selected clade (i.e., the highlighted region in Figure 12.9).

get_taxa_name(p, node = 12)
## [1] "t5" "t6" "t4"
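Once the tip order is known, re-ordering associated data is a matter of matching labels. The following base R sketch assumes the order vector shown above for Figure 12.9; the trait data frame and its value column are made up for illustration.

```r
# Tip order as returned by get_taxa_name(p) for the tree in Figure 12.9
taxa_order <- c("t9", "t8", "t3", "t2", "t7", "t10", "t1", "t5", "t6", "t4")

# Hypothetical associated data, stored in arbitrary order
trait <- data.frame(label = paste0("t", 1:10),
                    value = 1:10,
                    stringsAsFactors = FALSE)

# match() gives, for each tip in plot order, its row position in 'trait'
trait_ordered <- trait[match(taxa_order, trait$label), ]
```

The rows of trait_ordered now follow the tip order of the plot, so a companion graph built from it lines up with the tree.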

12.6.2 Padding taxa labels

The label_pad() function adds padding characters (default is ·) to taxa labels.

set.seed(2015-12-21)
tree <- rtree(5)
tree$tip.label[2] <- "long string for test"

d <- data.frame(label = tree$tip.label, 
                newlabel = label_pad(tree$tip.label),
                newlabel2 = label_pad(tree$tip.label, pad = " "))
print(d)
##                  label             newlabel
## 1                   t1 ··················t1
## 2 long string for test long string for test
## 3                   t2 ··················t2
## 4                   t4 ··················t4
## 5                   t3 ··················t3
##              newlabel2
## 1                   t1
## 2 long string for test
## 3                   t2
## 4                   t4
## 5                   t3

This feature is useful if we want to align tip labels to the end as demonstrated in Figure 12.10. Note that in this case, monospace font should be used to ensure the lengths of the labels displayed in the plot are the same.

p <- ggtree(tree) %<+% d + xlim(NA, 5)
p1 <- p + geom_tiplab(aes(label=newlabel), 
                    align=TRUE, family='mono',
                    linetype = "dotted", linesize = .7) 
p2 <- p + geom_tiplab(aes(label=newlabel2), 
                    align=TRUE, family='mono',
                    linetype = NULL, offset=-.5)
plot_list(p1, p2, ncol=2, tag_levels = "A")                            

FIGURE 12.10: Align tip label to the end. With a dotted line (A) and without a dotted line (B).
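The padding shown earlier in this section amounts to prefixing each label with enough pad characters to reach the length of the longest label. A base R sketch of that idea follows; pad_left() is a hypothetical helper, not the implementation of label_pad(), and it defaults to an ASCII dot rather than the · character.

```r
# Hypothetical helper, for illustration only; not ggtree's label_pad().
pad_left <- function(labels, pad = ".") {
  width <- max(nchar(labels))                       # target length
  paste0(strrep(pad, width - nchar(labels)), labels)
}

labels <- c("t1", "long string for test", "t2", "t4", "t3")
padded <- pad_left(labels)
```

All padded labels have the same number of characters, which is why a monospace font is needed for them to have the same displayed width.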

12.7 Interactive ggtree Annotation

The ggtree package supports interactive tree annotation or manipulation by implementing an identify() method. Users can click on a node to highlight a clade, to label or rotate it, etc. Users can also use the plotly package to convert a ggtree object to a plotly object to quickly create an interactive phylogenetic tree.


FIGURE 12.11: Interactive phylogenetic tree using identify() method. Highlighting, labelling and rotating clades are all supported.

A video of using identify() to interactively manipulate a phylogenetic tree can be found on YouTube and Youku.