11 Other ggtree Extensions

The ggtree package is a general package for visualizing tree structures and associated data. If you have some special requirements that are not directly provided by ggtree, you may need to use one of the extension packages built on top of ggtree. For example, the RevGadgets package for visualizing the output of the RevBayes, the sitePath package for visualizing fixation events on phylogenetic pathways, and the enrichplot package for visualizing hierarchical structure of the enriched pathways.

rp <- BiocManager::repositories()
db <- utils::available.packages(repo=rp)
x <- tools::package_dependencies('ggtree', db=db, 
                                which = c("Depends", "Imports"), 
                                reverse=TRUE)
print(x)

## $ggtree
##  [1] "enrichplot"        "ggtreeExtra"      
##  [3] "LymphoSeq"         "miaViz"           
##  [5] "microbiomeMarker"  "MicrobiotaProcess"
##  [7] "philr"             "singleCellTK"     
##  [9] "sitePath"          "systemPipeTools"  
## [11] "tanggle"           "treekoR"

There are 12 packages in CRAN or Bioconductor that depend on or import ggtree and several packages on GitHub that extend ggtree. Here we briefly introduce some extension packages, including MicrobiotaProcess and tanggle.

11.1 Taxonomy Annotation Using MicrobiotaProcess

The MicrobiotaProcess package provides a LEfSe-like algorithm (Segata et al., 2011) to discover microbiome biomarkers by comparing taxon abundance between different classes. It provides several methods to visualize the analysis result. The ggdiffclade is developed based on ggtree (Yu et al., 2017). In addition to the diff_analysis() result, it also supports a data frame that contains a hierarchical relationship (e.g., taxonomy annotation or KEGG annotation) with another data frame that contains taxa and factor information and/or pvalue. The following example demonstrates how to use data frames (i.e., analysis results) to visualize the differential taxonomy tree. More details can be found on the vignette of the MicrobiotaProcess package.

library(MicrobiotaProcess)
library(ggplot2)
library(TDbook)

# load `df_difftax` and `df_difftax_info` from TDbook
taxa <- df_alltax_info
dt <- df_difftax

ggdiffclade(obj=taxa,
            nodedf=dt,
            factorName="DIAGNOSIS",
            layout="radial",
            skpointsize=0.6,
            cladetext=2,
            linewd=0.2,
            taxlevel=3,
            # This argument is to remove the branch of unknown taxonomy.
            reduce=TRUE) + 
     scale_fill_manual(values=c("#00AED7", "#009E73"))+
     guides(color = guide_legend(keywidth = 0.1, keyheight = 0.6,
                                 order = 3,ncol=1)) +
     theme(panel.background=element_rect(fill=NA),
           legend.position="right",
           plot.margin=margin(0,0,0,0),
           legend.spacing.y=unit(0.02, "cm"), 
           legend.title=element_text(size=7.5), 
           legend.text=element_text(size=5.5), 
           legend.box.spacing=unit(0.02,"cm")
        )

FIGURE 11.1: Visualize differential taxonomy clade.

The data frame of this example is from the analysis result of diff_analysis() using public datasets (Kostic et al., 2012). The colors represent the features enriched in the relevant class groups. The size of circle points represents the -log10(pvalue), i.e., a larger point indicates a greater significance. In Figure 11.1, we can find that Fusobacterium sequences were enriched in carcinomas, while Firmicutes, Bacteroides, and Clostridiales were greatly reduced in tumors. These results were consistent with the original article (Kostic et al., 2012). The species of Campylobacter has been proven to be associated with colorectal cancer (Amer et al., 2017; He et al., 2019; Wu et al., 2013). We can find in Figure 11.1 that Campylobacter was enriched in tumors, while its relative abundance is lower than Fusobacterium.

11.2 Visualizing Phylogenetic Network Using Tanggle

The tanggle package provides functions to display a split network. It extends the ggtree package (Yu et al., 2017) to allow the visualization of phylogenetic networks (Figure 11.2).

library(ggplot2)
library(ggtree)
library(tanggle)

file <- system.file("extdata/trees/woodmouse.nxs", package = "phangorn")
Nnet <- phangorn::read.nexus.networx(file)

ggsplitnet(Nnet) +
    geom_tiplab2(aes(color=label), hjust=-.1)+
    geom_nodepoint(color='firebrick', alpha=.4) +
    scale_color_manual(values=rainbow(15)) +   
    theme(legend.position="none") +
    ggexpand(.1) + ggexpand(.1, direction=-1)

FIGURE 11.2: Phylogenetic network.

11.3 Summary

The ggtree is designed to support the grammar of graphics, allowing users to quickly explore phylogenetic data through visualization. When users have special needs and ggtree does not provide them, it is highly recommended to develop extension packages to implement these missing functions. This is a good mechanism, and we also hope that ggtree users can become a ggtree community. In this way, more functions for special needs can be developed and shared among users. Everyone will benefit from it, and it’s exciting that this is happening.

10 ggtreeExtra for Presenting Data on a Circular Layout

12 ggtree Utilities