15 Visualization of functional enrichment result

The enrichplot package implements several visualization methods to help interpreting enrichment results. It supports visualizing enrichment results obtained from DOSE (Yu et al. 2015), clusterProfiler (Yu et al. 2012; Wu et al. 2021), ReactomePA (Yu and He 2016) and meshes (Yu 2018). Both over representation analysis (ORA) and gene set enrichment analysis (GSEA) are supported.

Note: Several visualization methods were first implemented in DOSE and rewrote from scratch using ggplot2. If you want to use the old methods, you can use the doseplot package.

15.1 Bar Plot

Bar plot is the most widely used method to visualize enriched terms. It depicts the enrichment scores (e.g. p values) and gene count or ratio as bar height and color (Figure 15.1A). Users can specify the number of terms (most significant) or selected terms (see also the FAQ) to display via the showCategory parameter.

library(DOSE)
data(geneList)
de <- names(geneList)[abs(geneList) > 2]

edo <- enrichDGN(de)

library(enrichplot)
barplot(edo, showCategory=20)

Other variables that derived using mutate can also be used as bar height or color as demonstrated in Figure 15.1B.

mutate(edo, qscore = -log(p.adjust, base=10)) %>% 
    barplot(x="qscore")

Figure 15.1: Bar plot of enriched terms.

15.2 Dot plot

Dot plot is similar to bar plot with the capability to encode another score as dot size.

edo2 <- gseDO(geneList)
dotplot(edo, showCategory=30) + ggtitle("dotplot for ORA")
dotplot(edo2, showCategory=30) + ggtitle("dotplot for GSEA")

Figure 15.2: Dot plot of enriched terms.

Note: The dotplot() function also works with compareCluster() output.

15.3 Gene-Concept Network

Both the barplot() and dotplot() only displayed most significant or selected enriched terms, while users may want to know which genes are involved in these significant terms. In order to consider the potentially biological complexities in which a gene may belong to multiple annotation categories and provide information of numeric changes if available, we developed the cnetplot() function to extract the complex association. The cnetplot() depicts the linkages of genes and biological concepts (e.g. GO terms or KEGG pathways) as a network. GSEA result is also supported with only core enriched genes displayed.

## convert gene ID to Symbol
edox <- setReadable(edo, 'org.Hs.eg.db', 'ENTREZID')
p1 <- cnetplot(edox, foldChange=geneList)
## categorySize can be scaled by 'pvalue' or 'geneNum'
p2 <- cnetplot(edox, categorySize="pvalue", foldChange=geneList)
p3 <- cnetplot(edox, foldChange=geneList, circular = TRUE, colorEdge = TRUE) 
cowplot::plot_grid(p1, p2, p3, ncol=3, labels=LETTERS[1:3], rel_widths=c(.8, .8, 1.2))

Figure 15.3: Network plot of enriched terms.

If you would like label subset of the nodes, you can use the node_label parameter, which supports 4 possible selections (i.e. “category”, “gene”, “all” and “none”), as demonstrated in Figure 15.4. The size of category and gene label can be specified via the cex_label_category and cex_label_gene parameters. The color of the categories and genes can be specified via the color_category and color_gene parameters.

p1 <- cnetplot(edox, node_label="category", 
        cex_label_category = 1.2) 
p2 <- cnetplot(edox, node_label="gene", 
        cex_label_gene = 0.8) 
p3 <- cnetplot(edox, node_label="all") 
p4 <- cnetplot(edox, node_label="none", 
        color_category='firebrick', 
        color_gene='steelblue') 
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])

Figure 15.4: Labelling nodes by selected subset. gene category (A), gene name (B), both gene category and gene name (C, default) and not to label at all (D).

The cnetplot function can be used as a general method to visualize data relationships in a network diagram. Users can use a named list as input as demonstrated in Figure 15.5.

set.seed(123)
x <- list(A = letters[1:10], B=letters[5:12], C=letters[sample(1:26, 15)])
p1 <- cnetplot(x)

set.seed(123)
d <- setNames(rnorm(26), letters)
p2 <- cnetplot(x, foldChange=d) + 
    scale_color_gradient2(name='associated data', low='darkgreen', high='firebrick')

cowplot::plot_grid(p1, p2, ncol=2, labels=LETTERS[1:2])

Figure 15.5: Using cnetplot to visualize data relationships. relationships as a network diagram (A) and with associated data to color nodes (B).

Note: The cnetplot() function also works with compareCluster() output.

15.4 Heatmap-like functional classification

The heatplot is similar to cnetplot, while displaying the relationships as a heatmap. The gene-concept network may become too complicated if user want to show a large number significant terms. The heatplot can simplify the result and more easy to identify expression patterns.

p1 <- heatplot(edox, showCategory=5)
p2 <- heatplot(edox, foldChange=geneList, showCategory=5)
cowplot::plot_grid(p1, p2, ncol=1, labels=LETTERS[1:2])

Figure 15.6: Heatmap plot of enriched terms. default (A), foldChange=geneList (B)

15.5 Tree plot

The treeplot() function performs hierarchical clustering of enriched terms. It relies on the pairwise similarities of the enriched terms calculated by the pairwise_termsim() function, which by default using Jaccard’s similarity index (JC). Users can also use semantic similarity values if it is supported (e.g., GO, DO and MeSH).

The default agglomeration method in treeplot() is ward.D and users can specify other methods via the hclust_method parameter (e.g., ‘average’, ‘complete’, ‘median’, ‘centroid’, etc., see also the document of the hclust() function). The treeplot() function will cut the tree into several subtrees (specify by the nCluster parameter (default is 5)) and labels subtrees using high-frequency words. This will reduce the complexity of the enriched result and improve user interpretation ability.

edox2 <- pairwise_termsim(edox)
p1 <- treeplot(edox2)
p2 <- treeplot(edox2, hclust_method = "average")
aplot::plot_list(p1, p2, tag_levels='A')

Figure 15.7: Heatmap plot of enriched terms. default (A), foldChange=geneList (B)

15.6 Enrichment Map

Enrichment map organizes enriched terms into a network with edges connecting overlapping gene sets. In this way, mutually overlapping gene sets are tend to cluster together, making it easy to identify functional module.

The emapplot function supports results obtained from hypergeometric test and gene set enrichment analysis. The cex_category parameter can be used to resize nodes, as demonstrated in Figure 15.8 B, and the layout parameter can adjust the layout, as demonstrated in Figure 15.8 C and D.

edo <- pairwise_termsim(edo)
p1 <- emapplot(edo)
p2 <- emapplot(edo, cex_category=1.5)
p3 <- emapplot(edo, layout="kk")
p4 <- emapplot(edo, cex_category=1.5,layout="kk") 
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])

Figure 15.8: Plot for results obtained from hypergeometric test and gene set enrichment analysis. default (A), cex_category=1.5 (B), layout="kk" (C) and cex_category=1.5,layout="kk" (D).

15.7 Biological theme comparison

The emapplot function also supports results obtained from compareCluster function of clusterProfiler package. In addition to cex_category and layout parameters, the number of circles in the bottom left corner can be adjusted using the legend_n parameteras, as demonstrated in Figure 15.9 B. And proportion of clusters in the pie chart can be adjusted using the pie parameter, when pie="count", the proportion of clusters in the pie chart is determined by the number of genes, as demonstrated in Figure 15.9 C and D.

library(clusterProfiler)
data(gcSample)
xx <- compareCluster(gcSample, fun="enrichKEGG",
                     organism="hsa", pvalueCutoff=0.05)
xx <- pairwise_termsim(xx)                     
p1 <- emapplot(xx)
p2 <- emapplot(xx, legend_n=2) 
p3 <- emapplot(xx, pie="count")
p4 <- emapplot(xx, pie="count", cex_category=1.5, layout="kk")
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])

Figure 15.9: Plot for results obtained from compareCluster function of clusterProfiler package. default (A), legend_n=2 (B), pie="count" (C) and pie="count", cex_category=1.5, layout="kk" (D).

15.8 UpSet Plot

The upsetplot is an alternative to cnetplot for visualizing the complex association between genes and gene sets. It emphasizes the gene overlapping among different gene sets.

upsetplot(edo)

Figure 15.10: Upsetplot for over-representation analysis.

For over-representation analysis, upsetplot will calculate the overlaps among different gene sets as demonstrated in Figure 15.10. For GSEA result, it will plot the fold change distributions of different categories (e.g. unique to pathway, overlaps among different pathways).

upsetplot(kk2)

Figure 15.11: Upsetplot for gene set enrichment analysis.

15.9 ridgeline plot for expression distribution of GSEA result

The ridgeplot will visualize expression distributions of core enriched genes for GSEA enriched categories. It helps users to interpret up/down-regulated pathways.

ridgeplot(edo2)

Figure 15.12: Ridgeplot for gene set enrichment analysis.

15.10 running score and preranked list of GSEA result

Running score and preranked list are traditional methods for visualizing GSEA result. The enrichplot package supports both of them to visualize the distribution of the gene set and the enrichment score.

p1 <- gseaplot(edo2, geneSetID = 1, by = "runningScore", title = edo2$Description[1])
p2 <- gseaplot(edo2, geneSetID = 1, by = "preranked", title = edo2$Description[1])
p3 <- gseaplot(edo2, geneSetID = 1, title = edo2$Description[1])
cowplot::plot_grid(p1, p2, p3, ncol=1, labels=LETTERS[1:3])

Figure 15.13: gseaplot for GSEA result(by = "runningScore"). by = "runningScore" (A), by = "preranked" (B), default (C)

Another method to plot GSEA result is the gseaplot2 function:

gseaplot2(edo2, geneSetID = 1, title = edo2$Description[1])

Figure 15.14: Gseaplot2 for GSEA result.

The gseaplot2 also supports multile gene sets to be displayed on the same figure:

gseaplot2(edo2, geneSetID = 1:3)

Figure 15.15: Gseaplot2 for GSEA result of multile gene sets.

User can also displaying the pvalue table on the plot via pvalue_table parameter:

gseaplot2(edo2, geneSetID = 1:3, pvalue_table = TRUE,
          color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot")

Figure 15.16: Gseaplot2 for GSEA result of multile gene sets(add pvalue_table).

User can specify subplots to only display a subset of plots:

p1 <- gseaplot2(edo2, geneSetID = 1:3, subplots = 1)
p2 <- gseaplot2(edo2, geneSetID = 1:3, subplots = 1:2)
cowplot::plot_grid(p1, p2, ncol=1, labels=LETTERS[1:2])

Figure 15.17: Gseaplot2 for GSEA result of multile gene sets(add subplots). subplots = 1 (A),subplots = 1:2 (B)

The gsearank function plot the ranked list of genes belong to the specific gene set.

gsearank(edo2, 1, title = edo2[1, "Description"])

Figure 15.18: Ranked list of genes belong to the specific gene set.

Multiple gene sets can be aligned using cowplot:

library(ggplot2)
library(cowplot)

pp <- lapply(1:3, function(i) {
    anno <- edo2[i, c("NES", "pvalue", "p.adjust")]
    lab <- paste0(names(anno), "=",  round(anno, 3), collapse="\n")

    gsearank(edo2, i, edo2[i, 2]) + xlab(NULL) +ylab(NULL) +
        annotate("text", 10000, edo2[i, "enrichmentScore"] * .75, label = lab, hjust=0, vjust=0)
})
plot_grid(plotlist=pp, ncol=1)

Figure 15.19: Gsearank for multiple gene sets.

15.11 pubmed trend of enriched terms

One of the problem of enrichment analysis is to find pathways for further investigation. Here, we provide pmcplot function to plot the number/proportion of publications trend based on the query result from PubMed Central. Of course, users can use pmcplot in other scenarios. All text that can be queried on PMC is valid as input of pmcplot.

terms <- edo$Description[1:5]
p <- pmcplot(terms, 2010:2020)
p2 <- pmcplot(terms, 2010:2020, proportion=FALSE)
plot_grid(p, p2, ncol=2)

Figure 15.20: Pmcplot of enrichment analysis.

14 Biological theme comparison

16 dplyr verbs for manipulating enrichment result