5  KEGG enrichment analysis

The KEGG FTP service has not been freely available for academic use since 2012, and there are many software packages using outdated KEGG annotation data. The clusterProfiler package supports downloading the latest online version of KEGG data using the KEGG website, which is freely available for academic users. Both KEGG pathways and modules are supported in clusterProfiler.

5.1 Supported organisms

The clusterProfiler package supports all organisms that have KEGG annotation data available in the KEGG database. Users should pass an abbreviation of academic name to the organism parameter. The full list of KEGG supported organisms can be accessed via http://www.genome.jp/kegg/catalog/org_list.html. KEGG Orthology (KO) Database is also supported by specifying organism = "ko".

The clusterProfiler package provides the search_kegg_organism() function to help search for supported organisms.

library(clusterProfiler)
search_kegg_organism('ece', by='kegg_code')
     kegg_code                 scientific_name          common_name
1288       ece Escherichia coli O157:H7 EDL933                 EHEC
8269      ecec            Enterococcus cecorum Enterococcus cecorum
ecoli <- search_kegg_organism('Escherichia coli', by='scientific_name')
dim(ecoli)
[1] 66  3
head(ecoli)
     kegg_code              scientific_name                  common_name
1282       eco Escherichia coli K-12 MG1655 Escherichia coli K-12 MG1655
1283       ecj  Escherichia coli K-12 W3110  Escherichia coli K-12 W3110
1284       ecd  Escherichia coli K-12 DH10B  Escherichia coli K-12 DH10B
1285       ebw Escherichia coli K-12 BW2952 Escherichia coli K-12 BW2952
1286      ecok  Escherichia coli K-12 MDS42  Escherichia coli K-12 MDS42
1287      ecoc  Escherichia coli K-12 C3026  Escherichia coli K-12 C3026

5.2 KEGG pathway over-representation analysis

data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]

kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
head(kk)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio RichFactor FoldEnrichment    zScore       pvalue
hsa04110    15/107 158/9522 0.09493671       8.448480 10.064414 2.064405e-10
hsa04114    10/107 139/9522 0.07194245       6.402205  6.839599 3.346185e-06
hsa04218    10/107 157/9522 0.06369427       5.668195  6.287354 9.939099e-06
hsa04061     8/107 100/9522 0.08000000       7.119252  6.557674 1.545425e-05
hsa03320     7/107  76/9522 0.09210526       8.196508  6.714716 2.147577e-05
hsa04814    10/107 197/9522 0.05076142       4.517292  5.317900 7.035033e-05
             p.adjust qvalue
hsa04110 7.204774e-08     NA
hsa04114 5.839093e-04     NA
hsa04218 1.156249e-03     NA
hsa04061 1.348383e-03     NA
hsa03320 1.499009e-03     NA
hsa04814 4.092044e-03     NA
                                                                           geneID
hsa04110 890/891/9212/7272/9133/9319/4085/81620/991/10403/1111/983/9232/4174/8318
hsa04114                          891/3708/51806/9133/4085/6790/5241/991/983/9232
hsa04218                           890/891/776/2305/3708/51806/9133/4605/1111/983
hsa04061                                 6373/6355/6362/1524/10563/3627/9547/4283
hsa03320                                       9415/5105/5346/3158/2167/4312/9370
hsa04814                   10112/9493/81930/146909/24137/4629/3833/7802/1062/3832
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10

Input ID type can be kegg, ncbi-geneid, ncbi-proteinid or uniprot (see also Section 17.1.2). Unlike enrichGO(), there is no readable parameter for enrichKEGG(). However, users can use the setReadable() function, see also Section 17.2, if there is an OrgDb available for the species.

5.3 KEGG pathway gene set enrichment analysis

kk2 <- gseKEGG(geneList     = geneList,
               organism     = 'hsa',
               minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
head(kk2)
               ID                             Description setSize
hsa04110 hsa04110                              Cell cycle     139
hsa05169 hsa05169            Epstein-Barr virus infection     195
hsa04613 hsa04613 Neutrophil extracellular trap formation     130
hsa04218 hsa04218                     Cellular senescence     141
hsa05166 hsa05166 Human T-cell leukemia virus 1 infection     203
hsa04114 hsa04114                          Oocyte meiosis      96
         enrichmentScore      NES       pvalue     p.adjust       qvalue rank
hsa04110       0.6637551 2.864372 3.114381e-12 3.363532e-10 3.363532e-10 1155
hsa05169       0.4335116 1.931224 1.141044e-07 6.161637e-06 6.161637e-06 2820
hsa04613       0.4496570 1.916069 5.984032e-06 2.154252e-04 2.154252e-04 2575
hsa04218       0.4115945 1.755957 4.163499e-05 8.993158e-04 8.993158e-04 1155
hsa05166       0.3882568 1.750063 1.555160e-05 4.198931e-04 4.198931e-04 1955
hsa04114       0.4396853 1.748956 1.267640e-04 1.413706e-03 1.413706e-03  896
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa05169 tags=39%, list=23%, signal=31%
hsa04613 tags=37%, list=21%, signal=30%
hsa04218  tags=17%, list=9%, signal=16%
hsa05166 tags=26%, list=16%, signal=22%
hsa04114  tags=20%, list=7%, signal=19%
                                                                                                                                                                                                                                                                                                                                                                                     core_enrichment
hsa04110                                                                                                                                        8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa05169 3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11047/3066/54205/1871/578/1019/637/916/3383/4939/10213/23586/4793/5603/7979/7128/6891/930/5714/3452/6850/5702/4794/7124/3569/7097/5708/2208/8772/3119/5704/7186/5971/3135/1380/958/5610/55080/4792/10018/8819/3134/10379/9641/1147/5718/6300/3109/811/5606/2923/3108/5707/1432
hsa04613                                                                                                                                            820/366/51311/64581/3015/85236/55506/8970/8357/1535/2359/5336/4688/92815/3066/8336/292/1991/3689/8345/5603/4689/5880/10105/1184/6404/3018/6850/5604/3014/7097/1378/8290/1536/834/5605/1183/728/2215/8335/5594/9734/3674/5578/5582/7417/8331/6300
hsa04218                                                                                                                                                                                                                                                                       2305/4605/9133/890/983/51806/1111/891/993/3576/1978/898/9134/4609/1869/1029/22808/1871/5499/91860/292/1019/11200/1875
hsa05166                                                                                                                            991/9133/890/4085/7850/1111/9232/8061/701/9700/898/4316/9134/3932/3559/3126/3112/4609/3561/917/1869/1029/915/114/2005/5902/55697/1871/1031/2224/292/1019/3689/916/3383/11200/706/3600/6513/3601/468/5604/7124/1030/3569/4049/4055/10393/3119/5901/5971/1959/3135
hsa04114                                                                                                                                                                                                                                                                                                  991/9133/983/4085/51806/6790/891/9232/5347/9700/898/9134/699/26271/114/5499/91860/9088/995

5.4 KEGG module over-representation analysis

KEGG Module is a collection of manually defined functional units. In some situations, KEGG Modules have a more straightforward interpretation.

mkk <- enrichMKEGG(gene = gene,
                   organism = 'hsa',
                   pvalueCutoff = 1,
                   qvalueCutoff = 1)
head(mkk)                   
           ID
M00912 M00912
M00095 M00095
M00053 M00053
M00938 M00938
M00003 M00003
M00049 M00049
                                                                    Description
M00912                       NAD biosynthesis, tryptophan => quinolinate => NAD
M00095                           C5 isoprenoid biosynthesis, mevalonate pathway
M00053 Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP
M00938                 Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP
M00003                             Gluconeogenesis, oxaloacetate => fructose-6P
M00049                      Adenine ribonucleotide biosynthesis, IMP => ADP,ATP
       GeneRatio BgRatio RichFactor FoldEnrichment   zScore      pvalue
M00912       2/9  12/833 0.16666667      15.425926 5.257534 0.006480816
M00095       1/9  10/833 0.10000000       9.255556 2.743252 0.103471964
M00053       1/9  11/833 0.09090909       8.414141 2.585477 0.113276038
M00938       1/9  14/833 0.07142857       6.611111 2.211517 0.142119005
M00003       1/9  18/833 0.05555556       5.141975 1.855600 0.179278025
M00049       1/9  19/833 0.05263158       4.871345 1.782977 0.188341212
        p.adjust    qvalue     geneID Count
M00912 0.2851559 0.2851559 23475/3620     2
M00095 1.0000000 1.0000000       3158     1
M00053 1.0000000 1.0000000       6241     1
M00938 1.0000000 1.0000000       6241     1
M00003 1.0000000 1.0000000       5105     1
M00049 1.0000000 1.0000000      26289     1

5.5 KEGG module gene set enrichment analysis

mkk2 <- gseMKEGG(geneList = geneList,
                 organism = 'hsa',
                 pvalueCutoff = 1)
head(mkk2)
           ID                                               Description setSize
M00912 M00912        NAD biosynthesis, tryptophan => quinolinate => NAD       8
M00004 M00004       Pentose phosphate pathway (Pentose phosphate cycle)       9
M00001 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate      24
M00002 M00002  Glycolysis, core module involving three-carbon compounds      11
M00099 M00099                                  Sphingosine biosynthesis       9
M00938 M00938  Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP      10
       enrichmentScore       NES       pvalue   p.adjust     qvalue rank
M00912       0.8509100  1.929767 0.0005671893 0.02495633 0.01353849  679
M00004       0.7450305  1.726809 0.0160356422 0.23518942 0.12758731 1750
M00001       0.5739035  1.709942 0.0086966318 0.19132590 0.10379190 2886
M00002       0.6421781  1.587073 0.0415538128 0.34826024 0.18892681 1381
M00099      -0.7084736 -1.576682 0.0298160812 0.32797689 0.17792335 2125
M00938       0.6648002  1.553363 0.0554050386 0.34826024 0.18892681  648
                         leading_edge
M00912  tags=62%, list=5%, signal=59%
M00004 tags=67%, list=14%, signal=57%
M00001 tags=54%, list=23%, signal=42%
M00002 tags=55%, list=11%, signal=49%
M00099 tags=56%, list=17%, signal=46%
M00938  tags=40%, list=5%, signal=38%
                                                        core_enrichment
M00912                                        23475/3620/6999/8564/8942
M00004                                   2821/5226/7086/2539/6888/22934
M00001 5214/3101/2821/7167/2597/5230/2023/5223/5315/3099/5232/2027/5211
M00002                                    7167/2597/5230/2023/5223/5315
M00099                                     253782/29956/427/55304/79603
M00938                                              6241/7298/4830/1841

5.6 Visualize enriched KEGG pathways

The enrichplot package implements several methods to visualize enriched terms. Most of them are general methods that can be used on GO, KEGG, MSigDb, and other gene set annotations. Here, we introduce the clusterProfiler::browseKEGG() and pathview::pathview() functions to help users explore enriched KEGG pathways with genes of interest.

To view the KEGG pathway, users can use the browseKEGG() function, which will open a web browser and highlight enriched genes.

browseKEGG(kk, 'hsa04110')

(ref:browseKEGGscap) Explore selected KEGG pathway.

(ref:browseKEGGcap) Explore selected KEGG pathway. Differentially expressed genes that are enriched in the selected pathway will be highlighted.

(ref:browseKEGGcap)

Users can also use the pathview() function from the pathview (Luo and Brouwer 2013) to visualize enriched KEGG pathways identified by the clusterProfiler package (Yu et al. 2012).

The following example illustrates how to visualize the “hsa04110” pathway, which was enriched in our previous analysis.

library("pathview")
hsa04110 <- pathview(gene.data  = geneList,
                     pathway.id = "hsa04110",
                     species    = "hsa",
                     limit      = list(gene=max(abs(geneList)), cpd=1))

(ref:pathviewscap) Visualize selected KEGG pathway by pathview().

(ref:pathviewcap) Visualize selected KEGG pathway by pathview(). Gene expression values can be mapped to gradient color scale.

(ref:pathviewcap)

References

Luo, Weijun, and Cory Brouwer. 2013. “Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (July): 1830–31. https://doi.org/10.1093/bioinformatics/btt285.
Yu, Guangchuang, Le-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: An r Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16 (5): 284–87. https://doi.org/10.1089/omi.2011.0118.