5  KEGG enrichment analysis

The KEGG FTP service is not freely available for academic use since 2012, and there are many software packages using out-dated KEGG annotation data. The clusterProfiler package supports downloading the latest online version of KEGG data using the KEGG website, which is freely available for academic users. Both the KEGG pathway and module are supported in clusterProfiler.

5.1 Supported organisms

The clusterProfiler package supports all organisms that have KEGG annotation data available in the KEGG database. Users should pass an abbreviation of academic name to the organism parameter. The full list of KEGG supported organisms can be accessed via http://www.genome.jp/kegg/catalog/org_list.html. KEGG Orthology (KO) Database is also supported by specifying organism = "ko".

The clusterProfiler package provides search_kegg_organism() function to help searching supported organisms.

     kegg_code                 scientific_name          common_name
1259       ece Escherichia coli O157:H7 EDL933                 EHEC
8132      ecec            Enterococcus cecorum Enterococcus cecorum
ecoli <- search_kegg_organism('Escherichia coli', by='scientific_name')
dim(ecoli)
[1] 66  3
head(ecoli)
     kegg_code              scientific_name                  common_name
1253       eco Escherichia coli K-12 MG1655 Escherichia coli K-12 MG1655
1254       ecj  Escherichia coli K-12 W3110  Escherichia coli K-12 W3110
1255       ecd  Escherichia coli K-12 DH10B  Escherichia coli K-12 DH10B
1256       ebw Escherichia coli K-12 BW2952 Escherichia coli K-12 BW2952
1257      ecok  Escherichia coli K-12 MDS42  Escherichia coli K-12 MDS42
1258      ecoc  Escherichia coli K-12 C3026  Escherichia coli K-12 C3026

5.2 KEGG pathway over-representation analysis

data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]

kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
head(kk)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio RichFactor FoldEnrichment    zScore       pvalue
hsa04110    15/107 158/9500 0.09493671       8.428960 10.049983 2.130809e-10
hsa04114    10/107 139/9500 0.07194245       6.387413  6.828971 3.414945e-06
hsa04218    10/107 157/9500 0.06369427       5.655099  6.277174 1.013937e-05
hsa04061     8/107 100/9500 0.08000000       7.102804  6.547781 1.571178e-05
hsa03320     7/107  76/9500 0.09210526       8.177570  6.704946 2.179422e-05
hsa04814    10/107 197/9500 0.05076142       4.506855  5.308455 7.170565e-05
             p.adjust       qvalue
hsa04110 4.687780e-08 4.553202e-08
hsa04114 3.756439e-04 3.648599e-04
hsa04218 7.435539e-04 7.222078e-04
hsa04061 8.641481e-04 8.393400e-04
hsa03320 9.589457e-04 9.314162e-04
hsa04814 2.629207e-03 2.553728e-03
                                                                           geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114                          991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218                           2305/4605/9133/890/983/51806/1111/891/776/3708
hsa04061                                 3627/10563/6373/4283/6362/6355/9547/1524
hsa03320                                       4312/9415/9370/5105/2167/3158/5346
hsa04814                   9493/1062/81930/3832/3833/146909/10112/24137/4629/7802
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10

Input ID type can be kegg, ncbi-geneid, ncbi-proteinid or uniprot (see also session 16.1.2). Unlike enrichGO(), there is no readable parameter for enrichKEGG(). However, users can use the setReadable() function if there is an OrgDb available for the species.

5.3 KEGG pathway gene set enrichment analysis

kk2 <- gseKEGG(geneList     = geneList,
               organism     = 'hsa',
               minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
head(kk2)
               ID                             Description setSize
hsa04110 hsa04110                              Cell cycle     139
hsa05169 hsa05169            Epstein-Barr virus infection     195
hsa04613 hsa04613 Neutrophil extracellular trap formation     130
hsa05166 hsa05166 Human T-cell leukemia virus 1 infection     203
hsa04510 hsa04510                          Focal adhesion     191
hsa04218 hsa04218                     Cellular senescence     141
         enrichmentScore       NES       pvalue     p.adjust       qvalue rank
hsa04110       0.6637551  2.836593 1.000000e-10 8.600000e-09 5.684211e-09 1155
hsa05169       0.4335116  1.922965 1.338944e-07 5.757461e-06 3.805421e-06 2820
hsa04613       0.4496569  1.899232 5.703907e-06 1.635120e-04 1.080740e-04 2575
hsa05166       0.3882567  1.740874 9.483009e-06 2.038847e-04 1.347585e-04 1955
hsa04510      -0.4199193 -1.723726 2.880366e-05 4.954229e-04 3.274521e-04 2183
hsa04218       0.4115945  1.761108 5.896362e-05 8.451453e-04 5.586027e-04 1155
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa05169 tags=39%, list=23%, signal=31%
hsa04613 tags=37%, list=21%, signal=30%
hsa05166 tags=26%, list=16%, signal=22%
hsa04510 tags=27%, list=17%, signal=23%
hsa04218  tags=17%, list=9%, signal=16%
                                                                                                                                                                                                                                                                                                                                                                                     core_enrichment
hsa04110                                                                                                                                        8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa05169 3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11047/3066/54205/1871/578/1019/637/916/3383/4939/10213/23586/4793/5603/7979/7128/6891/930/5714/3452/6850/5702/4794/7124/3569/7097/5708/2208/8772/3119/5704/7186/5971/3135/1380/958/5610/55080/4792/10018/8819/3134/10379/9641/1147/5718/6300/3109/811/5606/2923/3108/5707/1432
hsa04613                                                                                                                                            820/366/51311/64581/3015/85236/55506/8970/8357/1535/2359/5336/4688/92815/3066/8336/292/1991/3689/8345/5603/4689/5880/10105/1184/6404/3018/6850/5604/3014/7097/1378/8290/1536/834/5605/1183/728/2215/8335/5594/9734/3674/5578/5582/7417/8331/6300
hsa05166                                                                                                                            991/9133/890/4085/7850/1111/9232/8061/701/9700/898/4316/9134/3932/3559/3126/3112/4609/3561/917/1869/1029/915/114/2005/5902/55697/1871/1031/2224/292/1019/3689/916/3383/11200/706/3600/6513/3601/468/5604/7124/1030/3569/4049/4055/10393/3119/5901/5971/1959/3135
hsa04510                                                                                                                    5595/5228/7424/1499/4636/83660/2013/7059/5295/1288/23396/3910/3371/3082/1291/394/3791/7450/596/3685/1280/3675/595/3912/1793/2012/1278/1277/1293/10398/55742/2317/7058/25759/56034/3693/3480/5159/857/1292/3908/3909/63923/3913/1287/3679/7060/3479/10451/80310/1311/1101
hsa04218                                                                                                                                                                                                                                                                       2305/4605/9133/890/983/51806/1111/891/993/3576/1978/898/9134/4609/1869/1029/22808/1871/5499/91860/292/1019/11200/1875

5.4 KEGG module over-representation analysis

KEGG Module is a collection of manually defined function units. In some situation, KEGG Modules have a more straightforward interpretation.

mkk <- enrichMKEGG(gene = gene,
                   organism = 'hsa',
                   pvalueCutoff = 1,
                   qvalueCutoff = 1)
head(mkk)                   
           ID
M00912 M00912
M00095 M00095
M00053 M00053
M00938 M00938
M00003 M00003
M00049 M00049
                                                                    Description
M00912                       NAD biosynthesis, tryptophan => quinolinate => NAD
M00095                           C5 isoprenoid biosynthesis, mevalonate pathway
M00053 Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP
M00938                 Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP
M00003                             Gluconeogenesis, oxaloacetate => fructose-6P
M00049                      Adenine ribonucleotide biosynthesis, IMP => ADP,ATP
       GeneRatio BgRatio RichFactor FoldEnrichment   zScore      pvalue
M00912       2/9  12/833 0.16666667      15.425926 5.257534 0.006480816
M00095       1/9  10/833 0.10000000       9.255556 2.743252 0.103471964
M00053       1/9  11/833 0.09090909       8.414141 2.585477 0.113276038
M00938       1/9  14/833 0.07142857       6.611111 2.211517 0.142119005
M00003       1/9  18/833 0.05555556       5.141975 1.855600 0.179278025
M00049       1/9  19/833 0.05263158       4.871345 1.782977 0.188341212
        p.adjust     qvalue     geneID Count
M00912 0.0388849 0.03410956 23475/3620     2
M00095 0.1883412 0.16521159       3158     1
M00053 0.1883412 0.16521159       6241     1
M00938 0.1883412 0.16521159       6241     1
M00003 0.1883412 0.16521159       5105     1
M00049 0.1883412 0.16521159      26289     1

5.5 KEGG module gene set enrichment analysis

mkk2 <- gseMKEGG(geneList = geneList,
                 organism = 'hsa',
                 pvalueCutoff = 1)
head(mkk2)
           ID                                               Description setSize
M00001 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate      24
M00938 M00938  Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP      10
M00002 M00002  Glycolysis, core module involving three-carbon compounds      11
M00035 M00035                                    Methionine degradation      11
M00009 M00009                    Citrate cycle (TCA cycle, Krebs cycle)      22
M00101 M00101              Cholesterol biosynthesis, FPP => cholesterol      12
       enrichmentScore      NES      pvalue  p.adjust    qvalue rank
M00001       0.5739036 1.751541 0.009216818 0.2765045 0.2619517 2886
M00938       0.6648004 1.575131 0.032927072 0.3559954 0.3372588  648
M00002       0.6421781 1.562671 0.035599545 0.3559954 0.3372588 1381
M00035       0.5824213 1.417259 0.094252874 0.6698337 0.6345793 1555
M00009       0.4504911 1.331955 0.111638955 0.6698337 0.6345793 3514
M00101       0.5350938 1.330421 0.145739910 0.6706064 0.6353113 1180
                         leading_edge
M00001 tags=54%, list=23%, signal=42%
M00938  tags=40%, list=5%, signal=38%
M00002 tags=55%, list=11%, signal=49%
M00035 tags=45%, list=12%, signal=40%
M00009 tags=50%, list=28%, signal=36%
M00101  tags=42%, list=9%, signal=38%
                                                        core_enrichment
M00001 5214/3101/2821/7167/2597/5230/2023/5223/5315/3099/5232/2027/5211
M00938                                              6241/7298/4830/1841
M00002                                    7167/2597/5230/2023/5223/5315
M00035                                           875/1789/191/1788/1786
M00009            3418/50/4190/3419/2271/3421/55753/3417/1431/6389/4191
M00101                                       1717/6713/3930/10682/50814

5.6 Visualize enriched KEGG pathways

The enrichplot package implements several methods to visualize enriched terms. Most of them are general methods that can be used on GO, KEGG, MSigDb, and other gene set annotations. Here, we introduce the clusterProfiler::browseKEGG() and pathview::pathview() functions to help users explore enriched KEGG pathways with genes of interest.

To view the KEGG pathway, users can use the browseKEGG function, which will open a web browser and highlight enriched genes.

browseKEGG(kk, 'hsa04110')

(ref:browseKEGGscap) Explore selected KEGG pathway.

(ref:browseKEGGcap) Explore selected KEGG pathway. Differentially expressed genes that are enriched in the selected pathway will be highlighted.

(ref:browseKEGGcap)

Users can also use the pathview() function from the pathview (Luo and Brouwer 2013) to visualize enriched KEGG pathways identified by the clusterProfiler package (Yu et al. 2012).

The following example illustrates how to visualize the “hsa04110” pathway, which was enriched in our previous analysis.

library("pathview")
hsa04110 <- pathview(gene.data  = geneList,
                     pathway.id = "hsa04110",
                     species    = "hsa",
                     limit      = list(gene=max(abs(geneList)), cpd=1))

(ref:pathviewscap) Visualze selected KEGG pathway by pathview().

(ref:pathviewcap) Visualize selected KEGG pathway by pathview(). Gene expression values can be mapped to gradient color scale.

(ref:pathviewcap)

References

Luo, Weijun, and Cory Brouwer. 2013. “Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (July): 1830–31. https://doi.org/10.1093/bioinformatics/btt285.
Yu, Guangchuang, Le-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: An r Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16 (5): 284–87. https://doi.org/10.1089/omi.2011.0118.