The KEGG FTP service is not freely available for academic use since 2012, and there are many software packages using out-dated KEGG annotation data. The clusterProfiler package supports downloading the latest online version of KEGG data using the KEGG website , which is freely available for academic users. Both the KEGG pathway and module are supported in clusterProfiler .
Supported organisms
The clusterProfiler package supports all organisms that have KEGG annotation data available in the KEGG database. Users should pass an abbreviation of academic name to the organism parameter. The full list of KEGG supported organisms can be accessed via http://www.genome.jp/kegg/catalog/org_list.html . KEGG Orthology (KO) Database is also supported by specifying organism = "ko".
The clusterProfiler package provides search_kegg_organism() function to help searching supported organisms.
kegg_code scientific_name common_name
1259 ece Escherichia coli O157:H7 EDL933 EHEC
8132 ecec Enterococcus cecorum Enterococcus cecorum
kegg_code scientific_name common_name
1253 eco Escherichia coli K-12 MG1655 Escherichia coli K-12 MG1655
1254 ecj Escherichia coli K-12 W3110 Escherichia coli K-12 W3110
1255 ecd Escherichia coli K-12 DH10B Escherichia coli K-12 DH10B
1256 ebw Escherichia coli K-12 BW2952 Escherichia coli K-12 BW2952
1257 ecok Escherichia coli K-12 MDS42 Escherichia coli K-12 MDS42
1258 ecoc Escherichia coli K-12 C3026 Escherichia coli K-12 C3026
KEGG pathway over-representation analysis
data ( geneList , package= "DOSE" )
gene <- names ( geneList ) [ abs ( geneList ) > 2 ]
kk <- enrichKEGG ( gene = gene ,
organism = 'hsa' ,
pvalueCutoff = 0.05 )
head ( kk )
category
hsa04110 Cellular Processes
hsa04114 Cellular Processes
hsa04218 Cellular Processes
hsa04061 Environmental Information Processing
hsa03320 Organismal Systems
hsa04814 Cellular Processes
subcategory ID
hsa04110 Cell growth and death hsa04110
hsa04114 Cell growth and death hsa04114
hsa04218 Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320 Endocrine system hsa03320
hsa04814 Cell motility hsa04814
Description
hsa04110 Cell cycle
hsa04114 Oocyte meiosis
hsa04218 Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320 PPAR signaling pathway
hsa04814 Motor proteins
GeneRatio BgRatio RichFactor FoldEnrichment zScore pvalue
hsa04110 15/107 158/9500 0.09493671 8.428960 10.049983 2.130809e-10
hsa04114 10/107 139/9500 0.07194245 6.387413 6.828971 3.414945e-06
hsa04218 10/107 157/9500 0.06369427 5.655099 6.277174 1.013937e-05
hsa04061 8/107 100/9500 0.08000000 7.102804 6.547781 1.571178e-05
hsa03320 7/107 76/9500 0.09210526 8.177570 6.704946 2.179422e-05
hsa04814 10/107 197/9500 0.05076142 4.506855 5.308455 7.170565e-05
p.adjust qvalue
hsa04110 4.687780e-08 4.553202e-08
hsa04114 3.756439e-04 3.648599e-04
hsa04218 7.435539e-04 7.222078e-04
hsa04061 8.641481e-04 8.393400e-04
hsa03320 9.589457e-04 9.314162e-04
hsa04814 2.629207e-03 2.553728e-03
geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114 991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218 2305/4605/9133/890/983/51806/1111/891/776/3708
hsa04061 3627/10563/6373/4283/6362/6355/9547/1524
hsa03320 4312/9415/9370/5105/2167/3158/5346
hsa04814 9493/1062/81930/3832/3833/146909/10112/24137/4629/7802
Count
hsa04110 15
hsa04114 10
hsa04218 10
hsa04061 8
hsa03320 7
hsa04814 10
Input ID type can be kegg, ncbi-geneid, ncbi-proteinid or uniprot (see also session 16.1.2 ). Unlike enrichGO() , there is no readable parameter for enrichKEGG() . However, users can use the setReadable() function if there is an OrgDb available for the species.
KEGG pathway gene set enrichment analysis
kk2 <- gseKEGG ( geneList = geneList ,
organism = 'hsa' ,
minGSSize = 120 ,
pvalueCutoff = 0.05 ,
verbose = FALSE )
head ( kk2 )
ID Description setSize
hsa04110 hsa04110 Cell cycle 139
hsa05169 hsa05169 Epstein-Barr virus infection 195
hsa04613 hsa04613 Neutrophil extracellular trap formation 130
hsa05166 hsa05166 Human T-cell leukemia virus 1 infection 203
hsa04510 hsa04510 Focal adhesion 191
hsa04218 hsa04218 Cellular senescence 141
enrichmentScore NES pvalue p.adjust qvalue rank
hsa04110 0.6637551 2.836593 1.000000e-10 8.600000e-09 5.684211e-09 1155
hsa05169 0.4335116 1.922965 1.338944e-07 5.757461e-06 3.805421e-06 2820
hsa04613 0.4496569 1.899232 5.703907e-06 1.635120e-04 1.080740e-04 2575
hsa05166 0.3882567 1.740874 9.483009e-06 2.038847e-04 1.347585e-04 1955
hsa04510 -0.4199193 -1.723726 2.880366e-05 4.954229e-04 3.274521e-04 2183
hsa04218 0.4115945 1.761108 5.896362e-05 8.451453e-04 5.586027e-04 1155
leading_edge
hsa04110 tags=36%, list=9%, signal=33%
hsa05169 tags=39%, list=23%, signal=31%
hsa04613 tags=37%, list=21%, signal=30%
hsa05166 tags=26%, list=16%, signal=22%
hsa04510 tags=27%, list=17%, signal=23%
hsa04218 tags=17%, list=9%, signal=16%
core_enrichment
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa05169 3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11047/3066/54205/1871/578/1019/637/916/3383/4939/10213/23586/4793/5603/7979/7128/6891/930/5714/3452/6850/5702/4794/7124/3569/7097/5708/2208/8772/3119/5704/7186/5971/3135/1380/958/5610/55080/4792/10018/8819/3134/10379/9641/1147/5718/6300/3109/811/5606/2923/3108/5707/1432
hsa04613 820/366/51311/64581/3015/85236/55506/8970/8357/1535/2359/5336/4688/92815/3066/8336/292/1991/3689/8345/5603/4689/5880/10105/1184/6404/3018/6850/5604/3014/7097/1378/8290/1536/834/5605/1183/728/2215/8335/5594/9734/3674/5578/5582/7417/8331/6300
hsa05166 991/9133/890/4085/7850/1111/9232/8061/701/9700/898/4316/9134/3932/3559/3126/3112/4609/3561/917/1869/1029/915/114/2005/5902/55697/1871/1031/2224/292/1019/3689/916/3383/11200/706/3600/6513/3601/468/5604/7124/1030/3569/4049/4055/10393/3119/5901/5971/1959/3135
hsa04510 5595/5228/7424/1499/4636/83660/2013/7059/5295/1288/23396/3910/3371/3082/1291/394/3791/7450/596/3685/1280/3675/595/3912/1793/2012/1278/1277/1293/10398/55742/2317/7058/25759/56034/3693/3480/5159/857/1292/3908/3909/63923/3913/1287/3679/7060/3479/10451/80310/1311/1101
hsa04218 2305/4605/9133/890/983/51806/1111/891/993/3576/1978/898/9134/4609/1869/1029/22808/1871/5499/91860/292/1019/11200/1875
KEGG module over-representation analysis
KEGG Module is a collection of manually defined function units. In some situation, KEGG Modules have a more straightforward interpretation.
mkk <- enrichMKEGG ( gene = gene ,
organism = 'hsa' ,
pvalueCutoff = 1 ,
qvalueCutoff = 1 )
head ( mkk )
ID
M00912 M00912
M00095 M00095
M00053 M00053
M00938 M00938
M00003 M00003
M00049 M00049
Description
M00912 NAD biosynthesis, tryptophan => quinolinate => NAD
M00095 C5 isoprenoid biosynthesis, mevalonate pathway
M00053 Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP
M00938 Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP
M00003 Gluconeogenesis, oxaloacetate => fructose-6P
M00049 Adenine ribonucleotide biosynthesis, IMP => ADP,ATP
GeneRatio BgRatio RichFactor FoldEnrichment zScore pvalue
M00912 2/9 12/833 0.16666667 15.425926 5.257534 0.006480816
M00095 1/9 10/833 0.10000000 9.255556 2.743252 0.103471964
M00053 1/9 11/833 0.09090909 8.414141 2.585477 0.113276038
M00938 1/9 14/833 0.07142857 6.611111 2.211517 0.142119005
M00003 1/9 18/833 0.05555556 5.141975 1.855600 0.179278025
M00049 1/9 19/833 0.05263158 4.871345 1.782977 0.188341212
p.adjust qvalue geneID Count
M00912 0.0388849 0.03410956 23475/3620 2
M00095 0.1883412 0.16521159 3158 1
M00053 0.1883412 0.16521159 6241 1
M00938 0.1883412 0.16521159 6241 1
M00003 0.1883412 0.16521159 5105 1
M00049 0.1883412 0.16521159 26289 1
KEGG module gene set enrichment analysis
mkk2 <- gseMKEGG ( geneList = geneList ,
organism = 'hsa' ,
pvalueCutoff = 1 )
head ( mkk2 )
ID Description setSize
M00001 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate 24
M00938 M00938 Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP 10
M00002 M00002 Glycolysis, core module involving three-carbon compounds 11
M00035 M00035 Methionine degradation 11
M00009 M00009 Citrate cycle (TCA cycle, Krebs cycle) 22
M00101 M00101 Cholesterol biosynthesis, FPP => cholesterol 12
enrichmentScore NES pvalue p.adjust qvalue rank
M00001 0.5739036 1.751541 0.009216818 0.2765045 0.2619517 2886
M00938 0.6648004 1.575131 0.032927072 0.3559954 0.3372588 648
M00002 0.6421781 1.562671 0.035599545 0.3559954 0.3372588 1381
M00035 0.5824213 1.417259 0.094252874 0.6698337 0.6345793 1555
M00009 0.4504911 1.331955 0.111638955 0.6698337 0.6345793 3514
M00101 0.5350938 1.330421 0.145739910 0.6706064 0.6353113 1180
leading_edge
M00001 tags=54%, list=23%, signal=42%
M00938 tags=40%, list=5%, signal=38%
M00002 tags=55%, list=11%, signal=49%
M00035 tags=45%, list=12%, signal=40%
M00009 tags=50%, list=28%, signal=36%
M00101 tags=42%, list=9%, signal=38%
core_enrichment
M00001 5214/3101/2821/7167/2597/5230/2023/5223/5315/3099/5232/2027/5211
M00938 6241/7298/4830/1841
M00002 7167/2597/5230/2023/5223/5315
M00035 875/1789/191/1788/1786
M00009 3418/50/4190/3419/2271/3421/55753/3417/1431/6389/4191
M00101 1717/6713/3930/10682/50814
Visualize enriched KEGG pathways
The enrichplot package implements several methods to visualize enriched terms. Most of them are general methods that can be used on GO, KEGG, MSigDb, and other gene set annotations. Here, we introduce the clusterProfiler::browseKEGG() and pathview::pathview() functions to help users explore enriched KEGG pathways with genes of interest.
To view the KEGG pathway, users can use the browseKEGG function, which will open a web browser and highlight enriched genes.
(ref:browseKEGGscap) Explore selected KEGG pathway.
(ref:browseKEGGcap) Explore selected KEGG pathway. Differentially expressed genes that are enriched in the selected pathway will be highlighted.
Users can also use the pathview() function from the pathview (Luo and Brouwer 2013 ) to visualize enriched KEGG pathways identified by the clusterProfiler package (Yu et al. 2012 ) .
The following example illustrates how to visualize the “hsa04110” pathway, which was enriched in our previous analysis.
library ( "pathview" )
hsa04110 <- pathview ( gene.data = geneList ,
pathway.id = "hsa04110" ,
species = "hsa" ,
limit = list ( gene= max ( abs ( geneList ) ) , cpd= 1 ) )
(ref:pathviewscap) Visualze selected KEGG pathway by pathview().
(ref:pathviewcap) Visualize selected KEGG pathway by pathview(). Gene expression values can be mapped to gradient color scale.
References
Luo, Weijun, and Cory Brouwer. 2013.
“Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (July): 1830–31.
https://doi.org/10.1093/bioinformatics/btt285 .
Yu, Guangchuang, Le-Gen Wang, Yanyan Han, and Qing-Yu He. 2012.
“clusterProfiler: An r Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16 (5): 284–87.
https://doi.org/10.1089/omi.2011.0118 .