5  KEGG enrichment analysis

The KEGG FTP service has not been freely available for academic use since 2012, and there are many software packages using outdated KEGG annotation data. The clusterProfiler package supports downloading the latest online version of KEGG data using the KEGG website, which is freely available for academic users. Both KEGG pathways and modules are supported in clusterProfiler.

5.1 Supported organisms

The clusterProfiler package supports all organisms that have KEGG annotation data available in the KEGG database. Users should pass an abbreviation of academic name to the organism parameter. The full list of KEGG supported organisms can be accessed via http://www.genome.jp/kegg/catalog/org_list.html. KEGG Orthology (KO) Database is also supported by specifying organism = "ko".

The clusterProfiler package provides the search_kegg_organism() function to help search for supported organisms.

library(clusterProfiler)
search_kegg_organism('ece', by='kegg_code')
     kegg_code                 scientific_name          common_name
1299       ece Escherichia coli O157:H7 EDL933                 EHEC
8339      ecec            Enterococcus cecorum Enterococcus cecorum
ecoli <- search_kegg_organism('Escherichia coli', by='scientific_name')
dim(ecoli)
[1] 66  3
head(ecoli)
     kegg_code              scientific_name                  common_name
1293       eco Escherichia coli K-12 MG1655 Escherichia coli K-12 MG1655
1294       ecj  Escherichia coli K-12 W3110  Escherichia coli K-12 W3110
1295       ecd  Escherichia coli K-12 DH10B  Escherichia coli K-12 DH10B
1296       ebw Escherichia coli K-12 BW2952 Escherichia coli K-12 BW2952
1297      ecok  Escherichia coli K-12 MDS42  Escherichia coli K-12 MDS42
1298      ecoc  Escherichia coli K-12 C3026  Escherichia coli K-12 C3026

5.2 Finding correct parameters: A case study with Rice

The enrichKEGG() function requires three key parameters: gene (gene IDs), organism (species code), and keyType (type of gene ID). Many users struggle with determining the correct values for organism and keyType, especially for non-model organisms. Here, we use rice (Oryza sativa) as an example to demonstrate how to find these parameters.

5.2.1 Determine the organism code

The organism parameter accepts a KEGG organism code (e.g., ‘hsa’ for human, ‘mmu’ for mouse). To find the code for your species, you can search the KEGG Organism Catalog or use the search_kegg_organism() function as mentioned above.

For rice, searching for “rice” or “Oryza sativa” in the catalog reveals two main subspecies: * Oryza sativa japonica (Japanese rice) -> code: osa * Oryza sativa indica (Indian rice) -> code: dosa

Choose the one that matches your data. For example, if we choose dosa.

5.2.2 Determine the keyType

Once the organism is determined, we need to know what gene ID types (keyType) are supported for that organism. A practical trick is to use enrichKEGG() as a query tool. By inputting a dummy gene ID and a likely keyType, the function will return an error message listing the expected ID format if the input is invalid.

# Try 'kegg' as keyType with a dummy gene
enrichKEGG(gene = "abc", organism = "dosa", keyType = "kegg")
# Output:
# --> No gene can be mapped....
# --> Expected input gene ID: Os06t0664200-01,Os02t0534400-01,Os06t0185100-00...
# --> return NULL...

# Try 'ncbi-geneid'
enrichKEGG(gene = "abc", organism = "dosa", keyType = "ncbi-geneid")
# Output:
# Error in KEGG_convert("kegg", keyType, species) :
#   ncbi-geneid is not supported for dosa ...

# Try 'uniprot'
enrichKEGG(gene = "abc", organism = "dosa", keyType = "uniprot")
# Output:
# Error in KEGG_convert("kegg", keyType, species) :
#   uniprot is not supported for dosa ...

From the output, we learn that for dosa, the supported keyType is kegg, and the expected gene ID format looks like Os06t0664200-01 (RAP-ID). For osa, the gene IDs are typically NCBI Gene IDs (e.g., 3131385).

5.2.3 ID Conversion

If your gene list uses a different ID type (e.g., LOC4334374 or LOC_Os01g01010.1), you need to convert them to the supported KEGG ID.

For LOC4334374, you can use AnnotationHub to retrieve the rice OrgDb and convert symbols to Entrez IDs (which correspond to osa KEGG IDs).

library(AnnotationHub)
ah <- AnnotationHub()
# Query for Oryza sativa
query(ah, 'oryza')
oryza <- ah[['AH55775']] # Example AH ID, check for latest
# Check keys
head(keys(oryza, keytype = "SYMBOL"))
# Convert SYMBOL to ENTREZID
# ...

For RAP-ID conversion (e.g., LOC_Os01g01010.1 to Os01t0100100-01), you might need to download mapping files from the Rice Annotation Project Database (RAP-DB) or Oryzabase and perform the conversion manually or using scripts.

5.2.4 Perform Enrichment Analysis

With the correct organism, keyType, and converted gene IDs, you can now run the analysis:

# For osa (using Entrez IDs)
kk <- enrichKEGG(gene         = gene_list_entrez,
                 organism     = "osa",
                 keyType      = "kegg", # or 'ncbi-geneid' if supported
                 pvalueCutoff = 0.05)

# For dosa (using RAP-IDs)
kk <- enrichKEGG(gene         = gene_list_rap,
                 organism     = "dosa",
                 keyType      = "kegg",
                 pvalueCutoff = 0.05)

This approach of identifying organism code and verifying keyType via error messages applies to any species available in the KEGG database.

5.3 KEGG Data Localization

The clusterProfiler package retrieves KEGG data online via the KEGG API, which ensures that users analyze their data with the latest knowledge. However, this dependency on internet access can be a limitation if the network is unstable or unavailable (e.g., in some cloud environments). Furthermore, for the sake of reproducibility, it is often desirable to freeze the KEGG data version used in an analysis.

To address these issues, we provide two methods to perform KEGG analysis locally.

5.3.1 Using gson

The gson package provides the gson_KEGG() function to download the latest KEGG pathway and module data for a specific organism and store it in a GSON object. This object contains all necessary information for enrichment analysis and can be saved to a file for offline use.

library(gson)
# download KEGG data for human
kk_gson <- gson_KEGG(species = "hsa")

# save to a file
write.gson(kk_gson, file = "kegg_hsa.gson")

# read from file
kk_gson <- read.gson("kegg_hsa.gson")

The GSON object or the file can be used in enricher() and GSEA() functions (via gson parameter or by parsing it). For more detailed information on using the GSON format, including how to perform generalized enrichment analysis with multiple knowledge bases using gsonList, please refer to Section 6.3.

5.3.2 Using createKEGGdb

Another approach is to create a local KEGG.db package containing the KEGG data. While the official KEGG.db has not been updated since 2011, we can generate a new one using the createKEGGdb package.

First, install createKEGGdb from GitHub:

remotes::install_github("YuLab-SMU/createKEGGdb")

Then, use create_kegg_db() to download data and package it. You can specify one or more species, or even “all” to download data for all supported species.

library(createKEGGdb)
# create KEGG.db for human and Arabidopsis
create_kegg_db(c("hsa", "ath"))

This command will generate a KEGG.db_1.0.tar.gz file in the current directory. Install it as a standard R package:

install.packages("KEGG.db_1.0.tar.gz", repos=NULL, type="source")

Once installed, you can perform enrichment analysis using the local data by setting use_internal_data = TRUE in enrichKEGG():

library(KEGG.db)
kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05,
                 use_internal_data = TRUE)

This ensures that the analysis runs offline and is fully reproducible using the installed KEGG.db version.

5.4 KEGG ID Conversion

The clusterProfiler package provides the bitr_kegg() function to support ID conversion via the KEGG API. This is particularly useful for species that do not have an OrgDb object but are supported by KEGG.

5.4.1 Basic Usage

Here is an example of converting KEGG IDs to NCBI Protein IDs and then to UniProt IDs for human genes.

library(clusterProfiler)
data(gcSample)
hg <- gcSample[[1]]
head(hg)
[1] "4597"  "7111"  "5266"  "2175"  "755"   "23046"
# Convert KEGG ID to NCBI Protein ID
eg2np <- bitr_kegg(hg, fromType='kegg', toType='ncbi-proteinid', organism='hsa')
head(eg2np)
   kegg ncbi-proteinid
1 10001      NP_005457
2 10209      NP_005792
3 10232      NP_037536
4 10324      NP_006054
5 10411   NP_001092001
6 10614      NP_006451
# Convert NCBI Protein ID to UniProt ID
np2up <- bitr_kegg(eg2np[,2], fromType='ncbi-proteinid', toType='uniprot', organism='hsa')
head(np2up)
  ncbi-proteinid uniprot
1      NP_005457  O75586
2      NP_005792  P41567
3      NP_005792  Q6IAV3
4      NP_037536  Q13421
5      NP_006054  O60662
6   NP_001092001  O95398

The ID type (both fromType and toType) should be one of ‘kegg’, ‘ncbi-geneid’, ‘ncbi-proteinid’, or ‘uniprot’. The ‘kegg’ ID is the primary ID used in the KEGG database. A rule of thumb is that ‘kegg’ ID corresponds to entrezgene ID for eukaryote species and Locus ID for prokaryotes.

For many prokaryote species, entrezgene IDs are not available. For example, ece:Z5100 (E. coli O157:H7 EDL933) has NCBI-ProteinID and UniProt links, but not NCBI-GeneID. Attempting to convert Z5100 to ncbi-geneid will result in an error stating that ncbi-geneid is not supported. However, conversion to ncbi-proteinid or uniprot is possible.

# Example for prokaryote ID conversion
bitr_kegg("Z5100", fromType="kegg", toType='ncbi-proteinid', organism='ece')
   kegg ncbi-proteinid
1 Z5100       AAG58814
bitr_kegg("Z5100", fromType="kegg", toType='uniprot', organism='ece')
   kegg    uniprot
1 Z5100 A0A9Q7EE73

5.4.2 Converting KO numbers to Pathways

A common confusion exists between K numbers (KEGG Orthology) and ko numbers (KEGG Pathway maps). K numbers represent orthologous groups, while ko numbers represent pathway maps. Users often want to map K numbers to pathways.

bitr_kegg("K00844", "kegg", "Path", "ko")
     kegg    Path
1  K00844 ko00010
2  K00844 ko00051
3  K00844 ko00052
4  K00844 ko00500
5  K00844 ko00520
6  K00844 ko00521
7  K00844 ko00524
8  K00844 ko01100
9  K00844 ko01110
10 K00844 ko01120
11 K00844 ko01200
12 K00844 ko01250
13 K00844 ko04066
14 K00844 ko04910
15 K00844 ko04930
16 K00844 ko04973
17 K00844 ko05131
18 K00844 ko05230

To retrieve the names of the pathways (e.g., “Glutathione metabolism” instead of “ko00480”), we can define a simple helper function ko2name to query the KEGG API.

x <- bitr_kegg("K00799", "kegg", "Path", "ko")
y <- ko2name(x$Path)
merge(x, y, by.x='Path', by.y='ko')
      Path   kegg                                              name
1  ko00480 K00799                            Glutathione metabolism
2  ko00980 K00799      Metabolism of xenobiotics by cytochrome P450
3  ko00982 K00799                 Drug metabolism - cytochrome P450
4  ko00983 K00799                   Drug metabolism - other enzymes
5  ko01100 K00799                                Metabolic pathways
6  ko01524 K00799                          Platinum drug resistance
7  ko04212 K00799               Longevity regulating pathway - worm
8  ko05200 K00799                                Pathways in cancer
9  ko05204 K00799             Chemical carcinogenesis - DNA adducts
10 ko05207 K00799     Chemical carcinogenesis - receptor activation
11 ko05208 K00799 Chemical carcinogenesis - reactive oxygen species
12 ko05225 K00799                          Hepatocellular carcinoma
13 ko05418 K00799            Fluid shear stress and atherosclerosis

5.5 KEGG pathway over-representation analysis

data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]

kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
head(kk)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio RichFactor FoldEnrichment   zScore       pvalue
hsa04110    15/110 158/9370 0.09493671       8.086881 9.791387 3.840760e-10
hsa04114    10/110 138/9370 0.07246377       6.172596 6.671717 4.642469e-06
hsa04218    10/110 157/9370 0.06369427       5.425594 6.094783 1.459721e-05
hsa04061     8/110 100/9370 0.08000000       6.814545 6.371085 2.122549e-05
hsa03320     7/110  76/9370 0.09210526       7.845694 6.530710 2.847466e-05
hsa04814    10/110 194/9370 0.05154639       4.390815 5.201346 8.933373e-05
             p.adjust       qvalue
hsa04110 1.351947e-07 1.351947e-07
hsa04114 8.170745e-04 8.170745e-04
hsa04218 1.712740e-03 1.712740e-03
hsa04061 1.867844e-03 1.867844e-03
hsa03320 2.004616e-03 2.004616e-03
hsa04814 5.240912e-03 5.240912e-03
                                                                           geneID
hsa04110 9133/4174/891/7272/81620/10403/1111/9319/983/9212/8318/890/9232/4085/991
hsa04114                          9133/5241/891/6790/51806/983/3708/9232/4085/991
hsa04218                           9133/891/1111/2305/776/51806/983/3708/890/4605
hsa04061                                 6355/3627/1524/10563/6362/6373/9547/4283
hsa03320                                       5346/5105/9415/3158/9370/4312/2167
hsa04814                   81930/146909/3833/4629/10112/24137/9493/3832/1062/7802
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10

Input ID type can be kegg, ncbi-geneid, ncbi-proteinid or uniprot (see also ?sec-kegg-id-conversion). Unlike enrichGO(), there is no readable parameter for enrichKEGG(). However, users can use the setReadable() function (see also Section 17.2) if there is an OrgDb available for the species.

5.6 Translating Gene IDs to Names

For GO analysis, the readable parameter controls whether to translate the IDs to human-readable gene names. This parameter is not available for KEGG analysis. However, the setReadable() function can translate input gene IDs to gene names if the corresponding OrgDb object is available.

library(org.Hs.eg.db)
# 'kk' is the enrichKEGG result from the previous section
kk <- setReadable(kk, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
head(kk)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio RichFactor FoldEnrichment   zScore       pvalue
hsa04110    15/110 158/9370 0.09493671       8.086881 9.791387 3.840760e-10
hsa04114    10/110 138/9370 0.07246377       6.172596 6.671717 4.642469e-06
hsa04218    10/110 157/9370 0.06369427       5.425594 6.094783 1.459721e-05
hsa04061     8/110 100/9370 0.08000000       6.814545 6.371085 2.122549e-05
hsa03320     7/110  76/9370 0.09210526       7.845694 6.530710 2.847466e-05
hsa04814    10/110 194/9370 0.05154639       4.390815 5.201346 8.933373e-05
             p.adjust       qvalue
hsa04110 1.351947e-07 1.351947e-07
hsa04114 8.170745e-04 8.170745e-04
hsa04218 1.712740e-03 1.712740e-03
hsa04061 1.867844e-03 1.867844e-03
hsa03320 2.004616e-03 2.004616e-03
hsa04814 5.240912e-03 5.240912e-03
                                                                                         geneID
hsa04110 CCNB2/MCM5/CCNB1/TTK/CDT1/NDC80/CHEK1/TRIP13/CDK1/AURKB/CDC45/CCNA2/PTTG1/MAD2L1/CDC20
hsa04114                             CCNB2/PGR/CCNB1/AURKA/CALML5/CDK1/ITPR1/PTTG1/MAD2L1/CDC20
hsa04218                          CCNB2/CCNB1/CHEK1/FOXM1/CACNA1D/CALML5/CDK1/ITPR1/CCNA2/MYBL2
hsa04061                                    CCL8/CXCL10/CX3CR1/CXCL13/CCL18/CXCL11/CXCL14/CXCL9
hsa03320                                              PLIN1/PCK1/FADS2/HMGCS2/ADIPOQ/MMP1/FABP4
hsa04814                        KIF18A/KIF18B/KIFC1/MYH11/KIF20A/KIF4A/KIF23/KIF11/CENPE/DNALI1
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10

5.7 KEGG pathway gene set enrichment analysis

kk2 <- gseKEGG(geneList     = geneList,
               organism     = 'hsa',
               minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
head(kk2)
               ID                             Description setSize
hsa04110 hsa04110                              Cell cycle     139
hsa03008 hsa03008       Ribosome biogenesis in eukaryotes      55
hsa05169 hsa05169            Epstein-Barr virus infection     195
hsa04613 hsa04613 Neutrophil extracellular trap formation     130
hsa04218 hsa04218                     Cellular senescence     141
hsa04114 hsa04114                          Oocyte meiosis      96
         enrichmentScore      NES       pvalue     p.adjust       qvalue rank
hsa04110       0.6637551 2.790963 3.219791e-12 3.606166e-10 3.606166e-10 1155
hsa03008       0.5737068 2.045389 1.830568e-05 4.100473e-04 4.100473e-04 4001
hsa05169       0.4335116 1.951224 1.072378e-07 6.005315e-06 6.005315e-06 2820
hsa04613       0.4496570 1.874746 1.118494e-05 4.100473e-04 4.100473e-04 2575
hsa04218       0.4115945 1.771015 1.164361e-04 1.304084e-03 1.304084e-03 1155
hsa04114       0.4396853 1.756120 2.973954e-04 2.562176e-03 2.562176e-03  896
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa03008 tags=75%, list=32%, signal=51%
hsa05169 tags=39%, list=23%, signal=31%
hsa04613 tags=37%, list=21%, signal=30%
hsa04218  tags=17%, list=9%, signal=16%
hsa04114  tags=20%, list=7%, signal=19%
                                                                                                                                                                                                                                                                                                                                                                                     core_enrichment
hsa04110                                                                                                                                        8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa03008                                                                                                                                                  23560/54913/10799/55131/27341/5822/10436/10248/29107/3692/54433/29889/81691/5901/10775/10557/55505/1460/23160/6949/10940/4809/2091/10556/51096/55651/92856/55813/7514/55226/56000/25996/10885/55127/4931/51068/1459/10171/55916/55341/9790
hsa05169 3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11047/3066/54205/1871/578/1019/637/916/3383/4939/10213/23586/4793/5603/7979/7128/6891/930/5714/3452/6850/5702/4794/7124/3569/7097/5708/2208/8772/3119/5704/7186/5971/3135/1380/958/5610/55080/4792/10018/8819/3134/10379/9641/1147/5718/6300/3109/811/5606/2923/3108/5707/1432
hsa04613                                                                                                                                            820/366/51311/64581/3015/85236/55506/8970/8357/1535/2359/5336/4688/92815/3066/8336/292/1991/3689/8345/5603/4689/5880/10105/1184/6404/3018/6850/5604/3014/7097/1378/8290/1536/834/5605/1183/728/2215/8335/5594/9734/3674/5578/5582/7417/8331/6300
hsa04218                                                                                                                                                                                                                                                                       2305/4605/9133/890/983/51806/1111/891/993/3576/1978/898/9134/4609/1869/1029/22808/1871/5499/91860/292/1019/11200/1875
hsa04114                                                                                                                                                                                                                                                                                                  991/9133/983/4085/51806/6790/891/9232/5347/9700/898/9134/699/26271/114/5499/91860/9088/995

5.8 KEGG Pathway Classification

KEGG pathways are organized into a hierarchical structure with top-level categories such as “Metabolism”, “Genetic Information Processing”, “Environmental Information Processing”, “Cellular Processes”, “Organismal Systems”, “Human Diseases”, and “Drug Development”. This classification helps users interpret enrichment results by providing higher-level biological context. For instance, users might be interested specifically in signaling pathways (“Environmental Information Processing”) or metabolic pathways (“Metabolism”) while excluding disease-related pathways.

The clusterProfiler package integrates this classification information directly into the enrichment results. The output data.frame contains category and subcategory columns, which can be used to filter or group the results.

5.8.1 Data Update Strategy

While clusterProfiler queries the KEGG API for species-specific gene sets to ensure the latest data is used, the pathway classification structure is relatively stable and universal across species. To optimize performance and avoid unnecessary dependencies (e.g., web scraping packages), clusterProfiler caches this classification data within the package.

To prevent this cached data from becoming outdated, the package utilizes GitHub Actions for automated maintenance. A workflow is triggered weekly to crawl the latest pathway classification from the KEGG website. If any discrepancies are found between the online data and the cached data, the workflow automatically creates a Pull Request to update the package. This automation ensures that the classification data in clusterProfiler remains synchronized with KEGG updates efficiently and reliably.

5.9 KEGG module over-representation analysis

KEGG Module is a collection of manually defined functional units. In some situations, KEGG Modules have a more straightforward interpretation.

mkk <- enrichMKEGG(gene = gene,
                   organism = 'hsa',
                   pvalueCutoff = 1,
                   qvalueCutoff = 1)
head(mkk)                   
           ID
M00912 M00912
M00095 M00095
M00053 M00053
M00938 M00938
M00003 M00003
M00049 M00049
                                                                    Description
M00912                       NAD biosynthesis, tryptophan => quinolinate => NAD
M00095                           C5 isoprenoid biosynthesis, mevalonate pathway
M00053 Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP
M00938                 Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP
M00003                             Gluconeogenesis, oxaloacetate => fructose-6P
M00049                      Adenine ribonucleotide biosynthesis, IMP => ADP,ATP
       GeneRatio BgRatio RichFactor FoldEnrichment   zScore      pvalue
M00912       2/9  12/834 0.16666667      15.444444 5.261049 0.006465714
M00095       1/9  10/834 0.10000000       9.266667 2.745261 0.103353256
M00053       1/9  11/834 0.09090909       8.424242 2.587411 0.113146727
M00938       1/9  14/834 0.07142857       6.619048 2.213281 0.141959193
M00003       1/9  18/834 0.05555556       5.148148 1.857215 0.179080477
M00049       1/9  19/834 0.05263158       4.877193 1.784564 0.188134737
        p.adjust    qvalue     geneID Count
M00912 0.2844914 0.2844914 3620/23475     2
M00095 1.0000000 1.0000000       3158     1
M00053 1.0000000 1.0000000       6241     1
M00938 1.0000000 1.0000000       6241     1
M00003 1.0000000 1.0000000       5105     1
M00049 1.0000000 1.0000000      26289     1

5.10 KEGG module gene set enrichment analysis

mkk2 <- gseMKEGG(geneList = geneList,
                 organism = 'hsa',
                 pvalueCutoff = 1)
head(mkk2)
           ID                                               Description setSize
M00912 M00912        NAD biosynthesis, tryptophan => quinolinate => NAD       8
M00004 M00004       Pentose phosphate pathway (Pentose phosphate cycle)       9
M00001 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate      24
M00002 M00002  Glycolysis, core module involving three-carbon compounds      11
M00099 M00099                                  Sphingosine biosynthesis       9
M00938 M00938  Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP      10
       enrichmentScore       NES       pvalue   p.adjust     qvalue rank
M00912       0.8509100  1.860804 0.0006267109 0.02757528 0.01509224  679
M00004       0.7450305  1.740560 0.0101970906 0.14955733 0.08185432 1750
M00001       0.5739035  1.690401 0.0064561050 0.14203431 0.07773689 2886
M00002       0.6421781  1.609823 0.0430250206 0.27044299 0.14801632 1381
M00099      -0.7084736 -1.578506 0.0165359849 0.18189583 0.09955352 2125
M00938       0.6648002  1.541740 0.0339348964 0.27044299 0.14801632  648
                         leading_edge
M00912  tags=62%, list=5%, signal=59%
M00004 tags=67%, list=14%, signal=57%
M00001 tags=54%, list=23%, signal=42%
M00002 tags=55%, list=11%, signal=49%
M00099 tags=56%, list=17%, signal=46%
M00938  tags=40%, list=5%, signal=38%
                                                        core_enrichment
M00912                                        23475/3620/6999/8564/8942
M00004                                   2821/5226/7086/2539/6888/22934
M00001 5214/3101/2821/7167/2597/5230/2023/5223/5315/3099/5232/2027/5211
M00002                                    7167/2597/5230/2023/5223/5315
M00099                                     253782/29956/427/55304/79603
M00938                                              6241/7298/4830/1841

5.11 Visualize enriched KEGG pathways

The enrichplot package implements several methods to visualize enriched terms. Most of them are general methods that can be used on GO, KEGG, MSigDb, and other gene set annotations. Here, we introduce the clusterProfiler::browseKEGG() and pathview::pathview() functions to help users explore enriched KEGG pathways with genes of interest.

To view the KEGG pathway, users can use the browseKEGG() function, which will open a web browser and highlight enriched genes. See Figure 5.1 for a screenshot example.

browseKEGG(kk, 'hsa04110')
Figure 5.1: Explore selected KEGG pathway. Differentially expressed genes that are enriched in the selected pathway will be highlighted.

Users can also use the pathview() function from the pathview (Luo and Brouwer 2013) to visualize enriched KEGG pathways identified by the clusterProfiler package (Yu et al. 2012). See Figure 5.2 for a rendered pathway image.

The following example illustrates how to visualize the “hsa04110” pathway, which was enriched in our previous analysis.

library("pathview")
hsa04110 <- pathview(gene.data  = geneList,
                     pathway.id = "hsa04110",
                     species    = "hsa",
                     limit      = list(gene=max(abs(geneList)), cpd=1))
Figure 5.2: Visualize selected KEGG pathway by pathview(). Gene expression values can be mapped to gradient color scale.

References

Luo, Weijun, and Cory Brouwer. 2013. “Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (July): 1830–31. https://doi.org/10.1093/bioinformatics/btt285.
Yu, Guangchuang, Le-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: An r Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16 (5): 284–87. https://doi.org/10.1089/omi.2011.0118.