Miscellaneous topics

Leading Edge Analysis

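Leading edge analysis identifies the subset of genes in a gene set that drives its GSEA enrichment signal. It reports three statistics:

  • Tags: Percentage of genes in the gene set that occur before (positive enrichment score) or after (negative enrichment score) the peak of the running enrichment score, i.e. the share of the set that contributes to the enrichment score
  • List: Percentage of genes in the ranked list that occur before (positive enrichment score) or after (negative enrichment score) the peak, i.e. how far into the ranked list the enrichment score is attained
  • Signal: Strength of the enrichment signal, which combines the two statistics above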

DOSE, clusterProfiler, and ReactomePA all support leading edge analysis and can report the core enriched genes that contribute to the enrichment.
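
How the three statistics relate can be sketched in base R. The function below is illustrative only: leading_edge_stats and its argument names are hypothetical, the signal formula follows the GSEA definition (tags × (1 − list) × N / (N − Nh)), and it covers the positive enrichment score case only. The value 12495 is assumed to be the length of the geneList example data.

```r
# Leading-edge statistics for one gene set with a positive enrichment score.
# n_total:  number of genes in the ranked list (N)
# set_size: number of genes in the gene set (Nh)
# n_core:   number of gene-set members in the leading edge
# rank:     position in the ranked list where the running score peaks
leading_edge_stats <- function(n_total, set_size, n_core, rank) {
  tags <- n_core / set_size          # share of the set in the leading edge
  list_frac <- rank / n_total        # share of the ranked list before the peak
  signal <- tags * (1 - list_frac) * n_total / (n_total - set_size)
  c(tags = tags, list = list_frac, signal = signal)
}

# Example: a set of 10 genes, 8 of them in the leading edge, peak at rank 452
# (12495 is an assumed list length, see the note above)
stats <- round(100 * leading_edge_stats(12495, 10, 8, 452))
print(stats)
```

With these inputs the rounded percentages are tags = 80, list = 4 and signal = 77. A high tags value combined with a low list value means that most of the set is concentrated near the top of the ranking, which is what yields a strong signal.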

Core Enriched Genes Extraction

After performing GSEA, the results object contains a core_enrichment column that lists the core genes responsible for each enriched term:

library(DOSE)
DOSE v4.7.0 Learn more at https://yulab-smu.top/contribution-knowledge-mining/

Please cite:

Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
R/Bioconductor package for Disease Ontology Semantic and Enrichment
analysis. Bioinformatics. 2015, 31(4):608-609
data(geneList)
x <- gseDO(geneList)
head(x)
                       ID                      Description setSize
DOID:0111962 DOID:0111962        combined immunodeficiency      61
DOID:0060306 DOID:0060306            Meier-Gorlin syndrome      10
DOID:2957       DOID:2957           pulmonary tuberculosis      78
DOID:2799       DOID:2799         bronchiolitis obliterans      25
DOID:399         DOID:399                     tuberculosis     102
DOID:612         DOID:612 primary immunodeficiency disease     236
             enrichmentScore      NES       pvalue     p.adjust       qvalue
DOID:0111962       0.6365098 2.372297 1.910010e-08 9.504607e-06 4.579058e-06
DOID:0060306       0.9453694 2.213949 2.647523e-08 9.504607e-06 4.579058e-06
DOID:2957          0.5677168 2.196492 3.319363e-07 7.149909e-05 3.444630e-05
DOID:2799          0.7055858 2.148235 2.092033e-05 3.115795e-03 1.501104e-03
DOID:399           0.5295891 2.117684 1.389251e-07 3.740557e-05 1.802098e-05
DOID:612           0.4516799 2.087149 4.460047e-10 4.803470e-07 2.314180e-07
             rank                   leading_edge
DOID:0111962 2039 tags=54%, list=16%, signal=45%
DOID:0060306  452  tags=80%, list=4%, signal=77%
DOID:2957    1864 tags=47%, list=15%, signal=41%
DOID:2799    1061  tags=44%, list=8%, signal=40%
DOID:399     1864 tags=43%, list=15%, signal=37%
DOID:612     2521 tags=43%, list=20%, signal=35%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            core_enrichment
DOID:0111962                                                                                                                                                                                                                                                                                                                                                         9837/1503/7037/3932/3559/51311/3561/3574/3575/4860/915/959/11151/50615/1794/3689/5788/5424/5695/3394/10525/100/5880/5699/204/10095/5971/10125/8456/8625/3071/7293/4478
DOID:0060306                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     8318/81620/4174/990/23594/4998/64785/51053
DOID:2957                                                                                                                                                                                                                                                                                                                                            8685/3627/3620/3576/6890/6352/54210/80380/51311/6347/9332/3654/1535/3458/959/1594/942/9235/1234/467/7096/100/3600/6891/929/26191/4282/7124/912/5329/3569/4049/7097/5133/1118/3119/1956
DOID:2799                                                                                                                                                                                                                                                                                                                                                                                                                                                                             3627/6373/4283/3002/4318/6352/6347/6354/942/6361/6367
DOID:399                                                                                                                                                                                                                                                                                                         8685/3627/4283/3620/4318/3576/6890/6352/54210/80380/6772/51311/6347/9332/3654/1535/3458/959/1594/942/1557/9235/1234/467/7096/6367/100/3600/6696/6891/929/26191/4282/7124/912/5329/3569/4049/7097/56244/5133/1118/3119/1956
DOID:612     55388/7153/9837/29851/9636/1503/1493/7037/4173/3932/3559/6772/51311/3507/3561/917/3654/3574/3575/919/4860/915/22806/5693/4938/1535/3458/959/5336/11151/3702/925/4688/64135/28755/50615/974/1794/3689/5788/5424/916/7096/4068/3937/30009/5695/3394/10525/100/7374/3659/940/939/4689/5880/7128/6891/4210/6789/5699/930/6573/11322/204/6850/10095/7124/3569/7097/7852/8772/5692/64170/3119/1956/28985/1053/5971/1536/10125/8456/8625/3071/7293/4478/1380/958/5054/5591/9437/10379/54440/3570/3978/3593/10625/29927/3558/51371/735
               log2err
DOID:0111962 0.7337620
DOID:0060306 0.7337620
DOID:2957    0.6749629
DOID:2799    0.5756103
DOID:399     0.6901325
DOID:612     0.8140358

The core_enrichment column lists the core enriched genes of each significant term as a slash-separated string of Entrez gene IDs.
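
Each core_enrichment entry is a single string, so it usually has to be split before the genes can be counted or compared. The following is a minimal base R sketch that uses two strings copied from the output above. With a real result object, the same column could presumably be taken from as.data.frame(x)$core_enrichment. The variable names are illustrative.

```r
# Two core_enrichment strings copied from the gseDO() output above
core <- c(
  "DOID:0060306" = "8318/81620/4174/990/23594/4998/64785/51053",
  "DOID:2799"    = "3627/6373/4283/3002/4318/6352/6347/6354/942/6361/6367"
)

# Split each string into a character vector of Entrez IDs (names are kept)
core_genes <- strsplit(core, "/", fixed = TRUE)

# Number of core genes per term
n_core <- lengths(core_genes)

# Dividing by the setSize values from the same output (10 and 25)
# gives the 'tags' fraction reported in the leading_edge column
tags <- n_core / c(10, 25)
print(tags)
```

The resulting fractions, 0.80 and 0.44, match the tags=80% and tags=44% entries shown for these two terms.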

Enhancing Readability with setReadable

To make the results more interpretable, use setReadable() to convert Entrez IDs to gene symbols:

library(clusterProfiler)
clusterProfiler v4.21.0.004 Learn more at https://yulab-smu.top/contribution-knowledge-mining/

Please cite:

T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan,
X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal
enrichment tool for interpreting omics data. The Innovation. 2021,
2(3):100141

Attaching package: 'clusterProfiler'
The following object is masked from 'package:stats':

    filter
y <- setReadable(x, 'org.Hs.eg.db')
head(y, 2)
                       ID               Description setSize enrichmentScore
DOID:0111962 DOID:0111962 combined immunodeficiency      61       0.6365098
DOID:0060306 DOID:0060306     Meier-Gorlin syndrome      10       0.9453694
                  NES       pvalue     p.adjust       qvalue rank
DOID:0111962 2.372297 1.910010e-08 9.504607e-06 4.579058e-06 2039
DOID:0060306 2.213949 2.647523e-08 9.504607e-06 4.579058e-06  452
                               leading_edge
DOID:0111962 tags=54%, list=16%, signal=45%
DOID:0060306  tags=80%, list=4%, signal=77%
                                                                                                                                                                                           core_enrichment
DOID:0111962 GINS1/CTPS1/TFRC/LCK/IL2RA/TLR8/IL2RG/IL7/IL7R/PNP/CD3D/CD40LG/CORO1A/IL21R/DOCK2/ITGB2/PTPRC/POLD1/PSMB7/IRF8/HYOU1/ADA/RAC2/PSMB10/AK2/ARPC1B/RELB/RASGRP1/FOXN1/RFXANK/NCKAP1L/TNFRSF4/MSN
DOID:0060306                                                                                                                                                     CDC45/CDT1/MCM5/CDC6/ORC6/ORC1/GINS3/GMNN
              log2err
DOID:0111962 0.733762
DOID:0060306 0.733762

With gene symbols in place of numeric Entrez IDs, the core enrichment results are easier to read and to relate to known biology.

For visualization of leading edge analysis results using cnetplot, please refer to the enrichplot chapter.

Non-Model Plant Annotation with clusterProfiler

For non-model plants and other organisms lacking standard annotation packages, clusterProfiler can be used with custom annotation data obtained from tools like eggNOG.

Workflow Overview

  1. Annotation with eggNOG: Use the eggNOG web server to annotate protein sequences
  2. Parse eggNOG Results: Extract GO and KEGG annotations using custom scripts
  3. Enrichment Analysis: Use clusterProfiler’s enricher() function with custom annotation data

Key Steps

1. eggNOG Annotation

Upload protein sequences to the eggNOG mapper with appropriate parameters for your organism.

2. Parsing eggNOG Results

Use Python scripts to process eggNOG output files:

# Parse GO ontology file
python parse_go_obofile.py -i go-basic.obo -o go.tb

# Parse eggNOG annotations with reference species filtering
python parse_eggNOG.py -i panax_ginseng.annotations \
                       -g go.tb \
                       -O ath,osa \
                       -o output_directory

This generates two key files:

  • GOannotation.tsv: GO term annotations
  • KOannotation.tsv: KEGG pathway annotations

3. Enrichment Analysis with clusterProfiler

library(clusterProfiler)

# Read the annotation tables produced by the parsing step
KOannotation <- read.delim("KOannotation.tsv", stringsAsFactors = FALSE)
GOannotation <- read.delim("GOannotation.tsv", stringsAsFactors = FALSE)
GOinfo <- read.delim("go.tb", stringsAsFactors = FALSE)

# Genes of interest; replace with your own gene IDs (same ID type as the annotation)
gene_list <- c("gene1", "gene2", "gene3")

# GO enrichment (Molecular Function as an example)
# TERM2GENE needs the term ID in column 1 and the gene ID in column 2
GOannotation_split <- split(GOannotation, GOannotation$level)
go_mf <- enricher(gene_list,
                  TERM2GENE = GOannotation_split[["molecular_function"]][c(2, 1)],
                  TERM2NAME = GOinfo[1:2])

# KEGG enrichment
# TERM2NAME needs the term ID in column 1 and its description in column 2
kegg <- enricher(gene_list,
                 TERM2GENE = KOannotation[c(3, 1)],
                 TERM2NAME = KOannotation[c(3, 4)])
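
Conceptually, enricher() tests each term for over-representation with a hypergeometric test. The base R sketch below reproduces that calculation for a single term, to show what the TERM2GENE table feeds into. It is not the package's implementation: the toy annotation, the term and gene names, and the choice of all annotated genes as background are assumptions made for illustration.

```r
# Hypothetical two-column annotation in TERM2GENE layout: term first, gene second
term2gene <- data.frame(
  term = c("T1", "T1", "T1", "T2", "T2", "T2", "T2"),
  gene = c("g1", "g2", "g3", "g3", "g4", "g5", "g6"),
  stringsAsFactors = FALSE
)

gene_list <- c("g1", "g2", "g3")        # genes of interest
universe <- unique(term2gene$gene)      # background: all annotated genes

# Counts for term T1
in_term <- unique(term2gene$gene[term2gene$term == "T1"])
k <- length(intersect(gene_list, in_term))   # genes of interest annotated to T1
M <- length(in_term)                         # background genes annotated to T1
N <- length(universe)                        # size of the background
n <- length(intersect(gene_list, universe))  # genes of interest in the background

# Probability of observing k or more annotated genes by chance
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p_value)
```

Here all three genes of interest fall into a term that covers three of six background genes, so the p-value is 1/20 = 0.05. With real data the background size matters a great deal, which is why genes missing from the annotation table should not be counted in the query either.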

Advantages

  • Works for any organism with protein sequences
  • Uses the established eggNOG annotation pipeline
  • Flexible reference species filtering for KEGG

Considerations

  • Requires intermediate Python scripting
  • Performance may vary with dataset size
  • Manual integration of annotation and analysis steps

This approach enables functional enrichment analysis for non-model organisms by combining clusterProfiler’s enricher() function with custom annotation data.