Enrichment Analysis
Terminology
Gene sets and pathways
A gene set is an unordered collection of genes that are functionally related. A pathway can be interpreted as a gene set by ignoring functional relationships among genes.
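The idea that a pathway reduces to a gene set once its relationships are dropped can be made concrete. Below is a minimal Python sketch. It assumes a pathway is stored as a list of directed (source, target) edges. The gene names and the helper `pathway_to_gene_set` are hypothetical placeholders, not part of any library.

```python
# A toy pathway stored as directed edges (source -> target).
# Gene names are hypothetical placeholders, not real annotations.
pathway_edges = [
    ("GeneA", "GeneB"),
    ("GeneB", "GeneC"),
    ("GeneA", "GeneD"),
]


def pathway_to_gene_set(edges):
    """Collapse a pathway into a gene set by discarding the edges
    (the functional relationships) and keeping only the member genes."""
    genes = set()
    for source, target in edges:
        genes.add(source)
        genes.add(target)
    # frozenset: unordered like a gene set, and hashable for use as a dict key
    return frozenset(genes)


gene_set = pathway_to_gene_set(pathway_edges)
print(sorted(gene_set))  # ['GeneA', 'GeneB', 'GeneC', 'GeneD']
```

A `frozenset` matches the definition of a gene set as an unordered collection. Because it is hashable, gene sets can also be compared directly or used as dictionary keys.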
Gene Ontology (GO)
Gene Ontology defines concepts (classes) used to describe gene function and the relationships between these concepts. It classifies functions along three aspects:
- MF: Molecular Function
  - molecular activities of gene products
- CC: Cellular Component
  - where gene products are active
- BP: Biological Process
  - pathways and larger processes made up of the activities of multiple gene products
GO terms are organized in a directed acyclic graph (DAG), where edges between terms represent parent-child relationships.
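Because GO is a DAG rather than a tree, a term can have several parents. When GO terms are used as gene sets, genes annotated to a term are commonly also counted as annotated to all of its ancestors. The Python sketch below illustrates this upward propagation. The term names, the toy annotations, and the helpers `ancestors` and `propagate` are hypothetical and do not reflect real GO data or any library's API.

```python
# Toy GO-like DAG mapping each term to its parent terms.
# Term names are hypothetical; real GO terms use IDs such as GO:0008150.
PARENTS = {
    "root": [],
    "process_A": ["root"],
    "process_B": ["root"],
    # process_C has two parents, so the graph is a DAG rather than a tree.
    "process_C": ["process_A", "process_B"],
}


def ancestors(term, parents):
    """Return every ancestor of `term` by walking parent edges upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def propagate(annotations, parents):
    """Build term -> genes, adding each gene to its terms and all their ancestors."""
    term_to_genes = {}
    for gene, terms in annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                term_to_genes.setdefault(t, set()).add(gene)
    return term_to_genes


# Hypothetical direct annotations: gene -> most specific terms.
annotations = {"g1": ["process_C"], "g2": ["process_A"]}
gene_sets = propagate(annotations, PARENTS)
print(sorted(gene_sets["root"]))  # ['g1', 'g2']
```

The `seen` set ensures that an ancestor reachable through several paths (here `root`, via both `process_A` and `process_B`) is visited only once.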
Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a collection of manually drawn pathway maps representing molecular interaction and reaction networks. These pathways cover a wide range of biochemical processes that can be divided into 7 broad categories:
- Metabolism
- Genetic information processing
- Environmental information processing
- Cellular processes
- Organismal systems
- Human diseases
- Drug development
Other gene sets
GO and KEGG are the most frequently used gene set sources for functional analysis. They are typically the first choice because of their long-standing curation and their availability for a wide range of species.
Other gene set collections include, but are not limited to, Disease Ontology (DO), Disease Gene Network (DisGeNET), WikiPathways, and the Molecular Signatures Database (MSigDB).
In the current clusterProfiler ecosystem, classical over-representation analysis (ORA) and gene set enrichment analysis (GSEA) are complemented by enrichit-backed topology-aware and multi-omics workflows. The algorithms are implemented in enrichit, while clusterProfiler provides high-level interfaces such as nseGO(), mnseGO(), nseKEGG(), and mnseKEGG() for biological interpretation.
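Classical over-representation analysis (ORA) is commonly framed as a one-sided hypergeometric test. Suppose a universe of N genes contains a gene set with K members, and a query list has n genes. The test asks how likely an overlap of at least k genes would be by chance. The Python sketch below implements that textbook test with `math.comb`. It illustrates the idea under these assumptions and is not the clusterProfiler or enrichit implementation. The helper name `ora_pvalue` and the toy gene identifiers are hypothetical.

```python
from math import comb


def ora_pvalue(universe, gene_set, query):
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    P(X >= k) = sum_{i=k}^{min(n, K)} C(K, i) * C(N - K, n - i) / C(N, n)
    """
    universe = set(universe)
    # Restrict the gene set and the query list to the background universe.
    gene_set = set(gene_set) & universe
    query = set(query) & universe
    N, K, n = len(universe), len(gene_set), len(query)
    k = len(gene_set & query)  # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, a 4-gene set, and a 3-gene query list
# whose genes all fall inside the gene set.
universe = [f"g{i}" for i in range(10)]
gene_set = ["g0", "g1", "g2", "g3"]
query = ["g0", "g1", "g2"]
print(ora_pvalue(universe, gene_set, query))  # 4/120, about 0.0333
```

When many gene sets are tested, the resulting p-values are normally adjusted for multiple testing (for example, with the Benjamini-Hochberg procedure) before interpretation.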