Chapter 1 Introduction

1.1 Terminology

1.1.1 Gene sets and pathways

A gene set is an unordered collection of genes that are functionally related. A pathway can be interpreted as a gene set by ignoring the functional relationships among its genes.
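As a minimal sketch of this distinction, the Python snippet below represents a pathway as a list of directed gene-to-gene interactions and collapses it into a gene set by discarding the edges. The pathway is a small illustrative toy, and the names `pathway_edges` and `pathway_to_gene_set` are hypothetical, not part of any package.

```python
# Illustrative toy pathway: each tuple is a directed interaction (upstream, downstream).
pathway_edges = [
    ("EGFR", "GRB2"),
    ("GRB2", "SOS1"),
    ("SOS1", "KRAS"),
    ("KRAS", "BRAF"),
]


def pathway_to_gene_set(edges):
    """Collapse a pathway into a gene set by keeping the genes and dropping the edges."""
    return frozenset(gene for edge in edges for gene in edge)


gene_set = pathway_to_gene_set(pathway_edges)

# The set is unordered; sorting is only for a stable printout.
print(sorted(gene_set))
```

The resulting set records only membership: which gene acts on which, and in what order, is no longer recoverable from it.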

1.1.2 Gene Ontology (GO)

The Gene Ontology defines the concepts (classes) used to describe gene function and the relationships between these concepts. It classifies functions along three aspects:

  • MF: Molecular Function
    • molecular activities of gene products
  • CC: Cellular Component
    • where gene products are active
  • BP: Biological Process
    • pathways and larger processes made up of the activities of multiple gene products

GO terms are organized in a directed acyclic graph, where the edges between terms represent parent-child relationships.
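The graph structure can be sketched in a few lines of Python. The example below stores a child-to-parents mapping and collects every ancestor of a term by following parent links. The term identifiers are placeholders rather than real GO terms, and `parents` and `ancestors` are hypothetical names chosen for illustration.

```python
# Hypothetical child -> parents mapping; identifiers are placeholders, not real GO terms.
parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # two parents: this is what makes the structure a DAG, not a tree
    "D": ["C"],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("D", parents)))
```

Because a term can have several parents, the traversal tracks visited terms so that shared ancestors such as `root` are collected only once.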

1.1.3 Kyoto Encyclopedia of Genes and Genomes (KEGG)

KEGG is a collection of manually drawn pathway maps representing molecular interaction and reaction networks. These pathways cover a wide range of biochemical processes that can be divided into 7 broad categories: metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development.¹

1.1.4 Other gene sets

GO and KEGG are the most frequently used gene set sources for functional analysis. They are typically the first choice because of their long-standing curation and their availability for a wide range of species.

Other gene sets include, but are not limited to, Disease Ontology (DO), Disease Gene Network (DisGeNET), WikiPathways, and the Molecular Signatures Database (MSigDB).