We developed the DOSE package (Yu et al. 2015) to promote the investigation of diseases. DOSE provides five methods for measuring semantic similarity among DO terms and gene products, as well as a hypergeometric model and gene set enrichment analysis (GSEA) for associating diseases with gene lists and extracting disease association insights from genome-wide expression profiles.
The enrichDO() function requires a vector of entrezgene IDs as input, typically the list of differentially expressed genes from a gene expression profiling study. Please refer to Section 17.1 if you need to convert other gene ID types to entrezgene IDs.
The ont parameter can be “HDO” (Human Disease Ontology), “HPO” (Human Phenotype Ontology) or “MPO” (Mouse Phenotype Ontology). The pvalueCutoff parameter sets the cutoff for both the p value and the adjusted p value. The pAdjustMethod parameter sets the p value correction method: Bonferroni (“bonferroni”), Holm (“holm”), Hochberg (“hochberg”), Hommel (“hommel”), Benjamini & Hochberg (“BH”) or Benjamini & Yekutieli (“BY”). The qvalueCutoff parameter sets the cutoff for q-values.
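To make the correction methods concrete, the following minimal sketch applies each of the six methods to a small vector of p values with base R's p.adjust(). The p values and term names are hypothetical and chosen only for illustration, and the final filtering step is an assumed reading of how a single cutoff applies to both raw and adjusted p values, not a copy of the package's internal code.

```r
# Hypothetical raw p values for five disease terms (illustrative only).
pvals <- c(0.001, 0.008, 0.020, 0.030, 0.300)

# The six correction methods named above, as accepted by stats::p.adjust().
methods <- c("bonferroni", "holm", "hochberg", "hommel", "BH", "BY")

# One column of adjusted p values per method, one row per term.
adjusted <- sapply(methods, function(m) p.adjust(pvals, method = m))
rownames(adjusted) <- paste0("term", seq_along(pvals))
print(round(adjusted, 4))

# Assumed filtering rule: keep terms whose raw and BH-adjusted p values
# both fall at or below the same cutoff.
cutoff <- 0.05
kept <- rownames(adjusted)[pvals <= cutoff & adjusted[, "BH"] <= cutoff]
print(kept)
```

Comparing the columns shows how much more conservative “bonferroni” and “BY” are than “BH” on the same input.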
The universe parameter sets the background gene universe for testing. If users do not set this parameter explicitly, enrichDO() will use all human genes that have a DO annotation as the universe.
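The influence of the background universe can be seen in a small hypergeometric calculation. The sketch below uses base R's phyper() with hypothetical counts (none of them come from a real DO annotation) to compute an over-representation p value for one term, then repeats the calculation with a larger assumed universe.

```r
# Hypothetical counts (illustrative only, not taken from any DO annotation).
N <- 8000   # genes in the background universe
M <- 200    # universe genes annotated to the disease term
n <- 300    # genes in the input list (all assumed to be in the universe)
k <- 20     # input genes annotated to the disease term

# P(X >= k): probability of seeing at least k annotated genes by chance.
p_over <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same overlap tested against a larger universe of 20000 genes.
# The term is rarer in that background, so the p value becomes smaller.
p_over_large <- phyper(k - 1, M, 20000 - M, n, lower.tail = FALSE)

print(c(universe_8000 = p_over, universe_20000 = p_over_large))
```

Because the p value depends on the universe size, a background that matches the genes actually measured in the experiment is usually a more defensible choice than an arbitrary larger one.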
The minGSSize and maxGSSize parameters restrict testing to DO terms that have more than minGSSize and fewer than maxGSSize annotated genes.
The readable parameter is a logical value that indicates whether the entrezgene IDs will be mapped to gene symbols; see also Section 17.2.
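The parameters described above can be combined in a single call. The following is a minimal sketch, assuming the geneList example dataset shipped with DOSE and an arbitrary fold-change threshold of 1.5 to define the input genes; the cutoff values are illustrative choices, not recommendations. The value ont = "HDO" follows the text above, and older DOSE releases may expect a different value. The guard simply skips the example when DOSE is not installed.

```r
if (requireNamespace("DOSE", quietly = TRUE)) {
  library(DOSE)

  # Example data shipped with DOSE: a named numeric vector of fold changes
  # whose names are entrezgene IDs.
  data(geneList, package = "DOSE")

  # Assumption: treat genes with absolute fold change above 1.5 as the input.
  gene <- names(geneList)[abs(geneList) > 1.5]

  x <- enrichDO(gene          = gene,
                ont           = "HDO",
                pvalueCutoff  = 0.05,
                pAdjustMethod = "BH",
                universe      = names(geneList),
                minGSSize     = 5,
                maxGSSize     = 500,
                qvalueCutoff  = 0.05,
                readable      = FALSE)
  print(head(x))
}
```

Setting readable = TRUE would additionally require a human annotation package for the ID-to-symbol mapping, which is why it is left as FALSE here.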
8.1.2 Over-representation analysis for the network of cancer gene
Network of Cancer Genes (NCG) (A. et al. 2016) is a manually curated repository of cancer genes. NCG release 5.0 (Aug. 2015) collects 1,571 cancer genes from 175 published studies. DOSE supports analyzing a gene list to determine whether it is enriched in genes known to be mutated in a given cancer type.
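A minimal sketch of such an analysis with the enrichNCG() function is shown below. It assumes the geneList example dataset shipped with DOSE and an arbitrary fold-change threshold of 1 for selecting input genes; the remaining arguments are set explicitly for illustration only. The guard skips the example when DOSE is not installed.

```r
if (requireNamespace("DOSE", quietly = TRUE)) {
  library(DOSE)
  data(geneList, package = "DOSE")

  # Assumption: genes with absolute fold change above 1 form the input list.
  gene <- names(geneList)[abs(geneList) > 1]

  ncg <- enrichNCG(gene,
                   pvalueCutoff  = 0.05,
                   pAdjustMethod = "BH",
                   minGSSize     = 10,
                   maxGSSize     = 500,
                   qvalueCutoff  = 0.2,
                   readable      = FALSE)
  print(head(ncg))
}
```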
8.1.3 Over-representation analysis for the disease gene network
DisGeNET (Janet et al. 2015) is an integrative and comprehensive resource of gene-disease associations from several public data sources and the literature. It contains gene-disease associations and SNP-gene-disease associations.
Enrichment analysis of gene-disease associations is supported by the enrichDGN() function, and analysis of SNP-gene-disease associations is supported by the enrichDGNv() function.
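The two functions differ mainly in their input: enrichDGN() takes entrezgene IDs, while enrichDGNv() takes SNP identifiers. The sketch below assumes that the installed DOSE release exports both functions, uses the geneList example dataset with an arbitrary fold-change threshold, and passes a handful of dbSNP rs identifiers chosen only for illustration. Such a short SNP list may well yield no enriched terms.

```r
if (requireNamespace("DOSE", quietly = TRUE)) {
  library(DOSE)
  data(geneList, package = "DOSE")

  # Gene-level analysis. Assumption: input genes are those with absolute
  # fold change above 1.5.
  gene <- names(geneList)[abs(geneList) > 1.5]
  dgn <- enrichDGN(gene)
  print(head(dgn))

  # SNP-level analysis. The rs identifiers are illustrative input only.
  snp <- c("rs1401296", "rs9315050", "rs5498", "rs1524668", "rs841",
           "rs909253", "rs7193343", "rs3918232", "rs3760396", "rs767455")
  dgnv <- enrichDGNv(snp)
  print(head(dgnv))
}
```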
In the following example, in order to speed up the compilation of this document, only gene sets with size above 120 were tested and only 100 permutations were performed.
| ID | Description | setSize | enrichmentScore | NES | pvalue | p.adjust | qvalue | rank | leading_edge |
|---|---|---|---|---|---|---|---|---|---|
| pan-gynecological and breast | NA | 43 | -0.5263430 | -1.701061 | 0.002263971 | 0.1713378 | 0.07782234 | 2464 | tags=40%, list=20%, signal=32% |
| breast_fibroepithelial_tumours | NA | 17 | -0.6421576 | -1.687765 | 0.009662272 | 0.2125700 | 0.09655020 | 2700 | tags=53%, list=22%, signal=42% |
| pan-gastric | NA | 49 | -0.4993804 | -1.650134 | 0.007854058 | 0.2125700 | 0.09655020 | 3280 | tags=49%, list=26%, signal=36% |

| ID | core_enrichment |
|---|---|
| pan-gynecological and breast | NIPBL/SPOP/ARID1A/RASA1/RB1/RNF43/MAP2K4/NF1/CTNNB1/TP53/PIK3R1/CDKN1B/CCND1/ARID5B/MAP3K1/TBX3/GATA3 |
| breast_fibroepithelial_tumours | SETD2/RB1/PCNX4/NF1/TP53/RARA/SYNE1/MAP3K1/ERBB4 |
| pan-gastric | BCOR/SOX9/TCF7L2/ATM/CALD1/SEMG2/HTR7/ARID1A/RASA1/RB1/TTBK2/RNF43/CTNNB1/TP53/BCL9/SMAD3/APC/ZFP36L2/TGFBR2/MUC6/MAP3K1/CACNA1C/ATP8B1/CYP4B1 |
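To help read the enrichmentScore, NES and pvalue columns in the output above, the following base R sketch computes a weighted running-sum enrichment score and a permutation-based p value in the style of classic GSEA. The ranked list, the gene set and the helper enrichment_score() are all hypothetical, and the package's own implementation may differ; this is an illustration of the idea rather than a reproduction of the results shown.

```r
set.seed(1)

# Hypothetical ranked gene list: 200 genes sorted by decreasing score.
scores <- sort(rnorm(200), decreasing = TRUE)
names(scores) <- paste0("g", seq_along(scores))

# Hypothetical gene set concentrated near the bottom of the ranking.
gene_set <- names(scores)[c(150, 160, 165, 170, 178, 185, 190, 195, 198, 200)]

# Weighted running-sum statistic: step up at genes in the set (weighted by
# |score|^exponent), step down otherwise; return the signed maximum deviation.
enrichment_score <- function(scores, gene_set, exponent = 1) {
  hit <- names(scores) %in% gene_set
  hit_weight <- abs(scores)^exponent * hit
  p_hit <- cumsum(hit_weight) / sum(hit_weight)
  p_miss <- cumsum(!hit) / sum(!hit)
  running <- unname(p_hit - p_miss)
  running[which.max(abs(running))]
}

es <- enrichment_score(scores, gene_set)

# Null distribution from 100 random gene sets of the same size.
n_perm <- 100
null_es <- replicate(
  n_perm,
  enrichment_score(scores, sample(names(scores), length(gene_set)))
)

# Empirical p value and normalized score, using null scores of the same sign.
same_sign <- null_es[sign(null_es) == sign(es)]
pvalue <- (sum(abs(same_sign) >= abs(es)) + 1) / (length(same_sign) + 1)
nes <- es / mean(abs(same_sign))

print(c(ES = es, NES = nes, pvalue = pvalue))
```

A negative score, as in the three rows above, indicates that the genes of the set accumulate toward the bottom of the ranked list.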
A., Omer, Giovanni M. D., Thanos P. M., and Francesca D. C. 2016. “NCG 5.0: Updates of a Manually Curated Repository of Cancer Genes and Associated Properties from Cancer Mutational Screenings.” Nucleic Acids Research 44 (D1): D992–99. https://doi.org/10.1093/nar/gkv1123.
Janet, P., Núria Q. R., Àlex B., Jordi D. P., Anna B. M., Martin B., Ferran S., and Laura I. F. 2015. “DisGeNET: A Discovery Platform for the Dynamical Exploration of Human Diseases and Their Genes.” Database 2015 (March): bav028. https://doi.org/10.1093/database/bav028.
Schriml, L. M., C. Arze, S. Nadendla, Y.-W. W. Chang, M. Mazaitis, V. Felix, G. Feng, and W. A. Kibbe. 2011. “Disease Ontology: A Backbone for Disease Semantic Integration.” Nucleic Acids Research 40 (D1): D940–46. https://doi.org/10.1093/nar/gkr972.
Yu, Guangchuang, Li-Gen Wang, Guang-Rong Yan, and Qing-Yu He. 2015. “DOSE: An R/Bioconductor Package for Disease Ontology Semantic and Enrichment Analysis.” Bioinformatics 31 (4): 608–9. https://doi.org/10.1093/bioinformatics/btu684.