We developed the DOSE package (Yu et al. 2015) to promote the investigation of diseases. DOSE provides five methods for measuring semantic similarities among DO terms and gene products, as well as a hypergeometric model and gene set enrichment analysis (GSEA) for associating diseases with gene lists and extracting disease association insights from genome-wide expression profiles.
The enrichDO() function requires a vector of Entrez gene IDs as input, typically the list of differentially expressed genes from a gene expression profiling study. Please refer to Section 17.1 if you need to convert other gene ID types to Entrez gene IDs.
The ont parameter can be “HDO” (Human Disease Ontology), “HPO” (Human Phenotype Ontology) or “MPO” (Mouse Phenotype Ontology). The pvalueCutoff parameter sets the cutoff for both the p value and the adjusted p value. The pAdjustMethod parameter sets the p value correction method: Bonferroni (“bonferroni”), Holm (“holm”), Hochberg (“hochberg”), Hommel (“hommel”), Benjamini & Hochberg (“BH”) or Benjamini & Yekutieli (“BY”). The qvalueCutoff parameter sets the cutoff for q-values.
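The pAdjustMethod values listed above match the method names accepted by base R's p.adjust(), so the effect of a correction can be previewed on a small vector of p values. The sketch below uses made-up p values purely for illustration; that enrichDO() delegates to p.adjust() internally is an assumption here, not something stated in this section.

```r
# Hypothetical raw p values for five tested terms (illustration only)
p_raw <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Apply two of the correction methods named above with base R's p.adjust()
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")
p_bh <- p.adjust(p_raw, method = "BH")

# Side-by-side comparison of raw and adjusted values
adjusted <- data.frame(p_raw, p_bonferroni, p_bh)
print(adjusted)
```

Bonferroni is the more conservative of the two: it multiplies every p value by the number of tests, whereas BH scales each p value according to its rank.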
The universe parameter sets the background gene universe for testing. If users do not explicitly set this parameter, enrichDO() will set the universe to all human genes that have DO annotation.
The minGSSize and maxGSSize parameters restrict testing to DO terms whose number of annotated genes falls within the range from minGSSize to maxGSSize.
The readable parameter is a logical value that indicates whether the Entrez gene IDs will be mapped to gene symbols; see also Section 17.2.
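The parameters described above can be combined in a single call. The following is a minimal sketch, assuming a recent DOSE release in which ont = "HDO" is accepted and the bundled geneList example dataset is available; the fold-change threshold of 1.5 and the specific cutoff values are arbitrary choices for illustration.

```r
library(DOSE)

# geneList is the example dataset shipped with DOSE:
# a named numeric vector whose names are Entrez gene IDs
data(geneList)

# Treat genes with an absolute score above 1.5 as the genes of interest
# (the threshold is an arbitrary choice for this sketch)
gene <- names(geneList)[abs(geneList) > 1.5]

x <- enrichDO(
  gene          = gene,
  ont           = "HDO",            # Human Disease Ontology
  pvalueCutoff  = 0.05,
  pAdjustMethod = "BH",
  universe      = names(geneList),  # explicit background instead of the default
  minGSSize     = 5,
  maxGSSize     = 500,
  qvalueCutoff  = 0.05,
  readable      = FALSE             # keep Entrez IDs in the geneID column
)
head(x)
```

Passing universe explicitly, as done here, restricts the background to the genes that were actually measured, which is usually a more defensible choice than the default of all annotated genes.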
8.1.2 Over-representation analysis for the network of cancer gene
Network of Cancer Gene (NCG) (A. et al. 2016) is a manually curated repository of cancer genes. NCG release 5.0 (Aug. 2015) collects 1,571 cancer genes from 175 published studies. DOSE supports analyzing a gene list to determine whether it is enriched in genes known to be mutated in a given cancer type.
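A call to enrichNCG() might look like the sketch below. It assumes the geneList example dataset from DOSE and relies on the function's default cutoffs; the gene selection threshold is an arbitrary choice, and the NCG release bundled with the installed DOSE version may differ from the one described above.

```r
library(DOSE)

# Example dataset shipped with DOSE (names are Entrez gene IDs)
data(geneList)

# Genes of interest: absolute score above 1.5 (arbitrary threshold)
gene <- names(geneList)[abs(geneList) > 1.5]

# Over-representation analysis against NCG cancer gene sets, default settings
ncg <- enrichNCG(gene)
head(ncg)
```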
8.1.3 Over-representation analysis for the disease gene network
DisGeNET (Janet et al. 2015) is an integrative and comprehensive resource of gene-disease associations from several public data sources and the literature. It contains gene-disease associations and SNP-gene-disease associations.
The enrichment analysis of gene-disease associations is supported by the enrichDGN() function, and the analysis of SNP-gene-disease associations is supported by the enrichDGNv() function.
dgn <- enrichDGN(gene)
head(dgn)
ID Description GeneRatio BgRatio
C0010278 C0010278 Craniosynostosis 43/497 488/21671
C0853879 C0853879 Invasive carcinoma of breast 42/497 473/21671
C4733092 C4733092 estrogen receptor-negative breast cancer 34/497 356/21671
C3642347 C3642347 Basal-Like Breast Carcinoma 28/497 245/21671
C3642345 C3642345 Luminal A Breast Carcinoma 22/497 153/21671
C0036202 C0036202 Sarcoidosis 36/497 413/21671
RichFactor FoldEnrichment zScore pvalue p.adjust qvalue
C0010278 0.08811475 3.842122 9.728932 4.609534e-14 3.150086e-10 NA
C0853879 0.08879493 3.871780 9.674768 7.105190e-14 3.150086e-10 NA
C4733092 0.09550562 4.164391 9.223137 2.446675e-12 6.756634e-09 NA
C3642347 0.11428571 4.983271 9.606353 3.047991e-12 6.756634e-09 NA
C3642345 0.14379085 6.269802 10.021789 7.034749e-12 1.172052e-08 NA
C0036202 0.08716707 3.800800 8.804448 7.930882e-12 1.172052e-08 NA
geneID
C0010278 6890/7031/3572/2625/1493/2200/6279/29968/3479/2152/1308/81620/3576/4312/4069/4330/8318/1300/563/27299/3002/2146/6352/1062/3117/5327/4982/1846/6278/9607/185/3667/990/3627/2697/6280/26279/6424/4318/6362/820/23594/4998
C0853879 2625/4582/983/51203/1894/23158/3169/5241/2151/1111/8792/890/3479/2952/4171/7980/3576/4312/10855/5304/9787/1978/2146/2173/2066/4982/6278/53335/185/367/5080/701/8836/7153/8842/6664/9232/1602/4318/9582/4102/6926
C4733092 2625/4582/1894/29968/3169/5241/8792/3479/4137/81620/3576/2491/2305/5304/10551/80129/8061/7083/2146/5327/4982/1846/6278/3667/367/214/11004/27324/79733/1408/6241/3620/8839/81930
C3642347 2625/6663/11065/29127/3169/5241/18/8792/3479/10232/3576/2305/5163/2146/2173/2568/1062/1846/9833/55765/27324/4605/6790/6664/4318/3159/3620/7368
C3642345 3161/2625/6663/4582/3169/5241/1111/8614/2152/3576/4288/2305/5304/80129/2001/4128/185/9833/7153/27324/4318/55355
C0036202 6364/6890/4582/26227/4283/6863/6279/5004/25802/3479/23541/4321/2952/3576/4312/4069/2053/27299/3002/6352/4485/3117/6355/4982/1524/185/3627/6280/2167/29851/4318/80736/10403/6373/6362/7043
Count
C0010278 43
C0853879 42
C4733092 34
C3642347 28
C3642345 22
C0036202 36
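The derived columns in this output can be checked by hand from GeneRatio and BgRatio. The sketch below uses the counts of the first row (C0010278) and reproduces its RichFactor, FoldEnrichment and zScore values; the formulas are inferred from the numbers in the table rather than taken from the DOSE source, so treat them as an assumption. The p value is computed as the upper tail of the hypergeometric distribution, which is the standard form of the over-representation test.

```r
# Counts read from the first row of the output above (C0010278)
k <- 43     # input genes annotated to the term (Count)
n <- 497    # input genes with any annotation (GeneRatio denominator)
M <- 488    # background genes annotated to the term (BgRatio numerator)
N <- 21671  # background genes with any annotation (BgRatio denominator)

# Fraction of the term's genes found in the input list
rich_factor <- k / M

# GeneRatio divided by BgRatio
fold_enrichment <- (k / n) / (M / N)

# Mean and standard deviation of the hypergeometric count under the null
mu <- n * M / N
sigma <- sqrt(mu * (N - n) * (N - M) / (N * (N - 1)))
z_score <- (k - mu) / sigma

# Upper-tail probability of observing k or more annotated genes
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

print(c(RichFactor = rich_factor, FoldEnrichment = fold_enrichment,
        zScore = z_score, pvalue = p_value))
```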
In the following example, to speed up the compilation of this document, only gene sets with more than 120 genes were tested and only 100 permutations were performed.
ID Description
pan-gynecological and breast pan-gynecological and breast NA
breast_fibroepithelial_tumours breast_fibroepithelial_tumours NA
pan-gastric pan-gastric NA
setSize enrichmentScore NES pvalue
pan-gynecological and breast 43 -0.5263430 -1.714876 0.008035671
breast_fibroepithelial_tumours 17 -0.6421576 -1.710019 0.007874177
pan-gastric 49 -0.4993804 -1.653436 0.006065501
p.adjust qvalue rank
pan-gynecological and breast 0.1767848 0.08979085 2464
breast_fibroepithelial_tumours 0.1767848 0.08979085 2700
pan-gastric 0.1767848 0.08979085 3280
leading_edge
pan-gynecological and breast tags=40%, list=20%, signal=32%
breast_fibroepithelial_tumours tags=53%, list=22%, signal=42%
pan-gastric tags=49%, list=26%, signal=36%
core_enrichment
pan-gynecological and breast NIPBL/SPOP/ARID1A/RASA1/RB1/RNF43/MAP2K4/NF1/CTNNB1/TP53/PIK3R1/CDKN1B/CCND1/ARID5B/MAP3K1/TBX3/GATA3
breast_fibroepithelial_tumours SETD2/RB1/PCNX4/NF1/TP53/RARA/SYNE1/MAP3K1/ERBB4
pan-gastric BCOR/SOX9/TCF7L2/ATM/CALD1/SEMG2/HTR7/ARID1A/RASA1/RB1/TTBK2/RNF43/CTNNB1/TP53/BCL9/SMAD3/APC/ZFP36L2/TGFBR2/MUC6/MAP3K1/CACNA1C/ATP8B1/CYP4B1
A., Omer, Giovanni M. D., Thanos P. M., and Francesca D. C. 2016. “NCG 5.0: Updates of a Manually Curated Repository of Cancer Genes and Associated Properties from Cancer Mutational Screenings.” Nucleic Acids Research 44 (D1): D992–99. https://doi.org/10.1093/nar/gkv1123.
Janet, P., Núria Q. R., Àlex B., Jordi D. P., Anna B. M., Martin B., Ferran S., and Laura I. F. 2015. “DisGeNET: A Discovery Platform for the Dynamical Exploration of Human Diseases and Their Genes.” Database 2015 (March): bav028. https://doi.org/10.1093/database/bav028.
Schriml, L. M., C. Arze, S. Nadendla, Y.-W. W. Chang, M. Mazaitis, V. Felix, G. Feng, and W. A. Kibbe. 2011. “Disease Ontology: A Backbone for Disease Semantic Integration.” Nucleic Acids Research 40 (D1): D940–46. https://doi.org/10.1093/nar/gkr972.
Yu, Guangchuang, Li-Gen Wang, Guang-Rong Yan, and Qing-Yu He. 2015. “DOSE: An R/Bioconductor Package for Disease Ontology Semantic and Enrichment Analysis.” Bioinformatics 31 (4): 608–9. https://doi.org/10.1093/bioinformatics/btu684.