10 Disease enrichment analysis
We developed the DOSE package (Yu et al. 2015) to promote the investigation of diseases. DOSE provides five methods for measuring semantic similarities among DO terms and gene products, as well as a hypergeometric model and gene set enrichment analysis (GSEA) for associating diseases with gene lists and extracting disease association insights from genome-wide expression profiles.
10.1 Disease over-representation analysis
DOSE supports enrichment analysis of Disease Ontology (DO) (Schriml et al. 2011), Network of Cancer Gene (A. et al. 2016) and Disease Gene Network (DisGeNET) (Janet et al. 2015). In addition, enrichplot provides several visualization methods to help interpret semantic and enrichment results.
10.1.1 Over-representation analysis for disease ontology
In the following example, we select genes with a fold change above 1.5 as the differential genes and analyze their disease associations.
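The chunk that builds the `gene` vector is not shown in this section. The following is a minimal sketch of the selection step, assuming the input is a named numeric vector of fold changes whose names are entrezgene IDs, and assuming the cutoff applies to the absolute value so that both up- and down-regulated genes are kept. The helper `select_genes()` and the toy vector are hypothetical.

```r
# Hypothetical helper: keep the names of genes whose absolute fold change
# exceeds the cutoff (assumption: the cutoff is applied to abs()).
select_genes <- function(fold_change, cutoff = 1.5) {
  names(fold_change)[abs(fold_change) > cutoff]
}

# Toy named vector standing in for a real fold-change vector.
toy <- c(a = 2.3, b = -1.8, c = 0.4, d = 1.5)
select_genes(toy)

# With DOSE installed, the same helper could be applied to its bundled data:
# library(DOSE); data(geneList); gene <- select_genes(geneList)
```

The ID vector printed next is the head of the chapter's own `gene` vector, not the output of this toy example.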
## [1] "4312" "8318" "10874" "55143" "55388" "991"
x <- enrichDO(gene          = gene,
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              universe      = names(geneList),
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.05,
              readable      = FALSE)
head(x)
## ID Description GeneRatio BgRatio
## DOID:170 DOID:170 endocrine gland cancer 48/331 472/6268
## DOID:10283 DOID:10283 prostate cancer 40/331 394/6268
## DOID:3459 DOID:3459 breast carcinoma 37/331 357/6268
## DOID:3856 DOID:3856 male reproductive organ cancer 40/331 404/6268
## DOID:824 DOID:824 periodontitis 16/331 109/6268
## DOID:3905 DOID:3905 lung carcinoma 43/331 465/6268
## pvalue p.adjust qvalue
## DOID:170 5.662129e-06 0.004784499 0.003826407
## DOID:10283 3.859157e-05 0.013921739 0.011133923
## DOID:3459 4.942629e-05 0.013921739 0.011133923
## DOID:3856 6.821467e-05 0.014410349 0.011524689
## DOID:824 1.699304e-04 0.018859464 0.015082872
## DOID:3905 1.749754e-04 0.018859464 0.015082872
## geneID
## DOID:170 10874/7153/1381/6241/11065/10232/332/6286/2146/10112/891/9232/4171/993/5347/4318/3576/1515/4821/8836/3159/7980/5888/333/898/9768/4288/3551/2152/9590/185/7043/3357/2952/5327/3667/1634/1287/4582/7122/3479/4680/6424/80310/652/8839/9547/1524
## DOID:10283 4312/6280/6279/597/3627/332/6286/2146/4321/4521/891/5347/4102/4318/701/3576/79852/10321/6352/4288/3551/2152/247/2952/3487/367/3667/4128/4582/563/3679/4117/7031/3479/6424/10451/80310/652/4036/10551
## DOID:3459 4312/6280/6279/7153/4751/890/4085/332/6286/6790/891/9232/10855/4171/5347/4318/701/2633/3576/9636/898/8792/4288/2952/4982/4128/4582/7031/3479/771/4250/2066/3169/10647/5304/5241/10551
## DOID:3856 4312/6280/6279/597/3627/332/6286/2146/4321/4521/891/5347/4102/4318/701/3576/79852/10321/6352/4288/3551/2152/247/2952/3487/367/3667/4128/4582/563/3679/4117/7031/3479/6424/10451/80310/652/4036/10551
## DOID:824 4312/6279/820/7850/4321/3595/4318/4069/3576/1493/6352/8842/185/2952/5327/4982
## DOID:3905 4312/6280/2305/9133/6279/7153/6278/6241/55165/11065/8140/10232/332/6286/3002/9212/4521/891/4171/9928/8061/4318/3576/1978/1894/7980/7083/898/6352/8842/4288/2152/2697/2952/3572/4582/7049/563/3479/1846/3117/2532/2922
## Count
## DOID:170 48
## DOID:10283 40
## DOID:3459 37
## DOID:3856 40
## DOID:824 16
## DOID:3905 43
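The GeneRatio and BgRatio columns carry the four counts that a hypergeometric over-representation test needs. As a rough illustration, and assuming the pvalue column is the upper tail of a hypergeometric distribution (the exact computation inside enrichDO() is not shown in this section), the DOID:170 row can be checked with base R:

```r
# Counts read from the DOID:170 row of the output above.
k <- 48    # input genes annotated to the term       (GeneRatio numerator)
n <- 331   # input genes with a DO annotation        (GeneRatio denominator)
M <- 472   # background genes annotated to the term  (BgRatio numerator)
N <- 6268  # background genes                        (BgRatio denominator)

# P(X >= k) when n genes are drawn without replacement from N genes,
# M of which are annotated to the term.
p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
p
```

If the assumption holds, the result should be close to the pvalue reported for that row.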
The enrichDO() function requires a vector of entrezgene IDs as input, typically the list of differentially expressed genes from a gene expression profiling study. Please refer to section 16.1 if you need to convert other gene ID types to entrezgene IDs.
The ont parameter can be “DO” or “DOLite”. DOLite (Du et al. 2009) was constructed to aggregate redundant DO terms. Because the DOLite data is no longer updated, we recommend using ont="DO". pvalueCutoff sets the cutoff for the p value and the adjusted p value. pAdjustMethod sets the p value correction method, which can be the Bonferroni correction (“bonferroni”), Holm (“holm”), Hochberg (“hochberg”), Hommel (“hommel”), Benjamini & Hochberg (“BH”) or Benjamini & Yekutieli (“BY”). qvalueCutoff sets the cutoff for q-values.
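To get a feel for how the correction methods differ, the sketch below applies base R's `p.adjust()` to the six raw p values shown in the enrichDO() output. This is only an illustration: it corrects over six values, whereas the p.adjust column in the output was presumably computed over every tested DO term, so the numbers will not match that column.

```r
# Raw p values copied from the pvalue column of the enrichDO() output.
p <- c("DOID:170"   = 5.662129e-06,
       "DOID:10283" = 3.859157e-05,
       "DOID:3459"  = 4.942629e-05,
       "DOID:3856"  = 6.821467e-05,
       "DOID:824"   = 1.699304e-04,
       "DOID:3905"  = 1.749754e-04)

methods <- c("bonferroni", "holm", "hochberg", "hommel", "BH", "BY")

# One column per correction method, one row per term.
adjusted <- sapply(methods, function(m) p.adjust(p, method = m))
signif(adjusted, 3)
```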
The universe parameter sets the background gene universe for testing. If the user does not explicitly set this parameter, enrichDO() will set the universe to all human genes that have a DO annotation.
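The choice of universe changes the counts that enter the test, and therefore the p value. The toy sketch below makes this concrete with a hypothetical annotation and a hypothetical helper `ora_p()`. It is a simplified model of an over-representation test, not the code DOSE runs: it restricts both the input genes and the term's genes to the universe and then takes the upper hypergeometric tail.

```r
# Hypothetical data: 20 annotated genes, 4 of them linked to one disease term.
all_annotated <- paste0("g", 1:20)
term_genes    <- c("g1", "g2", "g3", "g4")
hits          <- c("g1", "g2", "g5")      # input gene list
measured      <- paste0("g", 1:10)        # genes actually assayed

ora_p <- function(hits, term_genes, universe) {
  hits       <- intersect(hits, universe)
  term_genes <- intersect(term_genes, universe)
  k <- length(intersect(hits, term_genes))
  phyper(k - 1, length(term_genes),
         length(universe) - length(term_genes),
         length(hits), lower.tail = FALSE)
}

p_default <- ora_p(hits, term_genes, all_annotated)  # all annotated genes
p_custom  <- ora_p(hits, term_genes, measured)       # assayed genes only
c(default = p_default, custom = p_custom)
```

In this toy case the smaller universe yields the larger p value, because the term covers a larger share of the background.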
The minGSSize (and maxGSSize) parameter indicates that only those DO terms that have more than minGSSize (and less than maxGSSize) genes annotated will be tested.
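The size filter can be pictured with a small base R sketch. The term names, gene IDs and the helper `filter_by_size()` are hypothetical. The comparison follows the wording above (strict bounds); whether DOSE treats the bounds as inclusive is not stated here and should be checked against the package documentation.

```r
# Hypothetical term-to-gene annotation with three terms of different sizes.
term2gene <- list(
  term_small = c("g1", "g2"),
  term_mid   = paste0("g", 1:8),
  term_large = paste0("g", 1:600)
)

# Keep terms whose size lies strictly between the two bounds
# (assumption: strict comparison, mirroring the text).
filter_by_size <- function(gene_sets, minGSSize = 5, maxGSSize = 500) {
  sizes <- lengths(gene_sets)
  gene_sets[sizes > minGSSize & sizes < maxGSSize]
}

kept <- filter_by_size(term2gene)
names(kept)
```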
The readable parameter is a logical value that indicates whether the entrezgene IDs will be mapped to gene symbols (see also setReadable).
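Conceptually, making a result readable means translating each slash-separated geneID string into gene symbols. The sketch below shows the idea in base R with a hypothetical helper `to_symbols()` and a three-entry lookup table. The table is a hand-written stand-in for an annotation package such as org.Hs.eg.db, and its entries should be verified there; this is not how setReadable is implemented.

```r
# Tiny hand-written lookup (assumption: verify against org.Hs.eg.db).
id2symbol <- c("4312" = "MMP1", "4318" = "MMP9", "3576" = "CXCL8")

# Translate a "/"-separated string of IDs; unknown IDs are kept unchanged.
to_symbols <- function(gene_id, lookup) {
  ids     <- strsplit(gene_id, "/", fixed = TRUE)[[1]]
  symbols <- unname(lookup[ids])
  missing <- is.na(symbols)
  symbols[missing] <- ids[missing]
  paste(symbols, collapse = "/")
}

to_symbols("4312/4318/3576", id2symbol)
```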
10.1.2 Over-representation analysis for the network of cancer gene
Network of Cancer Gene (NCG) (A. et al. 2016) is a manually curated repository of cancer genes. NCG release 5.0 (Aug. 2015) collects 1,571 cancer genes from 175 published studies. DOSE supports analyzing a gene list to determine whether its genes are enriched in genes known to be mutated in a given cancer type.
## ID
## pan-cancer_paediatric pan-cancer_paediatric
## triple_negative_breast_cancer triple_negative_breast_cancer
## bladder_cancer bladder_cancer
## pancreatic_cancer_(all_histologies) pancreatic_cancer_(all_histologies)
## soft_tissue_sarcoma soft_tissue_sarcoma
## paediatric_high-grade_glioma paediatric_high-grade_glioma
## Description
## pan-cancer_paediatric pan-cancer_paediatric
## triple_negative_breast_cancer triple_negative_breast_cancer
## bladder_cancer bladder_cancer
## pancreatic_cancer_(all_histologies) pancreatic_cancer_(all_histologies)
## soft_tissue_sarcoma soft_tissue_sarcoma
## paediatric_high-grade_glioma paediatric_high-grade_glioma
## GeneRatio BgRatio pvalue
## pan-cancer_paediatric 162/2281 183/3177 1.833773e-08
## triple_negative_breast_cancer 71/2281 75/3177 4.290660e-07
## bladder_cancer 97/2281 112/3177 1.253690e-04
## pancreatic_cancer_(all_histologies) 40/2281 42/3177 1.262162e-04
## soft_tissue_sarcoma 26/2281 26/3177 1.742793e-04
## paediatric_high-grade_glioma 25/2281 25/3177 2.434966e-04
## p.adjust qvalue
## pan-cancer_paediatric 1.613721e-06 7.721152e-07
## triple_negative_breast_cancer 1.887890e-05 9.032967e-06
## bladder_cancer 2.776757e-03 1.328592e-03
## pancreatic_cancer_(all_histologies) 2.776757e-03 1.328592e-03
## soft_tissue_sarcoma 3.067315e-03 1.467615e-03
## paediatric_high-grade_glioma 3.073768e-03 1.470702e-03
## geneID
## pan-cancer_paediatric 2146/55353/4609/1029/3575/22806/3418/3066/2120/30012/867/7468/7545/3195/865/64109/4613/613/11177/7490/238/10736/10054/5771/4893/140885/1785/9760/3417/6597/6476/9126/4869/10320/7307/80204/1050/10992/8028/2312/6608/896/894/2196/4849/7023/5093/5079/5293/5727/55181/171017/51322/5781/3718/55294/60/673/8085/5897/4851/1665/51176/1108/7764/10664/6098/2332/2201/6495/3845/7015/1441/2782/64919/4298/23512/8239/29102/6929/8021/6134/6598/4209/5290/22941/8726/207/3717/2033/10716/4928/6932/694/5156/10019/6886/9968/7080/2623/7874/1654/4149/3020/23219/55252/55729/10735/5728/4853/23451/51341/387/3206/6146/79718/2624/63035/3815/171023/23269/25/23592/5896/7403/2260/54880/3716/9203/57178/6777/5789/4297/29072/90/546/120/25836/8289/4345/9611/5925/4763/1997/1499/7157/3399/5295/1387/4602/51564/1027/4005/2322/2078/678/6403/55709/1277/7494/64061/2625
## triple_negative_breast_cancer 6790/898/4609/1029/1789/4436/2120/867/7128/1788/1030/7490/2271/238/675/2047/4914/1316/5291/5293/5781/55294/8085/4851/4170/3845/355/1616/4854/5290/207/2033/4233/29110/2903/5979/5728/4853/2624/3815/10000/7403/2260/55193/472/5789/4297/2065/4286/8626/8405/8289/10499/55164/5925/4763/23405/1499/4921/7157/5295/1387/2078/324/7248/7048/22894/3480/2045/2066/2625
## bladder_cancer 9700/57211/2175/9603/1029/11168/2072/8997/79949/54663/688/6882/4893/8454/6693/56288/2195/10992/1026/64783/896/677/26038/6256/55294/60/8085/4851/841/3265/7175/1999/730051/3845/23484/7015/8243/10605/8295/4854/5290/51043/2033/4780/23224/23217/2064/23385/55252/10735/8241/10672/5728/4853/23451/374291/387/7799/171023/288/30849/4152/9794/7403/287/57634/463/472/4297/2065/2262/3280/23232/8289/9611/5925/2068/56339/4763/7157/2186/1387/3910/7536/2261/7248/23037/6709/54961/23345/57125/7832/79633/10628/22906/388/3169
## pancreatic_cancer_(all_histologies) 1029/4771/8997/7159/2011/6597/7307/10992/3710/6710/55294/7091/3845/23654/7046/3096/4089/91/8241/54549/92/23451/63035/7403/55193/23309/472/800/29072/23077/23499/8289/54894/6416/7157/4088/182/7048/2199/26960
## soft_tissue_sarcoma 999/6850/4914/4342/2185/55294/2041/4851/2044/4058/5290/4486/5297/5728/3815/2324/7403/546/5925/4763/1499/7157/5159/2045/3667/2066
## paediatric_high-grade_glioma 4609/1029/1019/4613/1030/1956/4914/896/894/673/8493/5290/4233/5156/1021/63035/54880/4916/90/546/4763/7157/5295/595/4915
## Count
## pan-cancer_paediatric 162
## triple_negative_breast_cancer 71
## bladder_cancer 97
## pancreatic_cancer_(all_histologies) 40
## soft_tissue_sarcoma 26
## paediatric_high-grade_glioma 25
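GeneRatio and BgRatio are stored as text such as "162/2281", so they need to be parsed before any arithmetic. The sketch below uses a hypothetical helper `parse_ratio()` to turn them into numbers and derives a simple fold enrichment (GeneRatio divided by BgRatio) for the first two rows above. Fold enrichment is an illustrative summary added here, not a column produced by the analysis shown.

```r
# Ratio strings copied from the first two rows of the output above.
gene_ratio <- c("pan-cancer_paediatric"         = "162/2281",
                "triple_negative_breast_cancer" = "71/2281")
bg_ratio   <- c("pan-cancer_paediatric"         = "183/3177",
                "triple_negative_breast_cancer" = "75/3177")

# Convert "a/b" strings to the numeric value a / b.
parse_ratio <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

fold_enrichment <- parse_ratio(gene_ratio) / parse_ratio(bg_ratio)
round(fold_enrichment, 3)
```

Both values are modest here even though the p values are small, which reflects the large input list (2281 genes) used for this example.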
10.1.3 Over-representation analysis for the disease gene network
DisGeNET (Janet et al. 2015) is an integrative and comprehensive resource of gene-disease associations from several public data sources and the literature. It contains gene-disease associations and SNP-gene-disease associations.
The enrichment analysis of disease-gene associations is supported by the enrichDGN function, and analysis of SNP-gene-disease associations is supported by the enrichDGNv function.
## ID Description GeneRatio BgRatio
## C0010278 C0010278 Craniosynostosis 43/497 488/21671
## C0853879 C0853879 Invasive carcinoma of breast 42/497 473/21671
## C4733092 C4733092 estrogen receptor-negative breast cancer 34/497 356/21671
## C3642347 C3642347 Basal-Like Breast Carcinoma 28/497 245/21671
## C3642345 C3642345 Luminal A Breast Carcinoma 22/497 153/21671
## C0036202 C0036202 Sarcoidosis 36/497 413/21671
## pvalue p.adjust qvalue
## C0010278 4.609534e-14 2.267976e-10 1.636811e-10
## C0853879 7.105190e-14 2.267976e-10 1.636811e-10
## C4733092 2.446675e-12 4.864593e-09 3.510804e-09
## C3642347 3.047991e-12 4.864593e-09 3.510804e-09
## C3642345 7.034749e-12 8.438458e-09 6.090082e-09
## C0036202 7.930882e-12 8.438458e-09 6.090082e-09
## geneID
## C0010278 4312/8318/6280/1062/6279/6278/3627/820/27299/6362/81620/2146/3002/29968/990/4318/4069/3576/6890/23594/26279/1493/6352/4998/2152/2697/185/4330/5327/4982/1300/3667/2200/9607/3572/563/7031/3479/6424/1846/3117/1308/2625
## C0853879 4312/7153/6278/9787/9582/51203/890/983/5080/2146/1111/9232/10855/4171/6664/4102/2173/4318/701/3576/1978/8836/53335/1894/7980/8792/8842/2151/185/2952/367/4982/4582/6926/3479/1602/23158/2066/3169/5304/2625/5241
## C4733092 2305/6278/79733/6241/81930/81620/2146/3620/29968/11004/8061/3576/1894/2491/7083/8792/214/5327/367/4982/3667/4582/27324/3479/1846/80129/4137/8839/3169/1408/5304/2625/5241/10551
## C3642347 2305/1062/4605/9833/7368/11065/10232/55765/5163/2146/2568/3620/6790/6664/29127/2173/4318/3576/3159/8792/6663/27324/3479/1846/18/3169/2625/5241
## C3642345 2305/9833/7153/55355/1111/3161/4318/3576/2001/6663/4288/2152/185/4128/4582/27324/80129/3169/5304/8614/2625/5241
## C0036202 4312/6280/6279/10403/3627/6373/4283/27299/6362/3002/4321/6355/6364/29851/4318/5004/4069/3576/26227/6890/6352/4485/23541/185/7043/6863/2952/4982/25802/4582/2053/3479/3117/2167/80736/1524
## Count
## C0010278 43
## C0853879 42
## C4733092 34
## C3642347 28
## C3642345 22
## C0036202 36
snp <- c("rs1401296", "rs9315050", "rs5498", "rs1524668", "rs147377392",
         "rs841", "rs909253", "rs7193343", "rs3918232", "rs3760396",
         "rs2231137", "rs10947803", "rs17222919", "rs386602276", "rs11053646",
         "rs1805192", "rs139564723", "rs2230806", "rs20417", "rs966221")
dgnv <- enrichDGNv(snp)
head(dgnv)
## ID Description GeneRatio BgRatio pvalue
## C0010054 C0010054 Coronary Arteriosclerosis 6/17 440/194515 1.568917e-12
## C0151744 C0151744 Myocardial Ischemia 4/17 103/194515 1.754840e-10
## C0031099 C0031099 Periodontitis 4/17 116/194515 2.839985e-10
## C0007785 C0007785 Cerebral Infarction 4/17 123/194515 3.599531e-10
## C0003850 C0003850 Arteriosclerosis 4/17 267/194515 8.145389e-09
## C0004153 C0004153 Atherosclerosis 4/17 281/194515 9.996713e-09
## p.adjust qvalue
## C0010054 2.761295e-10 NA
## C0151744 1.544259e-08 NA
## C0031099 1.583793e-08 NA
## C0007785 1.583793e-08 NA
## C0003850 2.867177e-07 NA
## C0004153 2.932369e-07 NA
## geneID Count
## C0010054 rs5498/rs147377392/rs11053646/rs1805192/rs2230806/rs20417 6
## C0151744 rs5498/rs147377392/rs11053646/rs1805192 4
## C0031099 rs5498/rs909253/rs1805192/rs20417 4
## C0007785 rs147377392/rs11053646/rs1805192/rs2230806 4
## C0003850 rs5498/rs1805192/rs2230806/rs20417 4
## C0004153 rs5498/rs1805192/rs2230806/rs20417 4
10.2 Disease gene set enrichment analysis
10.2.1 gseDO function
In the following example, in order to speed up the compilation of this document, only gene sets with size above 120 were tested and only 100 permutations were performed.
library(DOSE)
data(geneList)
y <- gseDO(geneList,
           minGSSize     = 120,
           pvalueCutoff  = 0.2,
           pAdjustMethod = "BH",
           verbose       = FALSE)
head(y, 3)
## ID Description setSize
## DOID:0050338 DOID:0050338 primary bacterial infectious disease 214
## DOID:399 DOID:399 tuberculosis 140
## DOID:104 DOID:104 bacterial infectious disease 243
## enrichmentScore NES pvalue p.adjust qvalues
## DOID:0050338 0.4569856 2.082577 9.784704e-10 1.536199e-07 5.973820e-08
## DOID:399 0.4999969 2.154362 4.812537e-09 3.777842e-07 1.469090e-07
## DOID:104 0.4286201 1.969547 1.028755e-08 5.383820e-07 2.093607e-07
## rank leading_edge
## DOID:0050338 1850 tags=35%, list=15%, signal=30%
## DOID:399 1808 tags=36%, list=14%, signal=31%
## DOID:104 1808 tags=32%, list=14%, signal=28%
## core_enrichment
## DOID:0050338 4312/597/3627/6373/820/3620/6364/29851/4318/3576/26227/6890/952/1493/6352/3934/54210/3932/5551/3559/6347/6402/639/94025/3126/3001/6351/1236/5698/3948/919/3458/959/7296/79139/3804/4159/942/3329/9235/1234/7096/3383/4068/6367/5806/100/3659/4360/939/6891/4210/671/7422/929/26191/6504/27087/4282/7124/5027/5329/3569/4049/7097/56244/7852/1378/5133/5743/348/1118/3119/7415
## DOID:399 4312/597/3627/820/6364/29851/4318/3576/26227/1493/6352/3934/5551/3559/6347/94025/3001/6351/3948/919/3458/959/4159/942/9235/7096/3383/4068/6367/5806/100/3659/4360/939/671/929/26191/27087/4282/7124/5027/3569/7097/56244/7852/1378/5133/348/1118/3119
## DOID:104 4312/597/3627/6373/820/3620/6364/29851/4318/3576/26227/6890/952/1493/6352/3934/54210/10663/3932/5551/3559/6772/6347/6402/639/94025/3126/3001/6351/1236/3654/5698/3948/919/3458/959/7296/79139/3804/4159/942/3329/9235/3689/1234/7096/3383/4068/6367/5806/100/3659/4360/939/6891/4210/671/7422/929/26191/6504/27087/4282/7124/5027/5329/3569/4049/7097/56244/7852/1378/5133/5743/348/1118/3119
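The enrichmentScore column comes from walking down the ranked gene list and tracking a running sum. The sketch below implements the classic weighted running-sum statistic in base R on a toy ranked list, to show what the score measures. The helper `enrichment_score()` and the toy data are hypothetical, and this is not the code gseDO runs: it omits normalization (NES) and the significance estimate.

```r
# Running-sum enrichment score for one gene set.
# stats: named numeric vector of gene-level statistics; gene_set: gene names.
enrichment_score <- function(stats, gene_set, p = 1) {
  stats  <- sort(stats, decreasing = TRUE)
  in_set <- names(stats) %in% gene_set
  stopifnot(any(in_set), any(!in_set))

  hit_weight <- abs(stats)^p * in_set
  step_up    <- hit_weight / sum(hit_weight)  # increments at gene-set members
  step_down  <- (!in_set) / sum(!in_set)      # decrements at all other genes

  running <- cumsum(step_up - step_down)
  unname(running[which.max(abs(running))])    # largest deviation from zero
}

# Toy ranked list: g1 and g2 sit at the top, g5 and g6 at the bottom.
stats <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)
enrichment_score(stats, c("g1", "g2"))   # set concentrated at the top
enrichment_score(stats, c("g5", "g6"))   # set concentrated at the bottom
```

A set concentrated at the top of the list gives a positive score and one concentrated at the bottom gives a negative score, which matches the signs seen in the enrichmentScore and NES columns.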
10.2.2 gseNCG function
ncg <- gseNCG(geneList,
              pvalueCutoff  = 0.5,
              pAdjustMethod = "BH",
              verbose       = FALSE)
ncg <- setReadable(ncg, 'org.Hs.eg.db')
head(ncg, 3)
## ID
## pan-gynecological and breast pan-gynecological and breast
## pan-gastric pan-gastric
## breast_fibroepithelial_tumours breast_fibroepithelial_tumours
## Description setSize
## pan-gynecological and breast pan-gynecological and breast 43
## pan-gastric pan-gastric 49
## breast_fibroepithelial_tumours breast_fibroepithelial_tumours 17
## enrichmentScore NES pvalue p.adjust
## pan-gynecological and breast -0.5263429 -1.709808 0.001878160 0.07927729
## pan-gastric -0.4993803 -1.679020 0.001957464 0.07927729
## breast_fibroepithelial_tumours -0.6421576 -1.656489 0.004208555 0.08522323
## qvalues rank leading_edge
## pan-gynecological and breast 0.07417758 2464 tags=44%, list=20%, signal=36%
## pan-gastric 0.07417758 3280 tags=49%, list=26%, signal=36%
## breast_fibroepithelial_tumours 0.07974104 2700 tags=59%, list=22%, signal=46%
## core_enrichment
## pan-gynecological and breast ATM/ZC3H13/NIPBL/SPOP/ARID1A/RASA1/RB1/RNF43/MAP2K4/NF1/CTNNB1/TP53/PIK3R1/CDKN1B/CCND1/ARID5B/MAP3K1/TBX3/GATA3
## pan-gastric BCOR/SOX9/TCF7L2/ATM/CALD1/SEMG2/HTR7/ARID1A/RASA1/RB1/TTBK2/RNF43/CTNNB1/TP53/BCL9/SMAD3/APC/ZFP36L2/TGFBR2/MUC6/MAP3K1/CACNA1C/ATP8B1/CYP4B1
## breast_fibroepithelial_tumours BCOR/SETD2/RB1/PCNX4/NF1/TP53/RARA/SYNE1/MAP3K1/ERBB4
10.2.3 gseDGN function
dgn <- gseDGN(geneList,
              pvalueCutoff  = 0.2,
              pAdjustMethod = "BH",
              verbose       = FALSE)
dgn <- setReadable(dgn, 'org.Hs.eg.db')
head(dgn, 3)
## ID Description setSize enrichmentScore
## C0024266 C0024266 Lymphocytic Choriomeningitis 120 0.5712593
## C4721414 C4721414 Mantle cell lymphoma 368 0.4107437
## C0205682 C0205682 Waist-Hip Ratio 401 -0.4425633
## NES pvalue p.adjust qvalues rank
## C0024266 2.395470 1e-10 2.05275e-07 1.762105e-07 2579
## C4721414 1.950101 1e-10 2.05275e-07 1.762105e-07 1745
## C0205682 -1.943479 1e-10 2.05275e-07 1.762105e-07 2011
## leading_edge
## C0024266 tags=48%, list=21%, signal=38%
## C4721414 tags=26%, list=14%, signal=23%
## C0205682 tags=28%, list=16%, signal=24%
## core_enrichment
## C0024266 S100A9/CXCL10/CXCL9/EZH2/GZMB/ICOS/USP18/CXCL8/CTLA4/TREM1/PRF1/ADM/CA9/STAT1/CCL2/SELL/CDKN2A/IL7/IL7R/IFNG/CCR5/IL27RA/SH2D1A/FCER1G/CDK2AP2/CPVL/CD27/PSMB10/PTPN22/SLAMF1/KDM1A/TNF/IL6/FGL2/TLR2/RPAIN/NELFCD/PDCD1/WAS/HIF1A/ATP5F1B/FCGR2B/EGR2/STX11/CXCR3/TYROBP/YME1L1/SOSTDC1/PTPN2/TRAF1/HNF1A/IRF9/PML/NR0B2/IL2/TOX/AGFG1
## C4721414 CDC20/MELK/E2F8/APOBEC3B/PBK/TPX2/RAD51AP1/DUSP2/CDT1/EZH2/AURKB/CHEK1/AURKA/CCNB1/PSAT1/SOX11/PRAME/CDC6/PLK1/MMP9/EIF4EBP1/SPIB/RAD51/CD38/MMP7/MCM6/CTSC/LCK/MNX1/SKP2/STAT1/PRDM1/MS4A4A/IGK/MYC/PCNA/IFI27/PSMG1/CCR7/GMNN/E2F1/CDKN2A/PSMB9/NME1/LTB/IGHD/CD40LG/LAIR1/IGF2BP3/LBR/COL11A2/MSH2/CD79B/APRT/NSD2/CDK4/PTPRC/PLSCR1/CCR5/G6PD/CHEK2/HILPDA/DCK/PIM2/WNT3/CD6/CD28/MTAP/PRDX1/MRC1/TUBB3/VEGFA/CD19/HACD1/SOX4/PMCH/ST14/PARP1/TCL1A/DNMT1/IGLL5/SYK/TNF/MYCN/CD1D/NXT1/CDKN2B/RANGAP1/IL6/LTA/PSMD2/CXCR4/BCR/FCER2/FADD/PTGS2
## C0205682 RETREG3/JMJD1C/SH2B1/BDNF/ARHGEF26/PDE5A/BPTF/SMAD3/TTC39A/ATP2B1/ARID4A/HOXC4/NID1/LAMA4/LRRC36/NUDT18/ANKRD28/HECTD4/COL11A1/MEIS1/INSR/CUL9/NRIP3/BCL2/CD34/EZH1/DYM/NDST1/COL15A1/VGLL3/CCND1/ZCCHC10/HOXC6/RAB26/QTRT1/MEIS2/ARID5B/AHNAK/FGF1/CAPRIN2/LAMB1/CPEB3/ELOVL4/CDADC1/PDE8B/ZNF268/NRP1/SYTL2/NR5A2/DGLUCY/SEMA3B/NID2/SIK3/PRR5L/FGF2/COL8A1/RAPGEF3/RBM6/CDH13/JCAD/NAV3/TRIM8/PPIEL/PTPRG/NBL1/CALCRL/PPL/LPL/BCKDHB/MAPKBP1/CNTLN/BBS4/P4HTM/FTO/PDZRN4/PDGFC/SGCD/NRXN3/AFF3/IGF1R/ABCC8/MPPED2/COL5A1/COL6A2/LOXL1/CYP21A2/LTBP2/TTC28/PATJ/PCSK5/WNT4/TTC12/NISCH/ASTN2/TCEA2/MN1/SETBP1/TAOK1/MAST4/ITGA7/ITGBL1/COL14A1/C1QTNF3/ZNF423/IQCH/MYH11/ADH1B/ABLIM3/MAPT/STC2/TFAP2B/CYBRD1/SCUBE2
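Because core_enrichment is a slash-separated string, comparing the core genes of two terms takes one split and one set operation. The sketch below uses only the first few symbols of the C0024266 and C4721414 rows above, so the strings are deliberately truncated and the result covers just that prefix. The helper `split_core()` is hypothetical.

```r
# Truncated prefixes of two core_enrichment strings from the output above.
core <- c(
  C0024266 = "S100A9/CXCL10/CXCL9/EZH2/GZMB",
  C4721414 = "CDC20/MELK/E2F8/APOBEC3B/PBK/TPX2/RAD51AP1/DUSP2/CDT1/EZH2"
)

# Split each string into a character vector of gene symbols.
split_core <- function(x) strsplit(x, "/", fixed = TRUE)

core_genes <- split_core(core)
shared <- intersect(core_genes[["C0024266"]], core_genes[["C4721414"]])
shared
```

Applied to the full strings of a real result, the same two steps show which genes drive more than one enriched disease term.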