#############################
## BioC 2.14 to BioC 3.13 ##
#############################
##
## library(MeSH.Hsa.eg.db)
## db <- MeSH.Hsa.eg.db
##
##---------------------------
# From BioC 3.14 (Nov. 2021, with R-4.2.0)
library(AnnotationHub)
library(MeSHDbi)
ah <- AnnotationHub(localHub=TRUE)
hsa <- query(ah, c("MeSHDb", "Homo sapiens"))
file_hsa <- hsa[[1]]
db <- MeSHDbi::MeSHDb(file_hsa)

3 MeSH semantic similarity analysis
MeSH (Medical Subject Headings) is the NLM (U.S. National Library of Medicine) controlled vocabulary used to manually index articles for MEDLINE/PubMed. It is a comprehensive life science vocabulary. MeSH has 19 categories, and MeSH.db contains 16 of them:
| Abbreviation | Category |
|---|---|
| A | Anatomy |
| B | Organisms |
| C | Diseases |
| D | Chemicals and Drugs |
| E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
| F | Psychiatry and Psychology |
| G | Phenomena and Processes |
| H | Disciplines and Occupations |
| I | Anthropology, Education, Sociology and Social Phenomena |
| J | Technology and Food and Beverages |
| K | Humanities |
| L | Information Science |
| M | Persons |
| N | Health Care |
| V | Publication Type |
| Z | Geographical Locations |
MeSH terms were associated with Entrez Gene IDs by three methods: Gendoo, gene2pubmed and RBBH (Reciprocal BLAST Best Hit).
| Method | How Entrez Gene IDs are mapped to MeSH IDs |
|---|---|
| Gendoo | Text-mining |
| gene2pubmed | Manual curation by NCBI teams |
| RBBH | Sequence homology with BLASTP search (E-value < 10^-50) |
3.1 Supported organisms
The meshes package (Yu 2018) relies on MeSHDb to prepare semantic data for measuring similarity. MeSHDb databases can be downloaded from AnnotationHub (see also AHMeSHDbs); about 200 species are available and supported by the meshes package.
First, we need to load or fetch the species-specific MeSH annotation database (the db object created from AnnotationHub in the code at the top of this chapter).
The semantic data can be prepared by the meshdata() function:
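As a minimal sketch of this step, the call below assumes that db is the MeSHDb object built with MeSHDbi::MeSHDb() earlier in this chapter. The choice of category "A" (Anatomy) and of the "gendoo" mapping is only illustrative; other categories and mapping methods from the tables above could be used instead. The file name hsamd.rda is likewise a placeholder.

```r
library(meshes)

# Assumption: `db` is the MeSHDb object created from AnnotationHub above.
# category = "A" and database = "gendoo" are illustrative choices.
# computeIC = TRUE asks for information content to be computed, which the
# IC-based measures need.
hsamd <- meshdata(db, category = "A", computeIC = TRUE, database = "gendoo")

# Preparing the data can take a while, so it may be worth caching the result.
# The file name is a placeholder.
save(hsamd, file = "hsamd.rda")
```

The resulting object is what later calls receive through their semData argument.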
3.2 MeSH semantic similarity measurement
The meshes package (Yu 2018) implements four IC-based methods (i.e. Resnik (Resnik 1999), Jiang (Jiang and Conrath 1997), Lin (Lin 1998) and Schlicker (Schlicker et al. 2006)) and one graph-structure-based method (i.e. Wang (Wang et al. 2007)) to measure MeSH term semantic similarity. For algorithm details, please refer to Chapter 1.
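To make the IC-based measures concrete, here is a toy sketch in base R using the textbook definitions, where the information content of a term is the negative log of its annotation probability and MICA denotes the most informative common ancestor of the two terms. The probabilities are invented for illustration, the function names are hypothetical, and the package may scale or normalize its scores differently, so these numbers are not expected to match meshSim() output.

```r
# Information content of a term with annotation probability p.
ic <- function(p) -log(p)

# p_a, p_b: probabilities of the two terms being compared.
# p_mica: probability of their most informative common ancestor (MICA).
sim_resnik <- function(p_a, p_b, p_mica) ic(p_mica)
sim_lin    <- function(p_a, p_b, p_mica) 2 * ic(p_mica) / (ic(p_a) + ic(p_b))
# Schlicker's relevance measure weights Lin by how specific the MICA is.
sim_rel    <- function(p_a, p_b, p_mica) sim_lin(p_a, p_b, p_mica) * (1 - p_mica)
# Jiang and Conrath define a distance; it is turned into a similarity here
# by capping the distance at 1 and subtracting it from 1.
sim_jiang  <- function(p_a, p_b, p_mica) {
  1 - min(1, ic(p_a) + ic(p_b) - 2 * ic(p_mica))
}

# Made-up probabilities, not taken from any MeSH database.
p_a <- 0.01
p_b <- 0.02
p_mica <- 0.05

scores <- c(
  Resnik = sim_resnik(p_a, p_b, p_mica),
  Lin    = sim_lin(p_a, p_b, p_mica),
  Rel    = sim_rel(p_a, p_b, p_mica),
  Jiang  = sim_jiang(p_a, p_b, p_mica)
)
print(round(scores, 3))
```

Resnik depends only on the common ancestor, while Lin, Rel and Jiang also account for how specific the two terms themselves are, which is why the measures can rank the same pair of terms differently.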
The meshSim() function is designed to measure semantic similarity between two MeSH term vectors.
[1] 0.3847944
meshSim("D000009", "D009130", semData=hsamd, measure="Rel")
[1] 0.633538
meshSim("D000009", "D009130", semData=hsamd, measure="Jiang")
[1] 0.5587351
meshSim("D000009", "D009130", semData=hsamd, measure="Wang")
[1] 0.5557103
        D017629   D002890   D008928
D001369 0.2886598 0.1923711 0.2193326
D002462 0.6521739 0.2381925 0.2809552
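When both arguments are vectors, the result is a matrix with one row per term in the first vector and one column per term in the second, as in the output above. A call of that shape might look like the sketch below. It assumes hsamd is the semantic data object prepared by meshdata(); the measure "Lin" is an assumption chosen for illustration, so the values returned need not match the matrix shown.

```r
library(meshes)

# Assumption: `hsamd` was prepared beforehand with meshdata().
# The measure is an illustrative choice, not necessarily the one used
# to produce the matrix printed in the text.
terms_a <- c("D001369", "D002462")
terms_b <- c("D017629", "D002890", "D008928")

sim_matrix <- meshSim(terms_a, terms_b, semData = hsamd, measure = "Lin")
sim_matrix
```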
3.3 Gene semantic similarity measurement
The geneSim() function is designed to measure semantic similarity between two gene vectors.
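A minimal sketch of such a call follows, assuming hsamd is the semantic data object prepared by meshdata(). The Entrez Gene IDs "241" and "251" are illustrative inputs, and the choices of the Wang measure and the "BMA" (best-match average) strategy for combining term-level scores into a gene-level score are assumptions rather than recommendations.

```r
library(meshes)

# Assumption: `hsamd` was prepared beforehand with meshdata().
# The gene IDs, measure, and combine strategy are illustrative choices.
geneSim("241", "251", semData = hsamd, measure = "Wang", combine = "BMA")
```

Because each gene is usually annotated with several MeSH terms, the combine argument controls how the pairwise term similarities are aggregated into a single score per gene pair.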