Interactive Volcano Plot

Author

Guangchuang Yu
School of Basic Medical Sciences, Southern Medical University

Published

August 23, 2025

Example data and differential expression analysis

library(airway)
library(DESeq2)
library(org.Hs.eg.db)
library(AnnotationDbi)

data("airway")

# Extract the counts matrix and drop genes whose total count is <= 1
counts_mat <- assay(airway)
airway <- airway[rowSums(counts_mat) > 1, ]

# Build the DESeq2 dataset and run the differential expression analysis
dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
res <- results(dds, contrast = c("dex", "trt", "untrt"))

# Convert to a data.frame and add gene symbols
df <- as.data.frame(res)
df$gene_id <- rownames(df)
df$symbol <- mapIds(org.Hs.eg.db, 
                    keys = df$gene_id,
                    column = "SYMBOL", 
                    keytype = "ENSEMBL", 
                    multiVals = "first")
'select()' returned 1:many mapping between keys and columns
df <- df[!is.na(df$symbol), ]

head(df)
                    baseMean log2FoldChange      lfcSE       stat       pvalue
ENSG00000000003  708.6021697    -0.38125398 0.10065441 -3.7877523 1.520163e-04
ENSG00000000419  520.2979006     0.20681260 0.11221865  1.8429433 6.533729e-02
ENSG00000000457  237.1630368     0.03792043 0.14344465  0.2643558 7.915057e-01
ENSG00000000460   57.9326331    -0.08816818 0.28714182 -0.3070545 7.588019e-01
ENSG00000000938    0.3180984    -1.37822703 3.49987280 -0.3937935 6.937335e-01
ENSG00000000971 5817.3528677     0.42640213 0.08831339  4.8282840 1.377146e-06
                        padj         gene_id symbol
ENSG00000000003 1.281209e-03 ENSG00000000003 TSPAN6
ENSG00000000419 1.962081e-01 ENSG00000000419   DPM1
ENSG00000000457 9.111955e-01 ENSG00000000457  SCYL3
ENSG00000000460 8.946325e-01 ENSG00000000460  FIRRM
ENSG00000000938           NA ENSG00000000938    FGR
ENSG00000000971 1.818075e-05 ENSG00000000971    CFH
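Note that FGR has padj = NA in the output above. DESeq2 sets the adjusted p-value to NA for genes removed by its independent filtering step (typically those with a very low mean count), so such genes have no y-coordinate on a volcano plot. The following is a minimal sketch of how the two plot axes and an up/down label can be derived from these columns. It reuses the six rows printed above; the cutoffs are arbitrary values chosen for illustration and are not ivolcano defaults.

```r
# The six rows shown by head(df), reduced to the columns a volcano plot needs.
df_head <- data.frame(
  symbol = c("TSPAN6", "DPM1", "SCYL3", "FIRRM", "FGR", "CFH"),
  log2FoldChange = c(-0.38125398, 0.20681260, 0.03792043,
                     -0.08816818, -1.37822703, 0.42640213),
  padj = c(1.281209e-03, 1.962081e-01, 9.111955e-01,
           8.946325e-01, NA, 1.818075e-05)
)

# Genes without an adjusted p-value cannot be placed on the y-axis.
plot_df <- df_head[!is.na(df_head$padj), ]
plot_df$neg_log10_padj <- -log10(plot_df$padj)

# Illustrative cutoffs only (assumptions, not package defaults).
padj_cutoff <- 0.05
lfc_cutoff <- 0.25

sig <- plot_df$padj < padj_cutoff & abs(plot_df$log2FoldChange) > lfc_cutoff
plot_df$direction <- ifelse(!sig, "NS",
                            ifelse(plot_df$log2FoldChange > 0, "Up", "Down"))

print(plot_df[, c("symbol", "log2FoldChange", "neg_log10_padj", "direction")])
```

With these cutoffs TSPAN6 is labelled "Down", CFH is labelled "Up", and the remaining three genes are "NS".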

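Interactive volcano plot

By default the plot is interactive; passing interactive = FALSE produces an ordinary static ggplot instead.

In the interactive plot, hovering over a point shows the gene name, logFC, adjusted P value, and related information.

onclick_fun takes a function that defines what happens when a point (gene) is clicked. Here we pass onclick_genecards, which opens the gene's page on the GeneCards website.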
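To make the role of onclick_fun concrete, here is a minimal sketch of a custom click handler. It assumes that onclick_fun is called with a character vector of gene names and must return one JavaScript snippet per gene, which is how click actions are usually expressed for ggiraph-style interactive plots; that calling convention is an assumption and should be checked against the ivolcano documentation. The helper name onclick_ncbi is hypothetical and not part of ivolcano.

```r
# Hypothetical helper (not part of ivolcano): open an NCBI Gene search
# for the clicked gene in a new browser window.
onclick_ncbi <- function(gene) {
  # Percent-encode each gene name so it is safe inside a URL.
  term <- vapply(gene, utils::URLencode, character(1),
                 reserved = TRUE, USE.NAMES = FALSE)
  url <- paste0("https://www.ncbi.nlm.nih.gov/gene/?term=", term)
  # Return one JavaScript statement per gene.
  sprintf('window.open("%s")', url)
}

onclick_ncbi(c("TSPAN6", "CFH"))
```

If the assumption holds, this function could be passed as onclick_fun = onclick_ncbi in place of onclick_genecards.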

library(ivolcano)

ivolcano(df,
        logFC_col = "log2FoldChange",
        pval_col = "padj",
        gene_col = "symbol",
        top_n = 5,
        onclick_fun=onclick_genecards)
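For comparison, the static variant mentioned earlier can be requested with interactive = FALSE. The sketch below uses a made-up toy data frame so that it runs without the DESeq2 results, and it only calls ivolcano when the package is installed. The argument names are the ones used in the call above; the toy values are random and carry no biological meaning.

```r
# Toy input with the same column names as `df`; the values are random
# and are NOT results from the airway analysis.
set.seed(1)
toy <- data.frame(
  symbol = paste0("GENE", 1:200),
  log2FoldChange = rnorm(200, sd = 2),
  padj = runif(200)^3
)

if (requireNamespace("ivolcano", quietly = TRUE)) {
  # Same call as above, but asking for the static plot.
  p <- ivolcano::ivolcano(toy,
                          logFC_col = "log2FoldChange",
                          pval_col = "padj",
                          gene_col = "symbol",
                          top_n = 5,
                          interactive = FALSE)
  print(p)
}
```

Since the static result is described as an ordinary ggplot, it should be possible to restyle or save it with the usual ggplot2 tools.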

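We can also use fanyi to retrieve gene information from NCBI and display it on the plot. Here the gene 'summary' text is additionally translated into Chinese.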

df$entrez <- mapIds(org.Hs.eg.db, 
                    keys = df$gene_id,
                    column = "ENTREZID", 
                    keytype = "ENSEMBL", 
                    multiVals = "first")
'select()' returned 1:many mapping between keys and columns
top_eg <- df$entrez[order(df$padj)][1:50]

library(fanyi)
If you use fanyi in published research, please cite:
Guangchuang Yu. Using fanyi to assist research communities in retrieving and interpreting information. bioRxiv 2023, doi: 10.1101/2023.12.21.572729
gs <- gene_summary(top_eg)
gs$summary_cn <- tencent_translate(gs$summary)
head(gs)
          uid    name
SPARCL1  8404 SPARCL1
CACNB2    783  CACNB2
DUSP1    1843   DUSP1
SAMHD1  25939  SAMHD1
MAOA     4128    MAOA
GPX3     2878    GPX3
                                                                            description
SPARCL1                                                                    SPARC like 1
CACNB2                           calcium voltage-gated channel auxiliary subunit beta 2
DUSP1                                                    dual specificity phosphatase 1
SAMHD1  SAM and HD domain containing deoxynucleoside triphosphate triphosphohydrolase 1
MAOA                                                                monoamine oxidase A
GPX3                                                           glutathione peroxidase 3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              summary
SPARCL1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Predicted to enable calcium ion binding activity; collagen binding activity; and extracellular matrix binding activity. Predicted to be involved in regulation of synapse organization. Located in extracellular space. [provided by Alliance of Genome Resources, Jul 2025]
CACNB2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        This gene encodes a subunit of a voltage-dependent calcium channel protein that is a member of the voltage-gated calcium channel superfamily. The gene product was originally identified as an antigen target in Lambert-Eaton myasthenic syndrome, an autoimmune disorder. Mutations in this gene are associated with Brugada syndrome. Alternatively spliced variants encoding different isoforms have been described. [provided by RefSeq, Feb 2013]
DUSP1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The protein encoded by this gene is a phosphatase with dual specificity for tyrosine and threonine. The encoded protein can dephosphorylate MAP kinase MAPK1/ERK2, which results in its involvement in several cellular processes. This protein appears to play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. Finally, the encoded protein can make some solid tumors resistant to both chemotherapy and radiotherapy, making it a target for cancer therapy. [provided by RefSeq, Aug 2017]
SAMHD1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This gene may play a role in regulation of the innate immune response. The encoded protein is upregulated in response to viral infection and may be involved in mediation of tumor necrosis factor-alpha proinflammatory responses. Mutations in this gene have been associated with Aicardi-Goutieres syndrome. [provided by RefSeq, Mar 2010]
MAOA                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       This gene is one of two neighboring gene family members that encode mitochondrial enzymes which catalyze the oxidative deamination of amines, such as dopamine, norepinephrine, and serotonin. Mutation of this gene results in Brunner syndrome. This gene has also been associated with a variety of other psychiatric disorders, including antisocial behavior. Alternatively spliced transcript variants encoding multiple isoforms have been observed. [provided by RefSeq, Jul 2012]
GPX3    The protein encoded by this gene belongs to the glutathione peroxidase family, members of which catalyze the reduction of organic hydroperoxides and hydrogen peroxide (H2O2) by glutathione, and thereby protect cells against oxidative damage. Several isozymes of this gene family exist in vertebrates, which vary in cellular location and substrate specificity. This isozyme is secreted, and is abundantly found in plasma. Downregulation of expression of this gene by promoter hypermethylation has been observed in a wide spectrum of human malignancies, including thyroid cancer, hepatocellular carcinoma and chronic myeloid leukemia. This isozyme is also a selenoprotein, containing the rare amino acid selenocysteine (Sec) at its active site. Sec is encoded by the UGA codon, which normally signals translation termination. The 3' UTRs of selenoprotein mRNAs contain a conserved stem-loop structure, designated the Sec insertion sequence (SECIS) element, that is necessary for the recognition of UGA as a Sec codon, rather than as a stop signal. Alternatively spliced transcript variants have been found for this gene. [provided by RefSeq, Jul 2016]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          summary_cn
SPARCL1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      预测具有钙离子结合活性;胶原蛋白结合活性;和细胞外矩阵结合活性。预计参与突触组织的调节。位于细胞外间隙。[由基因组资源联盟提供,2025年7月]
CACNB2                                                                                                                                                                                                                                                                                                                                                                                                           该基因编码电压依赖性钙通道蛋白的一个亚基,该蛋白是电压门控钙通道超家族的成员。该基因产物最初被确定为Lambert-Eaton肌无力综合征(一种自身免疫性疾病)的抗原目标。该基因的突变与Brugada综合症有关。已经描述了编码不同同工型的替代性拼接变体。[由RefSeq提供,2013年2月]
DUSP1                                                                                                                                                                                                                                                                                                                                              该基因编码的蛋白质是一种对酪氨和钍具有双重特异性的磷酸酶。所编码的蛋白质可以使MAP同工酶MAP 1/ERG 2去磷酸化,从而导致其参与多种细胞过程。该蛋白质似乎在人类细胞对环境应激的反应以及细胞增生的负调节中发挥着重要作用。最后,编码的蛋白质可以使一些实体肿瘤对化疗和放疗都有耐药性,使其成为癌症治疗的靶点。[由RefSeq提供,2017年8月]
SAMHD1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         该基因可能在调节先天免疫反应中发挥作用。编码的蛋白质因病毒感染而上调,并可能参与介导肿瘤坏死因子-阿尔法促炎症反应。该基因的突变与艾卡迪-古铁雷斯综合征有关。[由RefSeq提供,2010年3月]
MAOA                                                                                                                                                                                                                                                                                                                                                                                                               该基因是编码线粒体酶的两个邻近基因家族成员之一,线粒体酶催化多巴胺、去甲肾上腺素和血清素等胺的氧化脱氨作用。该基因的突变会导致布伦纳综合征。该基因还与多种其他精神疾病有关,包括反社会行为。已经观察到编码多种同工型的替代性拼接转录变体。[RefSeq提供,2012年7月]
GPX3    该基因编码的蛋白质属于谷氨肽过氧化物家族,其成员催化谷氨肽对有机氢过氧化氢和过氧化氢(H2 O2)的还原,从而保护细胞免受氧化损伤。该基因家族的几种同工酶存在于脊椎动物中,它们的细胞位置和代谢物特异性各不相同。这种同工酶是分泌的,并且在血浆中大量发现。在多种人类恶性肿瘤中,包括甲状腺癌、肝细胞癌和慢性骨髓性白血病,已观察到启动子超甲基化导致该基因的表达下调。这种同工酶也是一种硒蛋白,在其活性位点含有稀有氨基酸硒代半胱氨酸(Sec)。Sec由UGA密码子编码,其通常发出翻译终止的信号。硒蛋白mRNA的3'UTR含有保守的茎环结构,称为Sec插入序列(SECIS)元件,其是将UGA识别为Sec密码子而不是终止信号所必需的。已经发现了该基因的替代性拼接转录变体。[由RefSeq提供,2016年7月]

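Next we define the onclick function. Several columns can be selected for display; here we use the gene description (full name), the original summary, and the translated Chinese summary. onclick_fanyi returns a function built to that specification, which we can then pass to ivolcano.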

onclick_fun <- onclick_fanyi(gs, c("description", "summary", "summary_cn"))

ivolcano(df,
        logFC_col="log2FoldChange",
        pval_col="padj",
        gene_col="symbol",
        top_n=5,
        onclick_fun = onclick_fun)
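The idea of a function that returns another function, as onclick_fanyi is described above, can be sketched in base R. The code below is a hypothetical stand-in written for illustration only; it is not the implementation of onclick_fanyi, and the name make_onclick, the alert-based display, and the assumption that the handler receives gene names and returns JavaScript strings are all my own. The text is percent-encoded so that quotes, line breaks, and non-ASCII characters cannot break the generated JavaScript.

```r
# Hypothetical function factory (NOT the real onclick_fanyi): given an
# annotation table with gene names as row names and the columns to show,
# return a click handler that maps gene names to JavaScript snippets.
make_onclick <- function(info, cols) {
  stopifnot(is.data.frame(info), all(cols %in% colnames(info)))
  function(gene) {
    vapply(gene, function(g) {
      if (!g %in% rownames(info)) {
        return("")  # no annotation for this gene: no JavaScript to run
      }
      # Collect "column: value" lines for the selected columns.
      vals <- vapply(cols, function(cn) as.character(info[g, cn]), character(1))
      txt <- paste(paste0(cols, ": ", vals), collapse = "\n")
      # Percent-encode the text and decode it again in the browser.
      sprintf('alert(decodeURIComponent("%s"))',
              utils::URLencode(txt, reserved = TRUE))
    }, character(1), USE.NAMES = FALSE)
  }
}

# Toy annotation table using two descriptions from the gs output above.
toy_info <- data.frame(
  description = c("SPARC like 1", "glutathione peroxidase 3"),
  row.names = c("SPARCL1", "GPX3")
)

click <- make_onclick(toy_info, "description")
click(c("SPARCL1", "GPX3", "UNKNOWN"))
```

The returned closure keeps its own reference to the annotation table and the column selection, which is why a single argument (the gene name) is enough at click time.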