Predicting cell states and their variability in single-cell or spatial omics data
Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University
xshuangbin@163.com and guangchuangyu@gmail.com
2024-12-13
Introduction
Understanding the spatial distribution and interplay of cell states in tissue is critical for elucidating tissue formation and function. Single-cell and spatial omics offer a promising way to address this need. Traditional workflows typically involve identifying highly variable genes, dimensionality reduction, clustering, and annotating cells or functions based on gene over-expression. However, these qualitative approaches are inadequate for accurately mapping the distributions of spatial features. Instead, directly integrating biomedical knowledge such as Gene Ontology, KEGG, Reactome, transcription factors, and cell-type marker genes allows cell states to be evaluated from gene expression data, yielding quantitative functional pathway profiles at the level of each captured location.
After quantifying cell functions, analyzing their spatial distribution and co-distribution with other features can provide deeper insights into related biological questions. We focus on three aspects: the spatial variability of cell functions, regions where these functions cluster, and their co-distribution patterns with other features. Although existing tools such as SPARK-X (Zhu, Sun, and Zhou 2021), nnSVG (Weber et al. 2023), SpatialDE (Svensson, Teichmann, and Stegle 2018), SpaGFT (Chang et al. 2024), Seurat (Hao et al. 2023), and Squidpy (Palla et al. 2022) facilitate the exploration of spatially variable genes, they are primarily designed for gene-level analysis and lack the capability to investigate the spatial co-distribution of features. Additionally, many of these tools, including SpatialDE (Svensson, Teichmann, and Stegle 2018), SPARK (Sun, Zhu, and Zhou 2020), MERINGUE (Miller et al. 2021), and nnSVG (Weber et al. 2023), face challenges in handling large-scale spatial transcriptome data due to high memory consumption and low computational efficiency.
To fill the gaps, we developed SVP to accurately predict cell states, explore their spatial distribution, and assess their spatial relationship with other features.