3 Spatial statistical analysis
SVP implements several spatial statistical algorithms to explore the spatial distribution of cell function or other features from three aspects: whether cell function states have spatial variability, where spatially variable cell functions cluster, and the co-distribution patterns of spatially variable cell functions with other features. The following sections provide a detailed description of each aspect.
3.1 Univariate spatial statistical analysis
This section addresses whether the feature is spatially variable and identifies its high-cluster areas. In detail, to identify the spatial variable features, SVP
implements three spatial autocorrelation algorithms using Rcpp
and RcppParallel
: global Moran’s I, Geary’s C and Getis-Ord’s G. To identify the spatial high-cluster areas, SVP
implements two local univariate spatial statistical algorithms using Rcpp
and RcppParallel
: local Getis-Ord’s G and local Moran’s I.
3.1.1 Identification of spatial variable features
Here, we identify the spatial variable cell-types using runDetectSVG
, which implements the global Moran’s I, Geary’s C and Getis-Ord’s G algorithms.
# First approach:
# we can first obtain the target gsvaExp with gsvaExp names via
# gsvaExp() function, for example, CellTypeFree
pdac_a_spe |> gsvaExp("CellTypeFree")
#> class: SpatialExperiment
#> dim: 20 428
#> metadata(0):
#> assays(1): affi.score
#> rownames(20): Acinar_cells Ductal_terminal_ductal_like ... RBCs
#> Fibroblasts
#> rowData names(4): exp.gene.num gset.gene.num gene.occurrence.rate
#> geneSets
#> colnames(428): Spot1 Spot2 ... Spot427 Spot428
#> colData names(3): cluster_domain sizeFactor sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : x y
#> imgData names(0):
# Then we use runDetectSVG with method = 'moran' to identify the spatial variable
# cell-types with global Moran's I.
celltype_free.I <- pdac_a_spe |> gsvaExp("CellTypeFree") |>
runDetectSVG(
assay.type = 'affi.score', # because default is 'logcounts', it should be adjused
method = 'moransi',
action = 'only'
)
#> Identifying the spatially variable gene sets (pathway) based on moransi
#> elapsed time is 0.077000 seconds
celltype_free.I |> dplyr::arrange(rank) |> head()
#> obs expect.moransi sd.moransi
#> Cancer_clone_A 0.7402441 -0.00234192 0.02807961
#> Cancer_clone_B 0.7187722 -0.00234192 0.02806609
#> Acinar_cells 0.6959057 -0.00234192 0.02788079
#> mDCs_A 0.5823319 -0.00234192 0.02760418
#> Ductal_CRISP3_high-centroacinar_like 0.5605952 -0.00234192 0.02794187
#> Endothelial_cells 0.5362277 -0.00234192 0.02757573
#> Z.moransi pvalue padj rank
#> Cancer_clone_A 26.44573 2.042568e-154 4.085135e-153 1
#> Cancer_clone_B 25.69343 6.921386e-146 6.921386e-145 2
#> Acinar_cells 25.04404 1.013736e-138 6.758238e-138 3
#> mDCs_A 21.18062 7.205485e-100 3.602743e-99 4
#> Ductal_CRISP3_high-centroacinar_like 20.14672 1.437699e-90 5.750794e-90 5
#> Endothelial_cells 19.53057 3.018138e-85 1.006046e-84 6
# Second approach:
# the result was added to the original object by specific the
# gsvaexp argument in the runDetectSVG
pdac_a_spe <- pdac_a_spe |>
runDetectSVG(
gsvaexp = 'CellTypeFree',
method = 'moran'
)
#> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'.
#> Identifying the spatially variable gene sets (pathway) based on moransi
#> elapsed time is 0.064000 seconds
#> The result is added to the input object, which can be extracted using
#> `svDf()` with type='sv.moransi' for `SingleCellExperiment` or `SpatialExperiment`.
#> If input object is `SVPExperiment`, and `gsvaexp` is specified
#> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment`
#> or `SpatialExperiment`),then also using `svDf()` to extract.
gsvaExp(pdac_a_spe, 'CellTypeFree') |> svDf("sv.moransi") |> dplyr::arrange(rank) |> head()
#> obs expect.moransi sd.moransi
#> Cancer_clone_A 0.7402441 -0.00234192 0.02807961
#> Cancer_clone_B 0.7187722 -0.00234192 0.02806609
#> Acinar_cells 0.6959057 -0.00234192 0.02788079
#> mDCs_A 0.5823319 -0.00234192 0.02760418
#> Ductal_CRISP3_high-centroacinar_like 0.5605952 -0.00234192 0.02794187
#> Endothelial_cells 0.5362277 -0.00234192 0.02757573
#> Z.moransi pvalue padj rank
#> Cancer_clone_A 26.44573 2.042568e-154 4.085135e-153 1
#> Cancer_clone_B 25.69343 6.921386e-146 6.921386e-145 2
#> Acinar_cells 25.04404 1.013736e-138 6.758238e-138 3
#> mDCs_A 21.18062 7.205485e-100 3.602743e-99 4
#> Ductal_CRISP3_high-centroacinar_like 20.14672 1.437699e-90 5.750794e-90 5
#> Endothelial_cells 19.53057 3.018138e-85 1.006046e-84 6
The obs
is the Moran’s I, expect.moransi
is the expect Moran’s I (E(I)
). Moran’s I quantifies the correlation between a single feature at a specific spatial point or region and its neighboring points or regions. Global Moran’s I usually ranges from -1 to 1, Global Moran’s I significantly above E(I)
indicate a distinct spatial pattern or spatial clustering, which occurs when neighboring captured locations tend to have similar feature value. Global Moran’s I around E(I)
suggest random spatial expression, that is, absence of spatial pattern. Global Moran’s I significantly below E(I)
imply a chessboard-like pattern or dispersion.
# using Geary's C
pdac_a_spe <- pdac_a_spe |>
runDetectSVG(
gsvaexp = 'CellTypeFree',
method = 'geary'
)
#> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'.
#> Identifying the spatially variable gene sets (pathway) based on gearysc
#> elapsed time is 0.031000 seconds
#> The result is added to the input object, which can be extracted using
#> `svDf()` with type='sv.gearysc' for `SingleCellExperiment` or `SpatialExperiment`.
#> If input object is `SVPExperiment`, and `gsvaexp` is specified
#> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment`
#> or `SpatialExperiment`),then also using `svDf()` to extract.
gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf('sv.gearysc') |> dplyr::arrange(rank) |> head()
#> obs expect.gearysc sd.gearysc
#> Cancer_clone_A 0.2709051 1 0.02873561
#> Cancer_clone_B 0.2880919 1 0.02879022
#> Acinar_cells 0.3055211 1 0.02952587
#> Ductal_CRISP3_high-centroacinar_like 0.4213014 1 0.02928596
#> mDCs_A 0.4603825 1 0.03058249
#> Endothelial_cells 0.4843175 1 0.03068854
#> Z.gearysc pvalue padj rank
#> Cancer_clone_A 25.37252 2.535621e-142 5.071243e-141 1
#> Cancer_clone_B 24.72743 2.711724e-135 2.711724e-134 2
#> Acinar_cells 23.52103 1.242716e-122 8.284772e-122 3
#> Ductal_CRISP3_high-centroacinar_like 19.76027 3.272367e-87 1.636183e-86 4
#> mDCs_A 17.64466 5.592678e-70 2.237071e-69 5
#> Endothelial_cells 16.80375 1.145504e-63 3.818347e-63 6
The obs
is the Geary’s C. Global Geary’s C is another popular index of global spatial autocorrelation, but focuses more on differences between neighboring values. Global Geary’s C usually range from 0 to 2. Global Geary’s C significantly below E[C]=1
indicate a distinct spatial pattern or spatial clustering, the value significantly above E[C]=1
suggest a chessboard-like pattern or dispersion, the value around E[C]=1
imply absence of spatial pattern.
# using Getis-ord's G
pdac_a_spe <- pdac_a_spe |>
runDetectSVG(
gsvaexp = 'CellTypeFree',
method = 'getis'
)
#> The `gsvaexp` was specified, the specified `gsvaExp` will be used to detect 'svg'.
#> Identifying the spatially variable gene sets (pathway) based on getisord
#> elapsed time is 0.030000 seconds
#> The result is added to the input object, which can be extracted using
#> `svDf()` with type='sv.getisord' for `SingleCellExperiment` or `SpatialExperiment`.
#> If input object is `SVPExperiment`, and `gsvaexp` is specified
#> the result can be extracted by `gsvaExp()` (return a `SingleCellExperiment`
#> or `SpatialExperiment`),then also using `svDf()` to extract.
gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf('sv.getisord') |> dplyr::arrange(rank) |> head()
#> obs expect.G sd.G
#> Cancer_clone_A 0.007214987 0.00234192 0.0001844816
#> Cancer_clone_B 0.007406175 0.00234192 0.0001975938
#> Acinar_cells 0.007109879 0.00234192 0.0001920919
#> mDCs_A 0.005853463 0.00234192 0.0001651428
#> Ductal_CRISP3_high-centroacinar_like 0.005800695 0.00234192 0.0001749367
#> Endothelial_cells 0.005749155 0.00234192 0.0001750884
#> Z.G pvalue padj rank
#> Cancer_clone_A 26.41492 4.617621e-154 9.235241e-153 1
#> Cancer_clone_B 25.62962 3.567592e-145 3.567592e-144 2
#> Acinar_cells 24.82124 2.644298e-136 1.762865e-135 3
#> mDCs_A 21.26368 1.231569e-100 6.157844e-100 4
#> Ductal_CRISP3_high-centroacinar_like 19.77158 2.615753e-87 1.046301e-86 5
#> Endothelial_cells 19.46009 1.196819e-84 3.989397e-84 6
The obs
is the Getis-Ord’s G of each cell-type. Global Getis-Ord’s G measures global spatial autocorrelation by evaluating the clustering of high/low value, specifically identifying hot spots
or cold spots
. Global Getis-Ord’s G values have no fixed range, as they depend on the specific dataset and spatial weight matrix. Global Getis-Ord’s G significantly above E(G)
indicate a distinct spatial pattern or spatial clustering, which occurs when neighboring captured locations tend to have similar feature value. Global Getis-Ord’s G around E(G)
suggest random spatial expression, that is, absence of spatial pattern. Global Getis-Ord’s G significantly below E(G)
imply a chessboard-like pattern or dispersion.
So the Cancer clone A
, Cancer clone B
, and Acinar cells
were more concentrated in the spatial distribution compare to others.
3.1.2 Identification of local spatial aggregation areas
Next, we can identify the spatial high-cluster areas for the spatial variable features via runLISA
, which provides two local spatial statistics algorithms: Local Getis-Ord’s G and Local Moran’s I
celltypefree_lisares <- pdac_a_spe |>
gsvaExp("CellTypeFree") %>%
runLISA(features=rownames(.), assay.type=1)
celltypefree_lisares$Cancer_clone_A |> head()
#> Gi E.Gi Var.Gi Z.Gi Pr (z != E(Gi))
#> Spot1 0.0001746202 0.00234192 1.649943e-06 -1.687270 0.09155141
#> Spot2 0.0003158584 0.00234192 2.996272e-06 -1.170475 0.24180992
#> Spot3 0.0002774819 0.00234192 2.489785e-06 -1.308341 0.19075761
#> Spot4 0.0001673520 0.00234192 2.489501e-06 -1.378215 0.16813699
#> Spot5 0.0004016950 0.00234192 2.994463e-06 -1.121225 0.26219219
#> Spot6 0.0004403359 0.00234192 2.130895e-06 -1.302670 0.19268729
#> cluster.no.test cluster.test
#> Spot1 Low NoSign
#> Spot2 Low NoSign
#> Spot3 Low NoSign
#> Spot4 Low NoSign
#> Spot5 Low NoSign
#> Spot6 Low NoSign
Local Getis-Ord’s G (Gi) measures the aggregation of a feature value in a specific spot and its neighbors, determining if locations are significantly clustered with high (hot spots) or low values (cold spots). Gi usually have no fixed range, a larger Gi indicates a stronger clustering of high expression for the feature in captured location and its neighboring spots, while a smaller Gi indicates a stronger clustering of low expression.
gsvaExp(pdac_a_spe, 'CellTypeFree') |>
plot_lisa_feature(
lisa.res = celltypefree_lisares[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')],
assay.type = 1,
gap_line_width = .18
)
The highlighted area signifies that the feature is significantly clustered in this region.
We also can use local Moran’s I method to identify the region.
celltypefree_lisa.i <- pdac_a_spe |>
gsvaExp("CellTypeFree") %>%
runLISA(
features = rownames(.),
assay.type = 1,
method = 'localmoran'
)
celltypefree_lisa.i$Cancer_clone_A |> head()
#> Ii E.Ii Var.Ii Z.Ii Pr (z != E(Ii))
#> Spot1 0.2411323 -0.0004394871 0.02049855 1.687270 0.09155141
#> Spot2 0.2610045 -0.0005894003 0.04994947 1.170475 0.24180992
#> Spot3 0.2907443 -0.0007044064 0.04962291 1.308341 0.19075761
#> Spot4 0.3124376 -0.0007329623 0.05163310 1.378215 0.16813699
#> Spot5 0.2792327 -0.0007358277 0.06234951 1.121225 0.26219219
#> Spot6 0.2256567 -0.0005002450 0.03014053 1.302670 0.19268729
#> cluster.no.test cluster.test
#> Spot1 Low-Low NoSign
#> Spot2 Low-Low NoSign
#> Spot3 Low-Low NoSign
#> Spot4 Low-Low NoSign
#> Spot5 Low-Low NoSign
#> Spot6 Low-Low NoSign
Local Moran’s I (Ii) measures the spatial correlation between a feature’s values in a specific area and its neighbors, assessing the similarity or difference between a location’s value and those of its surroundings. Ii also have no fixed range, a larger Ii signifies a stronger positive correlation in a region and its neighbors, with clusters of high or low values. A smaller value indicates a negative correlation, where high values are next to low values, or vice versa.
gsvaExp(pdac_a_spe, "CellTypeFree") |>
plot_lisa_feature(
lisa.res = celltypefree_lisa.i[c('Cancer_clone_A', 'Cancer_clone_B', 'Acinar_cells')],
clustertype = 'High-High',
assay.type = 1,
gap_line_width = .18
)
3.2 Bivariate spatial statistical analysis
Next, to explore the bivariate spatial co-distribution, SVP
implements global and local bivariate spatial statistics algorithms. Global bivariate spatial statistics algorithm can be used to identify whether the bivariate is co-cluster in the specific space. Local bivariate spatial algorithm is to find the areas of spatial co-clustering of bivariate.
3.2.1 Global bivariate spatial analysis
Here, we use runGLOBALBV
of SVP
to explore the spatial co-distribution between the cell-types.
gbv.res <- pdac_a_spe |>
gsvaExp('CellTypeFree') |>
runGLOBALBV(
features1 = names(markerList),
assay.type = 1,
permutation = NULL, # if permutation is NULL, mantel test will be used to calculated the pvalue
add.pvalue = TRUE,
alternative = 'greater' # one-side test
)
# gbv.res is a list contained Lee and pvalue matrix.
gbv.res |> as_tbl_df(diag=FALSE, flag.clust = TRUE) |>
dplyr::filter(grepl("Cancer", x)) |> dplyr::arrange(desc(Lee))
#> # A tibble: 35 × 4
#> x y Lee pvalue
#> <fct> <fct> <dbl> <dbl>
#> 1 Cancer_clone_A Cancer_clone_B 0.728 0
#> 2 Cancer_clone_B Fibroblasts 0.185 2.04e-27
#> 3 Cancer_clone_A Fibroblasts 0.180 5.82e-27
#> 4 Cancer_clone_B Endothelial_cells 0.000286 4.99e- 1
#> 5 Cancer_clone_A Endothelial_cells -0.0169 8.07e- 1
#> 6 Cancer_clone_B T_cells_&_NK_cells -0.0442 9.97e- 1
#> 7 Cancer_clone_A RBCs -0.0443 9.80e- 1
#> 8 Cancer_clone_B RBCs -0.0513 9.91e- 1
#> 9 Cancer_clone_A T_cells_&_NK_cells -0.0696 1.00e+ 0
#> 10 Cancer_clone_B Macrophages_A -0.0773 1.00e+ 0
#> # ℹ 25 more rows
The Lee
value typically ranges from -1 to 1. When the index is closer to 1, it indicates a strong positive spatial association between the two variables, meaning the high-value areas of one variable tend to overlap with the high-value areas of the other, and low-value areas similarly overlap, showing strong spatial consistency. Conversely, if the index is near -1, it suggests a strong negative spatial association, where the high-value areas of one variable tend to overlap with the low-value areas of the other.
We developed a function plot_heatmap_globalbv
to visualize the result of global bivariate spatial analysis
# We can also display the result of global univariate spatial analysis
moran.res <- gsvaExp(pdac_a_spe, 'CellTypeFree') |> svDf("sv.moransi")
# The result of local univariate spatial analysis also can be added
lisa.f1.res <- gsvaExp(pdac_a_spe, "CellTypeFree", withColData = TRUE) |>
cal_lisa_f1(lisa.res = celltypefree_lisares, group.by = 'cluster_domain')
plot_heatmap_globalbv(gbv.res, moran.t=moran.res, lisa.t=lisa.f1.res)
#> Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
#> resolveHJust(x$just, : semi-transparency is not supported on this device:
#> reported only once per page
We can explore the spatial co-cluster between the different type of gsvaExp
. For example, we explore the spatial co-distribution between the CellTypeFree
and CancerSEA
of pdac_a_spe
object.
# If the number of features is excessive, we recommend selecting
# those with spatial variability.
# celltypeid <- gsvaExp(pdac_a_spe, "CellTypeFree") |> svDf("sv.moransi") |>
# dplyr::filter(padj<=0.01) |> rownames()
celltypeid <- rownames(gsvaExp(pdac_a_spe, "CellTypeFree"))
cancerseaid <- gsvaExp(pdac_a_spe, "CancerSEA") |>
runDetectSVG(assay.type = 1, verbose=FALSE) |>
svDf() |>
dplyr::filter(padj <= 0.01) |>
rownames()
#> elapsed time is 0.032000 seconds
runGLOBALBV(
pdac_a_spe,
gsvaexp = c("CellTypeFree", "CancerSEA"),
gsvaexp.features = c(celltypeid, cancerseaid),
permutation = NULL,
add.pvalue = TRUE
) -> res.gbv.celltype.cancersea
#> The `gsvaexp` is specified, the specified `gsvaExp` will be
#> used to perform the analysis.
plot_heatmap_globalbv(res.gbv.celltype.cancersea)
3.2.2 Local bivariate spatial analysis
After global bivariate spatial analysis, we found the Cancer
cell and Fibroblasts
show significant spatial co-aggregation. Next, we use runLOCALBV
of SVP
to identify the co-aggregation areas.
lbv.res <- pdac_a_spe |>
gsvaExp('CellTypeFree') |>
runLOCALBV(features1='Fibroblasts', features2=c('Cancer_clone_A', "Cancer_clone_B"), assay.type=1)
# The result can be added to gsvaExp, then visualized by ggsc.
LISAsce(pdac_a_spe, lisa.res=lbv.res, gsvaexp.name='LBV.res') |>
gsvaExp("LBV.res") %>%
plot_lisa_feature(lisa.res = lbv.res, assay.type = 1)
The highlighted area denotes the region where the two variables significantly co-aggregate in space.