13 Metacells with SuperCell
As single-cell datasets continue to grow, it is often useful to replace highly redundant individual cells with a smaller number of representative metacells. This reduces computational burden, improves robustness in some downstream analyses, and can make large datasets easier to interpret.
In sclet, this workflow is exposed through RunSuperCell(), which wraps the SuperCell package while keeping the result inside a SingleCellExperiment-compatible workflow. In this chapter, we use the pbmc4k dataset introduced earlier in the batch-correction chapter.
Because SuperCell is installed in the bookdown GitHub Actions workflow, the examples in this chapter are executed during the documentation build and therefore also serve as regression checks.
13.1 RunSuperCell
The RunSuperCell() function provides the main entry point for metacell construction.
## class: SingleCellExperiment
## dim: 33694 746
## metadata(1): sclet
## assays(2): counts logcounts
## rownames(33694): RP11-34P13.3 FAM138A ... AC213203.1
## FAM231B
## rowData names(9): ENSEMBL_ID Symbol_TENx ... p.value
## FDR
## colnames: NULL
## colData names(1): size
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The output of RunSuperCell() is a SingleCellExperiment object whose columns now represent metacells rather than individual cells. From this point on, you can continue with a fairly standard downstream workflow.
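To make the idea of "columns as metacells" concrete, here is a minimal base-R sketch on made-up data. It is not how RunSuperCell() or SuperCell computes anything internally: the `membership` vector is simply given by hand, and summing counts is only one possible way to aggregate cells. The names `counts`, `membership`, `metacell_counts`, and `metacell_size` are hypothetical, and the link between `metacell_size` and the `size` column shown above is an assumption.

```r
# Toy counts: 3 genes x 6 cells (values are made up for illustration).
counts <- matrix(
  c(1, 0, 2,
    2, 1, 0,
    0, 3, 1,
    1, 2, 0,
    4, 0, 0,
    3, 1, 1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical assignment of each cell to one of three metacells.
membership <- c(1, 1, 2, 2, 3, 3)

# rowsum() sums rows by group, so transpose to put cells in rows,
# aggregate, and transpose back to genes x metacells.
metacell_counts <- t(rowsum(t(counts), group = membership))

# Number of cells merged into each metacell
# (assumed to be analogous to the `size` column in colData).
metacell_size <- as.vector(table(membership))

metacell_counts
metacell_size
```

The aggregated matrix has one column per metacell, and the total count is preserved, which is a quick sanity check for any aggregation of this kind.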
post_process <- function(sce) {
  sce <- NormalizeData(sce)
  sce <- FindVariableFeatures(sce)
  sce <- ScaleData(sce)
  sce <- RunPCA(sce, subset_row = VariableFeatures(sce), layer = "scaled")
  sce <- FindNeighbors(sce, dims = 1:10)
  sce <- FindClusters(sce)
  sce <- RunUMAP(sce)
  return(sce)
}
sce_sc <- post_process(sce_sc)
library(ggsc)
sc_dim(sce_sc, reduction = "UMAP") + sc_dim_geom_label()
13.2 Estimate SuperCell Purity
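Purity describes how well each metacell is confined to a single cluster of the original cells. Here we cluster the original cells with the same post_process() function, retrieve the underlying SuperCell object, and compute a per-metacell entropy of the cluster labels. An entropy of 0 means that all cells in a metacell share one label.
pbmc4k2 <- post_process(pbmc4k)
SC <- get_supercell(sce_sc, element = "object")
purity <- SuperCell::supercell_purity(colLabels(pbmc4k2), SC$membership, method = "entropy")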
head(purity)
## 1 2 3 4 5 6
## 0 0 0 0 0 0
summary(purity)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.03554 0.00000 1.03972
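The entropy values can be illustrated with a small base-R sketch on made-up labels. It assumes that the score is the Shannon entropy of the label distribution within each metacell, in natural-log units; this is an assumption about the definition, not a copy of SuperCell's implementation. The helper `shannon_entropy()` and the toy vectors are hypothetical. Under this assumption, a metacell split 50/25/25 across three labels would score 1.5 × ln 2 ≈ 1.0397, which would be consistent with the maximum reported above.

```r
# Toy example: six cells with cluster labels, grouped into three metacells.
labels     <- c("T", "T", "B", "B", "T", "NK")
membership <- c(1, 1, 2, 2, 3, 3)

# Shannon entropy (natural log) of a vector of counts.
# Zero counts are dropped so that 0 * log(0) is treated as 0.
shannon_entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Contingency table: rows are labels, columns are metacells.
tab <- table(labels, membership)

# One entropy value per metacell (column).
toy_purity <- apply(tab, 2, shannon_entropy)
toy_purity
```

Metacells 1 and 2 contain a single label each and score 0, while metacell 3 mixes two labels equally and scores ln 2. Lower values therefore indicate purer metacells, and the summary above suggests that most metacells in this dataset contain cells from a single cluster.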