sclet: A Lightweight Toolkit for Single-Cell Data Analysis

19 Recommended usage

sclet is still centered on the SingleCellExperiment (SCE) class. The recommended way to use the package is to treat the SCE object as the single source of truth and to run sclet analysis functions on that object for the common analysis steps.

19.1 For end users

The shortest recommended workflow is to use the standard pipeline entry point:

library(sclet)

# Run the full standard pipeline on an existing SingleCellExperiment object `sce`
sce <- RunStandardPipeline(sce)

# Print a structured summary of the analysis steps
PipelineSummary(sce)

Under the hood, RunStandardPipeline sequentially calls:

sce <- NormalizeData(sce)
sce <- FindVariableFeatures(sce)
sce <- ScaleData(sce)
sce <- RunPCA(sce)
sce <- FindNeighbors(sce)
sce <- FindClusters(sce)
sce <- RunUMAP(sce)

If scrapper is installed, FindVariableFeatures(sce, method = "scrapper") is also supported as an optional HVG backend. sclet still writes the result back into the same HVG state and rowData, so downstream code continues to use VariableFeatures() in the usual way.

Because scrapper is an optional dependency, users need to install it explicitly if they want this backend instead of the default scran path.
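
The optional-backend logic above can be sketched as a small guard. This is a minimal sketch, not sclet's own implementation: the helper name find_hvgs is hypothetical, and sce is assumed to be an already normalized SingleCellExperiment. Only FindVariableFeatures(), its method = "scrapper" option, and VariableFeatures() come from the text.

```r
library(sclet)

# Hypothetical helper (not part of sclet): use the scrapper backend only
# when the optional package is installed, otherwise keep the default path.
find_hvgs <- function(sce) {
  if (requireNamespace("scrapper", quietly = TRUE)) {
    sce <- FindVariableFeatures(sce, method = "scrapper")
  } else {
    sce <- FindVariableFeatures(sce)
  }
  sce
}

# Usage (assumes `sce` exists and has been normalized):
# sce <- find_hvgs(sce)
# head(VariableFeatures(sce))
```

Either branch leaves the result in the same HVG state, so the code that reads VariableFeatures() does not need to know which backend ran.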

In the current version of sclet, user-facing analysis verbs are standardized on the Run* style. Use RunPCA(), RunMilo(), and RunCellChat() as the recommended public API.

The other important convention is that sclet now distinguishes between:

  • a physical assay, which is where the matrix is actually stored in the SCE object
  • a logical layer, which is the user-facing expression view that downstream analysis functions consume

For example, after normalization and scaling, a typical object may expose:

  • counts
  • logcounts
  • scaled

and DefaultLayer(object) tells you which one is currently recommended for downstream use.
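
To see the two views side by side, something like the following may help. It is a sketch under assumptions: describe_layers is a hypothetical helper, sce is assumed to have been normalized and scaled, and Layers() is assumed to take the SCE object as its first argument. assayNames() comes from SummarizedExperiment and lists the physically stored assays.

```r
library(sclet)
library(SummarizedExperiment)

# Hypothetical helper (not part of sclet): collect the physical and
# logical views of the same object in one list.
describe_layers <- function(sce) {
  list(
    physical_assays = assayNames(sce),   # where matrices are actually stored
    logical_layers  = Layers(sce),       # user-facing expression views
    default_layer   = DefaultLayer(sce)  # view recommended for downstream use
  )
}

# Usage (assumes `sce` has been normalized and scaled):
# str(describe_layers(sce))
```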

For interactive exploration, it is better to rely on the state-aware accessors instead of reaching directly into object internals:

  • DefaultAssay(), DefaultLayer(), DefaultReduction(), DefaultGraph(), and ActiveIdent() expose the current active view.
  • Layers() and LayerData() expose the available expression layers and their matrices.
  • CommandLog() shows which analysis functions have been applied.
  • PipelineSummary() gives a clear textual report of the executed pipeline steps.
  • Status() gives a quick user-facing snapshot of the current object state.
  • get_hvg(), get_graph(), get_milo(), get_trajectory(), get_cellchat(), get_integration(), get_annotation(), and get_mapping() expose structured analysis records.
  • get_analysis_context() gives a lightweight summary of the current active view plus the active integration/annotation/mapping and other analysis records.
  • has_*() helpers are the preferred way to check whether an analysis result is available.
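
A possible inspection routine built only from the accessors listed above is sketched here. The helper name inspect_state is hypothetical, and the sketch assumes that each accessor takes the SCE object as its first argument, as PipelineSummary(sce) does earlier in this chapter.

```r
library(sclet)

# Hypothetical helper (not part of sclet): gather the state reports
# from coarse to fine without touching object internals.
inspect_state <- function(sce) {
  list(
    status  = Status(sce),               # shortest high-level snapshot
    log     = CommandLog(sce),           # analysis functions applied so far
    context = get_analysis_context(sce)  # active view plus analysis records
  )
}

# Usage (assumes `sce` has been through RunStandardPipeline()):
# state <- inspect_state(sce)
# state$context
```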

You can also use the integrated plotting functions that automatically consume the active state:

# Plot the active dimensional reduction colored by active identity
CellDimPlot(sce)

# Plot feature expression on the active dimensional reduction
FeatureDimPlot(sce, features = c("GeneA", "GeneB"))

# Plot a grouped heatmap of features across cell identities
GroupHeatmap(sce, features = c("GeneA", "GeneB", "GeneC"))

# Plot cell statistics (e.g., proportion of clusters across conditions)
CellStatPlot(sce, split.by = "Condition")

sclet supports robust annotation and reference mapping:

# Full annotation via SingleR
sce <- RunSingleR(sce, ref = ref_dataset, labels = ref_dataset$CellType)

# Lightweight label transfer via KNN in shared feature space
sce <- RunKNNPredict(sce, ref = ref_dataset, labels = "CellType", k = 5)

# Visualize query and reference cells in the same reduction space
ProjectionPlot(query = sce, ref = ref_dataset, reduction = "UMAP")

This means the recommended mental model is simple: keep one SCE object, run the sclet analysis functions in sequence, and inspect state through the exported accessors instead of through ad hoc metadata fields.

In practical terms:

  • use RunPCA(layer = "scaled") if you want PCA to be built explicitly from the scaled layer
  • use FindMarkers() without an explicit assay/layer if you want it to follow the current layer logic, with a safe fallback to a non-scaled layer when scaled is the active one
  • use RunSingleR() and RunCellChat() without manual matrix extraction when the current DefaultLayer() already reflects the desired biological view
  • after BatchRemover(), use get_integration() and DefaultLayer() to understand which corrected view is currently active
  • after RunSingleR(), use get_annotation() to inspect the annotation record, and get_mapping() to inspect the corresponding reference-mapping record
  • use Status() when you want the shortest high-level snapshot of the current object state before drilling into specific accessors
  • use get_analysis_context() when you want one compact snapshot of the current layer/reduction/graph/ident combination together with the active analysis records
  • after RunSuperCell(), use get_supercell() to inspect the returned metacell object’s aggregation record and parent-child provenance
  • after RunCellChat(), use get_cellchat() to inspect the active communication record, or pass id = ... when you keep multiple CellChat runs
  • after RunMilo(), use get_milo() to inspect the active DA record, or pass id = ... when you keep multiple Milo runs
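
The first of these points can be sketched as follows. This is an illustration rather than a prescribed workflow: pca_from_scaled is a hypothetical helper, sce is assumed to already contain a scaled layer, and DefaultLayer() and DefaultReduction() are assumed to return values that can be printed as text.

```r
library(sclet)

# Hypothetical helper (not part of sclet): build PCA explicitly from the
# scaled layer, then report which view is active afterwards.
pca_from_scaled <- function(sce) {
  sce <- RunPCA(sce, layer = "scaled")
  message("Default layer: ", DefaultLayer(sce))
  message("Default reduction: ", DefaultReduction(sce))
  sce
}

# Usage (assumes `sce` has been normalized and scaled):
# sce <- pca_from_scaled(sce)
```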

When analysis is performed on top of a corrected layer, downstream states now carry that provenance forward. In other words, PCA, graph construction, clustering, and UMAP can all record that they depend on the active integration state rather than behaving like disconnected one-off steps.

For multi-sample workflows, the provenance chain can now start even earlier: sce_merge() stores a merged_inputs integration record, and the subsequent BatchRemover() state can point back to that merge step.

19.2 For pipeline authors

If you are writing a reusable script or package around sclet, prefer patterns that are stable under future extension:

  • Pass the SCE object through each step and return the updated object explicitly.
  • Prefer exported entry points such as RunPCA() and RunUMAP() over calling lower-level functions directly when sclet already provides them.
  • Read results through accessors (get_*, has_*, Default*, ActiveIdent, Layers, LayerData) instead of assuming a fixed metadata layout.
  • If old metadata still needs to be read while migrating historical objects, keep that logic centralized in shared internal helpers instead of re-implementing direct metadata(...) reads in multiple accessors.
  • When using RunCellChat(), request return = "sce" or return = "both" if downstream code needs the updated SCE with recorded analysis state.
  • When a module can produce multiple records of the same type, give each run a stable name and read it back through get_*(id = ...) instead of assuming only one result exists.
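
These guidelines can be sketched as two small functions. Both names, run_embedding_step and read_milo_run, are hypothetical, as is the run_id argument. The sketch assumes the accessors take the SCE object as their first argument and that a Milo run was stored under the given id.

```r
library(sclet)

# Hypothetical pipeline step (not part of sclet): take an SCE, run the
# exported entry points in order, and return the updated SCE explicitly.
run_embedding_step <- function(sce) {
  stopifnot(methods::is(sce, "SingleCellExperiment"))
  sce <- RunPCA(sce)
  sce <- FindNeighbors(sce)
  sce <- FindClusters(sce)
  sce <- RunUMAP(sce)
  sce
}

# Hypothetical reader: fetch one named Milo record through the accessor
# instead of reading metadata(sce) directly.
read_milo_run <- function(sce, run_id) {
  get_milo(sce, id = run_id)
}

# Usage (assumes `sce` is normalized and scaled, and that a Milo run
# was stored under the placeholder id "run_a"):
# sce <- run_embedding_step(sce)
# rec <- read_milo_run(sce, run_id = "run_a")
```

Keeping every metadata read behind an accessor means that only the accessor has to change if the internal layout changes.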