1 State-Aware Analysis Contract

Ever found yourself lost in “parameter hell”?

After running batch correction, you have to remember whether the reduced-dimension matrix is stored as fastMNN or scVI. When you move downstream to visualization, you have to specify reduction = "fastMNN" by hand. If you add trajectory inference, gene set scoring, or spatial deconvolution along the way, the scattered results tangle around your SingleCellExperiment (SCE) object like spaghetti. Hand the analysis over to a colleague, or return to it yourself three months later, and no one will remember which UMAP was derived from which batch-corrected matrix.

This is a common systemic flaw in transcriptomic workflows: We have the data, but we lose the context.

To solve this, sclet introduces its core design philosophy: the Analysis-State Contract.

1.1 What is the State Contract?

In sclet, the data flow follows your analysis pipeline, so you do not have to memorize key names. When you run RunIntegration(), the object records the “active integration state”. When you then run RunUMAP(), it automatically picks up the output of the previous step.

More importantly, sclet quietly records a Directed Acyclic Graph (DAG) of the executed steps in the background. This is your analysis provenance (or “lineage”): a record of which result was derived from which.
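
To make the idea concrete, here is a minimal base-R sketch of an "active state plus provenance log". It is not sclet's implementation: new_state(), record_step(), and lineage() are hypothetical helpers operating on a plain list, and the key names are only labels.

```r
# Toy stand-in for an analysis object: a plain list, NOT an sclet object.
new_state <- function() {
  list(
    active = list(reduction = NULL),
    log = data.frame(step = character(), input = character(),
                     output = character(), stringsAsFactors = FALSE)
  )
}

# Record one step. If no input is given, the step consumes the currently
# active result, which is the "automatic routing" described above.
record_step <- function(state, step, output, input = NULL) {
  if (is.null(input)) input <- state$active$reduction
  if (is.null(input)) input <- NA_character_
  state$log <- rbind(
    state$log,
    data.frame(step = step, input = input, output = output,
               stringsAsFactors = FALSE)
  )
  state$active$reduction <- output  # the new result becomes the active one
  state
}

# Walk the log backwards to recover where a result came from.
lineage <- function(state, output) {
  chain <- output
  repeat {
    parent <- state$log$input[match(output, state$log$output)]
    if (is.na(parent)) break
    chain <- c(parent, chain)
    output <- parent
  }
  chain
}

s <- new_state()
s <- record_step(s, "RunIntegration", output = "fastMNN", input = "logcounts")
s <- record_step(s, "RunUMAP", output = "UMAP")  # input resolved from state
lineage(s, "UMAP")
```

In this toy, the second call never names fastMNN explicitly, yet the log still records that the UMAP was computed from it.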

1.2 Macro View: Status()

Let’s look at a minimal runnable example. A single call to Status() returns a snapshot of what is currently active in your object and what is available:

library(sclet)
# Use the PBMC object created above

# Check the current status of the object
Status(pbmc)
## $active
## $active$assay
## [1] "scaled"
## 
## $active$layer
## [1] "scaled"
## 
## $active$reduction
## [1] "UMAP"
## 
## $active$graph
## [1] "knn_graph"
## 
## $active$ident
## [1] "colLabels"
## 
## 
## $available
## $available$assays
## [1] "counts"    "logcounts" "scaled"   
## 
## $available$layers
## [1] "counts"    "logcounts" "scaled"   
## 
## $available$reductions
## [1] "PCA"     ".dimred" "UMAP"   
## 
## $available$graphs
## [1] "knn_graph"
## 
## $available$analyses
## [1] "hvg"
## 
## 
## $n_commands
## [1] 7
## 
## $last_command
## [1] "RunUMAP"

Whether a step ran as a native R function or called a Python method through the basilisk-managed environment, the traces it leaves are recorded in the same unified form and displayed together.
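
The printed result looks like a nested list, so it can presumably be inspected programmatically as well. The sketch below assumes that shape: st is a hand-built stand-in copied from the output shown above, not the return value of a real Status() call, and the consistency check is only an illustration.

```r
# Hand-built stand-in mirroring the printed Status() output (an assumption
# about the return shape; built by hand so the example runs without sclet).
st <- list(
  active = list(assay = "scaled", layer = "scaled", reduction = "UMAP",
                graph = "knn_graph", ident = "colLabels"),
  available = list(
    assays     = c("counts", "logcounts", "scaled"),
    layers     = c("counts", "logcounts", "scaled"),
    reductions = c("PCA", ".dimred", "UMAP"),
    graphs     = "knn_graph",
    analyses   = "hvg"
  ),
  n_commands = 7,
  last_command = "RunUMAP"
)

# Map each active field to the "available" field that should contain it.
pairs <- c(assay = "assays", layer = "layers",
           reduction = "reductions", graph = "graphs")

# Sanity check: every active entry must be one of the available entries.
consistent <- vapply(names(pairs), function(k) {
  st$active[[k]] %in% st$available[[pairs[[k]]]]
}, logical(1))

consistent
stopifnot(all(consistent))
```

A check like this is a cheap guard at the top of a downstream script: it fails early if the object's active pointers refer to something that is no longer stored.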

1.3 Digging Deeper: get_analysis_context()

If you need to inspect the details of a specific state, sclet provides a family of get_*() and has_*() accessor functions.

# Get the full DAG provenance tree
# This highly structured context is perfect for feeding into an AI Copilot
ctx <- get_analysis_context(pbmc)

# List all recorded advanced analysis records
names(ctx$records)
## character(0)

Bottom Line: Stop wrestling with reducedDims and colData variable names. Go with the flow, let the sclet state machine remember the details, and focus on the actual biology.

1.4 Named State Records and Typed Accessors

Since version 1.0.0, the state contract has been steadily tightened around two conventions that make downstream code more predictable.

First, analysis functions that produce state records now accept a name parameter. RunSCENIC(), RunVelocity(), RunSingleR(), and RunCellRank() all let you tag a run with a stable id, and then retrieve it later through get_*(object, id = "your_name"). This matters when you keep multiple runs of the same type in one object: for example, two SCENIC runs with different database inputs, or a velocity run alongside a separate CellRank run.

Second, the get_*() accessors are gradually being unified over typed state records rather than ad hoc metadata slots. get_geneset_scoring() now transparently reads both the newer typed geneset_scoring records and the older analysis-layer format. This means migration is backward-compatible: legacy objects still work, but new code writes into a schema that accessors, plotting functions, and the AI copilot all agree on.

The practical rule for users is simple:

  • Use has_*(object) before calling get_*(object) when you want a boolean guard.
  • Pass id = ... when you keep multiple records of the same type.
  • Trust Status(object) for the high-level snapshot; drill into get_analysis_context(object) only when you need the structured payload.
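
The conventions above (boolean guard, lookup by id, fallback to an older format) can be sketched in base R. This is an illustration under stated assumptions, not sclet's code: add_record(), has_record(), and get_record() are hypothetical helpers on a plain list, and the record payloads are made-up placeholders.

```r
# Toy object with a typed record store and an older, untyped "legacy" slot.
new_obj <- function() list(records = list(), legacy = list())

# Store a payload under records[[type]][[id]].
add_record <- function(obj, type, id, payload) {
  if (is.null(obj$records[[type]])) obj$records[[type]] <- list()
  obj$records[[type]][[id]] <- payload
  obj
}

# Boolean guard: TRUE if a typed record (optionally a specific id) or a
# legacy entry of this type exists.
has_record <- function(obj, type, id = NULL) {
  recs <- obj$records[[type]]
  if (!is.null(id)) return(id %in% names(recs))
  length(recs) > 0 || !is.null(obj$legacy[[type]])
}

# Accessor: prefer typed records, fall back to the legacy slot, and insist
# on an id when several records of the same type are present.
get_record <- function(obj, type, id = NULL) {
  recs <- obj$records[[type]]
  if (length(recs) == 0) {
    old <- obj$legacy[[type]]
    if (is.null(old)) stop("no '", type, "' record found")
    return(old)
  }
  if (is.null(id)) {
    if (length(recs) > 1) stop("several '", type, "' records; pass id = ...")
    return(recs[[1]])
  }
  if (!id %in% names(recs)) stop("no '", type, "' record with id '", id, "'")
  recs[[id]]
}

obj <- new_obj()
obj <- add_record(obj, "scenic", "run_a", list(database = "A"))
obj <- add_record(obj, "scenic", "run_b", list(database = "B"))
obj$legacy$geneset_scoring <- list(scores = c(0.2, 0.8))  # old-format entry

has_record(obj, "velocity")                       # nothing stored: guard first
get_record(obj, "scenic", id = "run_b")$database  # two runs: id is required
get_record(obj, "geneset_scoring")$scores         # served from the legacy slot
```

Failing loudly when an id is ambiguous is a deliberate choice in this sketch: silently returning "the latest" record is exactly the kind of hidden context the contract is meant to remove.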

1.5 Memory Safety and Sparse Data

Single-cell count matrices are sparse by nature, and forcing them into dense representations is one of the fastest ways to blow up memory on large datasets. sclet has been audited to keep sparse representations as long as possible across the codebase.

Concretely:

  • RunIntegration(method = "scVI") converts sparse R matrices directly to scipy.sparse.csr_matrix via reticulate instead of materializing a dense block first.
  • RunSpatialDeconvolution(), RunSCENIC(), RunCellRank(), RunInSilicoPerturbation(), and RunKNNPredict() all keep sparse inputs sparse through the Python bridge.
  • pseudo_heatmap() and genecurve_plot() avoid transposing the full assay when only the requested features are needed.

None of this changes the user-facing API. But on datasets with hundreds of thousands of cells, it is the difference between a workflow that fits in memory and one that does not.
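
The memory argument is easy to reproduce with the Matrix package that ships with R. The sketch below uses a small synthetic matrix (random values, not real counts) and does not call sclet; it only illustrates the two ideas in the list: sparse storage is far smaller than dense storage, and subsetting before transposing gives the same result as transposing everything first.

```r
library(Matrix)

set.seed(1)
# Synthetic genes x cells matrix with about 5% non-zero entries.
counts <- rsparsematrix(nrow = 2000, ncol = 500, density = 0.05,
                        rand.x = function(n) rpois(n, 3) + 1)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# Compare the in-memory footprint of the sparse and dense representations.
sparse_bytes <- as.numeric(object.size(counts))
dense_bytes  <- as.numeric(object.size(as.matrix(counts)))
sparse_bytes < dense_bytes

# Subset first, then transpose: only three rows are ever transposed.
wanted <- c("gene10", "gene20", "gene30")
small <- t(counts[wanted, , drop = FALSE])

# Transpose first, then subset: same values, but the whole assay is
# transposed just to keep three columns.
full <- t(counts)[, wanted, drop = FALSE]

stopifnot(isTRUE(all.equal(as.matrix(small), as.matrix(full))))
methods::is(small, "sparseMatrix")  # the subset never became dense
```

At this toy size both routes are fast; the difference only becomes decisive as the number of cells and genes grows.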

1.6 Manual Overrides: Layers and Default APIs

While the state machine handles most routing automatically, there are times when you want manual control. For example, you may want to switch the default visualization from UMAP back to t-SNE, or temporarily access the raw, uncorrected counts instead of the integrated layer.

sclet provides a suite of intuitive setter/getter functions:

# Check available layers
Layers(pbmc)
## [1] "counts"    "logcounts" "scaled"
# Get the matrix stored in the current default layer
# (pass layer = "counts" instead to retrieve the raw counts)
layer_data <- LayerData(pbmc, layer = DefaultLayer(pbmc))
dim(layer_data)
## [1] 13714  2638
# Change the default reduction used by DimPlot, if available
if ("PCA" %in% SingleCellExperiment::reducedDimNames(pbmc)) {
    DefaultReduction(pbmc) <- "PCA"
}
DefaultReduction(pbmc)
## [1] "PCA"
# Change the default assay to a concrete assay stored in the object
if ("logcounts" %in% SummarizedExperiment::assayNames(pbmc)) {
    DefaultAssay(pbmc) <- "logcounts"
}
DefaultAssay(pbmc)
## [1] "logcounts"

These accessors closely mirror the Seurat API, so users migrating to SingleCellExperiment feel right at home while retaining the robust Bioconductor data structure under the hood.
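
For readers unfamiliar with the `Default*(x) <- value` syntax, this is R's replacement-function mechanism. The sketch below shows the pattern on a plain list; default_reduction() and its setter are hypothetical toy functions, not sclet's, and the validation rule is an assumption about what a careful setter might do.

```r
# Getter: read the active reduction from a toy object (a plain list).
default_reduction <- function(obj) obj$active$reduction

# Setter: a replacement function. R rewrites
#   default_reduction(x) <- "PCA"
# as
#   x <- `default_reduction<-`(x, value = "PCA")
`default_reduction<-` <- function(obj, value) {
  if (!value %in% names(obj$reductions)) {
    stop("unknown reduction '", value, "'; available: ",
         paste(names(obj$reductions), collapse = ", "))
  }
  obj$active$reduction <- value
  obj
}

toy <- list(
  reductions = list(PCA = matrix(0, 3, 2), UMAP = matrix(0, 3, 2)),
  active = list(reduction = "UMAP")
)

default_reduction(toy) <- "PCA"
default_reduction(toy)

# A name that is not stored is rejected instead of silently accepted.
res <- tryCatch({
  default_reduction(toy) <- "TSNE"
  "set"
}, error = function(e) "rejected")
res
```

Validating inside the setter removes the need for an `if (... %in% ...)` guard around every assignment, at the cost of raising an error when the name is missing.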