2 Automation & Provenance Logs

Ever found yourself lost in parameter hell? You spend a week tweaking PCA dimensions, resolution parameters, and variable feature counts. Three months later, when it’s time to write the “Materials and Methods” section for your manuscript, you stare at your SingleCellExperiment object and think: “Wait, did I use 2000 or 3000 highly variable genes for this?”

sclet solves this pain point beautifully with two features: Automated Pipelines and Provenance Logs.

In this book, the examples below run on a PBMC object that ships with the package rather than on a raw data directory of your own. This keeps the chapter reproducible while still showing the intended day-to-day workflow.

2.1 The One-Liner Pipeline

If you just want to get from a raw count matrix to a UMAP plot as fast as possible without writing 10 lines of boilerplate code, RunStandardPipeline() is your best friend.

# In a real project you might start from:
# sce <- Read10X(data.dir = "filtered_feature_bc_matrix/")
#
# For a runnable documentation example we reuse a packaged PBMC object.
sce <- sce_pipeline_input

# Run everything from normalization to UMAP in one line
sce <- RunStandardPipeline(
    sce, 
    nfeatures = 2000, 
    dims = 1:20, 
    resolution = 0.8
)

# Boom. Ready to inspect.
sce

## class: SingleCellExperiment 
## dim: 13714 2638 
## metadata(1): sclet
## assays(3): counts logcounts scaled
## rownames(13714): AL627309.1 AP006222.2 ...
##   PNRC2_ENSG00000215700 SRSF10_ENSG00000215699
## rowData names(15): ENSEMBL_ID Symbol_TENx ... p.value
##   FDR
## colnames: NULL
## colData names(16): Sample Barcode ... label ident
## reducedDimNames(3): PCA .dimred UMAP
## mainExpName: NULL
## altExpNames(0):

Under the hood, RunStandardPipeline() runs these steps in order:

1. NormalizeData()
2. FindVariableFeatures()
3. ScaleData()
4. RunPCA()
5. FindNeighbors()
6. FindClusters()
7. RunUMAP()
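Conceptually, a one-liner like this is function composition: each step takes the object and returns an updated copy that feeds the next step. The base R sketch below models only that sequencing with Reduce(). The step functions are hypothetical stand-ins that act on a plain list and merely record their own names; they are not sclet's real implementations.

```r
# Hypothetical stand-in steps: each takes an object (here a plain list)
# and returns a modified copy. Real sclet steps operate on a
# SingleCellExperiment; these only record which step ran.
make_step <- function(name) {
  force(name)
  function(obj) {
    obj$steps_run <- c(obj$steps_run, name)
    obj
  }
}

step_names <- c("NormalizeData", "FindVariableFeatures", "ScaleData",
                "RunPCA", "FindNeighbors", "FindClusters", "RunUMAP")
steps <- lapply(step_names, make_step)

# Apply the steps left to right: the output of one is the input of the next.
run_pipeline <- function(obj, steps) {
  Reduce(function(acc, step) step(acc), steps, obj)
}

result <- run_pipeline(list(steps_run = character(0)), steps)
print(result$steps_run)
```

The order of `steps` is the whole contract here, which is why a fixed pipeline is easy to reproduce: the same inputs and parameters always pass through the same sequence.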

2.2 The Provenance Log (CommandLog)

The true magic happens after you run your analysis. Every major sclet function, whether you call it through the pipeline or step by step, automatically records its parameters, its outputs, and a timestamp in the object’s CommandLog.
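The bookkeeping pattern behind such a log is simple to sketch in base R. The hypothetical `logged()` wrapper below appends the command name, a timestamp, and a `key=value; ...` parameter summary each time the wrapped function runs. It stores the log as an attribute on the returned object; where sclet actually keeps its log (presumably the `sclet` entry shown under `metadata(1)` above) is an assumption, and the summary format only mimics the `params_summary` column of the real log.

```r
# Format a parameter list as "key=value; key=value". Anything that is not a
# single value is shown as "[length]", mimicking the params_summary column.
summarize_params <- function(params) {
  if (length(params) == 0) return("")
  vals <- vapply(params, function(v) {
    if (length(v) == 1) as.character(v) else paste0("[", length(v), "]")
  }, character(1))
  paste(paste0(names(params), "=", vals), collapse = "; ")
}

# Hypothetical wrapper: returns a version of `fun` that, on every call,
# appends one row (command, timestamp, params_summary) to a log stored
# as the "command_log" attribute of the returned object.
logged <- function(name, fun) {
  force(name)
  force(fun)
  function(obj, ...) {
    params <- list(...)
    previous <- attr(obj, "command_log")
    result <- fun(obj, ...)
    entry <- data.frame(
      command = name,
      timestamp = Sys.time(),
      params_summary = summarize_params(params),
      stringsAsFactors = FALSE
    )
    attr(result, "command_log") <- rbind(previous, entry)
    result
  }
}

# Two toy steps acting on a numeric vector of counts.
normalize <- logged("NormalizeData", function(x, scale.factor) {
  log1p(x / sum(x) * scale.factor)
})
reduce_stub <- logged("RunPCA", function(x, dims) x)

obj <- normalize(c(5, 0, 12, 3), scale.factor = 10000)
obj <- reduce_stub(obj, dims = 1:20)
print(attr(obj, "command_log")[, c("command", "params_summary")])
```

Because the log travels with the object, every later step inherits the history of the earlier ones, which is what makes the record cumulative.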

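You can view the raw log at any time. The log is cumulative: in the output below, entries 1–7 appear to have been carried over from the packaged input object (an earlier run with dims = 1:10 and resolution = 0.88), while entries 8–14 record the RunStandardPipeline() call above.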

# Peek at the raw command logs
CommandLog(sce, details = TRUE)

##                 command           timestamp
## 1         NormalizeData 2026-06-15 03:08:13
## 2  FindVariableFeatures 2026-06-15 03:08:16
## 3             ScaleData 2026-06-15 03:08:19
## 4                RunPCA 2026-06-15 03:08:31
## 5         FindNeighbors 2026-06-15 03:08:31
## 6          FindClusters 2026-06-15 03:08:31
## 7               RunUMAP 2026-06-15 03:08:37
## 8         NormalizeData 2026-06-15 03:08:39
## 9  FindVariableFeatures 2026-06-15 03:08:41
## 10            ScaleData 2026-06-15 03:08:44
## 11               RunPCA 2026-06-15 03:08:51
## 12        FindNeighbors 2026-06-15 03:08:51
## 13         FindClusters 2026-06-15 03:08:52
## 14              RunUMAP 2026-06-15 03:08:58
##                                                          params_summary
## 1                                      scale.factor=10000; assay=counts
## 2                                          nfeatures=2000; method=scran
## 3                                     features=[13714]; assay=logcounts
## 4  layer=scaled; subset_row=[2000]; exprs_values=scaled; ncomponents=50
## 5                                        dims=[10]; reduction=PCA; k=10
## 6                                                       resolution=0.88
## 7                                dims=[10]; reduction=PCA; layer=scaled
## 8                                      scale.factor=10000; assay=counts
## 9                                          nfeatures=2000; method=scran
## 10                                    features=[13714]; assay=logcounts
## 11    layer=scaled; subset_row=[0]; exprs_values=scaled; ncomponents=50
## 12                                       dims=[20]; reduction=PCA; k=10
## 13                                                       resolution=0.8
## 14                               dims=[20]; reduction=PCA; layer=scaled
##    outputs_summary       params   outputs
## 1  assay=logcounts 10000, c.... logcounts
## 2  feature_set=hvg  2000, scran       hvg
## 3     assay=scaled c("AL627....    scaled
## 4    reduction=PCA scaled, ....       PCA
## 5  graph=knn_graph 1:10, PC.... knn_graph
## 6  ident=colLabels         0.88 colLabels
## 7   reduction=UMAP 1:10, PC....      UMAP
## 8  assay=logcounts 10000, c.... logcounts
## 9  feature_set=hvg  2000, scran       hvg
## 10    assay=scaled c("AL627....    scaled
## 11   reduction=PCA scaled, ....       PCA
## 12 graph=knn_graph 1:20, PC.... knn_graph
## 13 ident=colLabels          0.8 colLabels
## 14  reduction=UMAP 1:20, PC....      UMAP
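When a log holds more than one run of the same command, as this one does, you usually want only the most recent entry of each when drafting methods. The base R sketch below assumes the log can be handled as a data frame with the `command`, `timestamp`, and `params_summary` columns printed above; to stay self-contained it rebuilds three of those rows (entries 6, 12, and 13) by hand. The helpers `latest_per_command()` and `parse_params()` are hypothetical, not sclet functions.

```r
# Three rows copied from the CommandLog output above (entries 6, 12, 13).
# Assumption: the log behaves like a data frame with these columns.
log_df <- data.frame(
  command = c("FindClusters", "FindNeighbors", "FindClusters"),
  timestamp = as.POSIXct(c("2026-06-15 03:08:31", "2026-06-15 03:08:51",
                           "2026-06-15 03:08:52"), tz = "UTC"),
  params_summary = c("resolution=0.88", "dims=[20]; reduction=PCA; k=10",
                     "resolution=0.8"),
  stringsAsFactors = FALSE
)

# Keep only the most recent entry for each command.
latest_per_command <- function(log) {
  log <- log[order(log$timestamp), ]
  log[!duplicated(log$command, fromLast = TRUE), ]
}

# Split "key=value; key=value" into a named character vector.
parse_params <- function(summary) {
  pairs <- strsplit(strsplit(summary, "; ", fixed = TRUE)[[1]], "=", fixed = TRUE)
  setNames(vapply(pairs, `[`, character(1), 2),
           vapply(pairs, `[`, character(1), 1))
}

latest <- latest_per_command(log_df)
print(latest$command)
print(parse_params(latest$params_summary[latest$command == "FindClusters"]))
```

Here the stale `resolution=0.88` entry is dropped and only the `resolution=0.8` call from this chapter survives, which is the value you would report.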

2.3 Writing Methods for You: PipelineSummary

Looking at a raw list of parameters is better than nothing, but it’s not exactly manuscript-ready. This is where PipelineSummary() shines: it reads the CommandLog and turns it into a human-readable summary, grouping the recorded steps by analysis stage.

# Generate a manuscript-ready methods summary
PipelineSummary(sce)

## ========================================
## sclet Analysis Pipeline Summary
## ========================================
## 
## 1. Preprocessing & Feature Selection:
##   - [2026-06-15 03:08:13.9155] NormalizeData (scale.factor=10000; assay=counts)
##   - [2026-06-15 03:08:16.573854] FindVariableFeatures (nfeatures=2000; method=scran)
##   - [2026-06-15 03:08:19.803258] ScaleData (features=[13714]; assay=logcounts)
##   - [2026-06-15 03:08:39.839627] NormalizeData (scale.factor=10000; assay=counts)
##   - [2026-06-15 03:08:41.47693] FindVariableFeatures (nfeatures=2000; method=scran)
##   - [2026-06-15 03:08:44.539387] ScaleData (features=[13714]; assay=logcounts)
## 
## 2. Dimensional Reduction & Clustering:
##   - [2026-06-15 03:08:31.343233] RunPCA (layer=scaled; subset_row=[2000]; exprs_values=scaled; ncomponents=50)
##   - [2026-06-15 03:08:31.431235] FindNeighbors (dims=[10]; reduction=PCA; k=10)
##   - [2026-06-15 03:08:31.547643] FindClusters (resolution=0.88)
##   - [2026-06-15 03:08:37.556657] RunUMAP (dims=[10]; reduction=PCA; layer=scaled)
##   - [2026-06-15 03:08:51.655845] RunPCA (layer=scaled; subset_row=[0]; exprs_values=scaled; ncomponents=50)
##   - [2026-06-15 03:08:51.840011] FindNeighbors (dims=[20]; reduction=PCA; k=10)
##   - [2026-06-15 03:08:52.095023] FindClusters (resolution=0.8)
##   - [2026-06-15 03:08:58.603889] RunUMAP (dims=[20]; reduction=PCA; layer=scaled)
## 
## 3. Downstream Analysis:
##   - Not recorded.
## 
## ========================================

Condensed into prose, the run from this chapter (entries 8–14 of the log) could be reported as:

> The data were log-normalized from raw counts with a scale factor of 10,000. The 2,000 most highly variable genes were identified with the scran method, and all 13,714 genes were scaled. Principal component analysis (PCA) was performed with 50 components, and the first 20 components were used for downstream analysis. A k-nearest neighbor (KNN) graph was built on these components (k = 10), and cells were clustered at a resolution of 0.8. Finally, Uniform Manifold Approximation and Projection (UMAP) was run on the same 20 components for 2D visualization.
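A methods paragraph like this can be assembled from logged parameters with plain sentence templates keyed by command name. The sketch below is a hypothetical illustration of that idea, not PipelineSummary()'s actual code: the `templates` list, its wording, and the `methods_text()` helper are assumptions, and the values are those recorded for the second run in the log above (entries 9, 12, 13, and 14).

```r
# Hypothetical sentence templates keyed by command name; each %s is
# filled, in order, with a recorded parameter value.
templates <- list(
  FindVariableFeatures = "The top %s highly variable genes were selected (method: %s).",
  FindNeighbors        = "A k-nearest-neighbor graph (k = %s) was built on the first %s principal components.",
  FindClusters         = "Cells were clustered at a resolution of %s."
)

# Turn a list of (command, values) records into one paragraph,
# skipping commands that have no template.
methods_text <- function(records) {
  sentences <- vapply(records, function(rec) {
    tpl <- templates[[rec$command]]
    if (is.null(tpl)) return(NA_character_)
    do.call(sprintf, c(list(tpl), as.list(rec$values)))
  }, character(1))
  paste(sentences[!is.na(sentences)], collapse = " ")
}

# Values taken from entries 9, 12, 13 and 14 of the log above.
# RunUMAP has no template here, so it is skipped.
records <- list(
  list(command = "FindVariableFeatures", values = c("2000", "scran")),
  list(command = "FindNeighbors",        values = c("10", "20")),
  list(command = "FindClusters",         values = c("0.8")),
  list(command = "RunUMAP",              values = c("20"))
)

cat(methods_text(records), "\n")
```

Keeping the wording in a lookup table rather than in code means a new command only needs a new template, and any command without one is silently left out instead of producing a garbled sentence.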

Life is simply too short to manually write boilerplate methods sections. Let sclet do the heavy lifting for you.
