21 AI Copilot & Evidence Governance
You ran the pipeline and generated the plots, but are you absolutely certain your results are “correct”?
The most terrifying aspect of single-cell analysis isn’t encountering an error; it’s the silent propagation of systemic biases. If quality control thresholds for mitochondrial genes are too relaxed upstream, or if a batch correction algorithm (like fastMNN or scVI) over-aligns cell populations that possess genuine biological differences, these “false positive” signals will cascade down through dimensionality reduction and clustering, eventually landing squarely in your FindMarkers results.
When you draft a manuscript based on these markers, you have already fallen into the trap of Transcriptomic Overload.
sclet is not just a tool library; it is your intelligent research foundation. Built on the rigorous Analysis-State Machine (Provenance DAG) and the aisdk large language model (LLM) framework, it introduces two paradigm-shifting features: sclet_copilot and cross-chain analysis auditing.
The examples in this chapter are intentionally not executed during book rendering. AI-assisted workflows depend on user-side model configuration and, by design, should be run explicitly rather than triggered silently during a documentation build.
21.1 sclet_copilot: An AI That Understands Your Objects
Unlike shallow prompt-chaining tools that merely concatenate strings for an LLM, sclet_copilot reads the complete analysis lineage (the recorded provenance) of your SingleCellExperiment object.
To reproduce these examples locally, you need aisdk installed and either a default model configured via aisdk::get_model() or a compatible fallback model configured in the environment.
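A quick pre-flight check can save a confusing failure later. The following is a minimal sketch in base R: `copilot_ready()` is a hypothetical helper, not part of sclet or aisdk, and it assumes that `aisdk::get_model()` can be called without arguments and either errors or returns `NULL` when no default model is configured.

```r
# Hypothetical pre-flight helper (not part of sclet or aisdk).
copilot_ready <- function() {
  # requireNamespace() is base R: TRUE only if the package can be loaded.
  if (!requireNamespace("aisdk", quietly = TRUE)) {
    message("aisdk is not installed; the copilot examples cannot run.")
    return(FALSE)
  }
  # Assumption: get_model() with no arguments returns the default model and
  # errors or returns NULL when none is configured.
  model <- tryCatch(aisdk::get_model(), error = function(e) NULL)
  if (is.null(model)) {
    message("No default aisdk model found; configure one or set a fallback.")
    return(FALSE)
  }
  TRUE
}

if (copilot_ready()) {
  message("aisdk and a default model are available.")
}
```

If the helper returns `FALSE`, the message indicates which of the two prerequisites is missing.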
library(sclet)
# Assume 'sce' has just completed Integration, UMAP, and Clustering.
# We can ask the AI Copilot directly:
sclet_copilot(
  sce,
  "I just finished scVI batch correction and clustering. Based on my records, please evaluate if the parameter settings are reasonable. Is there a potential risk of over-alignment?"
)

What happens behind the scenes?
sclet instantly extracts the current cell count, gene count, active states, and most importantly, the Analysis Provenance (e.g., scVI integration (batch=Donor) -> Louvain clustering). It feeds this highly structured context to the LLM (like DeepSeek or GPT-4). The AI isn’t guessing blindly; it is diagnosing your data by reading its complete medical history.
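To make "highly structured context" concrete, here is a minimal sketch of how such a summary could be assembled. It is not sclet's actual implementation: `build_context()` is a hypothetical helper, and the cell and gene counts are toy values standing in for what `dim(sce)` would report.

```r
# Hypothetical illustration of a structured context; not sclet's actual code.
build_context <- function(n_cells, n_genes, provenance) {
  stopifnot(is.numeric(n_cells), is.numeric(n_genes), is.character(provenance))
  paste0(
    "Cells: ", format(n_cells, scientific = FALSE), "\n",
    "Genes: ", format(n_genes, scientific = FALSE), "\n",
    # Provenance steps are joined in the order they were recorded.
    "Provenance: ", paste(provenance, collapse = " -> ")
  )
}

# Toy values standing in for dim(sce) and the recorded analysis steps.
steps <- c("scVI integration (batch=Donor)", "Louvain clustering")
context <- build_context(n_cells = 5000, n_genes = 2000, provenance = steps)
cat(context, "\n")
```

The point of the sketch is that the model receives the ordered analysis steps alongside the object's dimensions, rather than a free-text description typed by the user.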
21.2 AuditAnalysisChain: Cross-Chain Error Control
Suppose you use RunDEtest and identify a fantastic set of marker genes. Before jumping to conclusions, you want to perform a “cross-examination.”
This takes a single call to AuditAnalysisChain:
# Suppose 'top_markers' are 5 candidate DE genes you just identified
top_markers <- c("CD3D", "NKG7", "MS4A1", "LYZ", "GNLY")
# Ask the AI to backtrack: Is the expression of these genes distorted by the algorithm
# when comparing the raw counts to the integrated layer?
report <- AuditAnalysisChain(
  sce,
  features = top_markers,
  raw_layer = "counts",
  integrated_layer = "corrected"
)
cat(report)

How will the AI respond? By tracing the DAG state machine and applying R.A. Fisher’s “design foresight” alongside G.E.P. Box’s “model humility,” the AI calculates a State-Dependency Confidence Score for this gene set.
This type of audit is most meaningful when the active integration method exposes a corrected expression layer, such as fastMNN. If your workflow only produced a corrected reduction (for example Harmony or scVI), the expression-level comparison shown here is no longer the right abstraction and should be adapted accordingly.
For example, the audit report might warn you: “MS4A1 shows no significant difference in the raw counts layer. Its apparent upregulation is an artifact caused by fastMNN forcibly aligning Donor 1 and Donor 2. The confidence score for this gene is Low; please exercise caution before using it as a biological marker.”
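The kind of raw-versus-corrected comparison behind such a warning can be sketched by hand in base R. This is not how AuditAnalysisChain works internally and it does not compute the confidence score: `compare_layers()` is a hypothetical helper, the matrices are toy data, the cutoffs are arbitrary, and the sketch assumes the corrected layer is already on a log-like scale while raw counts are only log1p-transformed (no size-factor normalization). With a real object, the matrices could be pulled with `SummarizedExperiment::assay(sce, "counts")` and the corresponding corrected assay.

```r
# Hypothetical helper (not AuditAnalysisChain): compare a two-group mean
# difference per gene across two genes x cells expression layers.
compare_layers <- function(layers, features, group,
                           raw_layer = "counts",
                           integrated_layer = "corrected") {
  # Without a corrected expression layer, this comparison does not apply.
  if (!integrated_layer %in% names(layers)) {
    stop("No '", integrated_layer, "' expression layer; ",
         "an expression-level audit does not apply.")
  }
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  # Mean of the first group minus mean of the second, per requested gene.
  mean_diff <- function(mat) {
    a <- mat[features, group == levels(group)[1], drop = FALSE]
    b <- mat[features, group == levels(group)[2], drop = FALSE]
    unname(rowMeans(a) - rowMeans(b))
  }
  raw_diff <- mean_diff(log1p(layers[[raw_layer]]))
  cor_diff <- mean_diff(layers[[integrated_layer]])
  data.frame(
    gene = features,
    raw_diff = raw_diff,
    corrected_diff = cor_diff,
    # Flag genes whose difference appears only after correction.
    # The 0.25 and 1 cutoffs are arbitrary illustration values.
    flagged = abs(raw_diff) < 0.25 & abs(cor_diff) > 1
  )
}

# Toy data: 2 genes x 8 cells, 4 cells per group.
genes <- c("CD3D", "MS4A1")
group <- rep(c("A", "B"), each = 4)
counts <- rbind(CD3D = c(8, 8, 8, 8, 0, 0, 0, 0), MS4A1 = rep(2, 8))
corrected <- rbind(
  CD3D = c(2, 2, 2, 2, 0, 0, 0, 0),
  MS4A1 = c(2, 2, 2, 2, 0.5, 0.5, 0.5, 0.5)
)
layers <- list(counts = counts, corrected = corrected)

audit <- compare_layers(layers, features = genes, group = group)
print(audit)
```

In this toy setup MS4A1 is flat in the raw layer but differs between groups in the corrected layer, so it is the gene that gets flagged. The helper stops with an error when no corrected expression layer is present, which mirrors the case where integration produced only a corrected reduction.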
Bottom Line:
Stop placing blind faith in P-values. Let sclet_copilot audit your evidence chain, so you can focus on publishing robust, reproducible science.