Benchmark Agent — benchmark_agent • aisdk

Run a benchmark suite against an agent and collect performance metrics.

Usage

benchmark_agent(agent, tasks, tools = NULL, verbose = TRUE)

Arguments

agent: An Agent object or model string.
tasks: A list of benchmark tasks (see details).
tools: Optional list of tools for the agent.
verbose: Print progress.

Value

A benchmark result object with metrics.

Details

Each task in the tasks list should have:

prompt: The task prompt
expected: Expected output or criteria
category: Optional category for grouping
ground_truth: Optional ground truth for hallucination checking