Running Evaluations with Quotient
Evaluations
Evaluations in Quotient track model performance through Run objects. Each Run combines a prompt, dataset, model, and metrics to produce quantitative results.
Runs & Results
A Run contains a status field alongside its results. The status field can be not-started, running, completed, or failed.
Each result within a Run is represented by:
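As a rough illustration, the Run and result objects above might be modeled like this. This is a sketch only: every field name except status is an assumption, since this guide does not list the actual fields of either object.

```python
from dataclasses import dataclass, field

# Status values come from the guide above; all other field names in this
# sketch are assumptions, not the real Quotient schema.
VALID_STATUSES = {"not-started", "running", "completed", "failed"}

@dataclass
class RunResult:
    input: str    # assumed: the dataset example sent to the model
    output: str   # assumed: the model's completion
    score: float  # assumed: the metric value for this example

@dataclass
class Run:
    id: str
    status: str = "not-started"
    results: list = field(default_factory=list)

run = Run(id="run_123")
run.results.append(RunResult(input="...", output="...", score=0.9))
```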
Creating a Run
To create a run, use the quotient.evaluate() method along with your prompt, dataset, model, and metrics:
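A minimal sketch of what the call might look like. The parameter names (prompt, dataset, model, metrics) are assumptions drawn from what this guide says a Run combines, and the stub client exists only to make the sketch self-contained; consult the SDK reference for the real signature.

```python
# Stub standing in for the real Quotient client, so this sketch is runnable.
# The evaluate() parameter names are assumptions, not the confirmed API.
class QuotientStub:
    def evaluate(self, prompt, dataset, model, metrics):
        # A real run starts asynchronously in the not-started/running states.
        return {"status": "running", "model": model, "num_examples": len(dataset)}

quotient = QuotientStub()

run = quotient.evaluate(
    prompt="Answer the question: {question}",            # assumed prompt template
    dataset=[{"question": "..."}, {"question": "..."}],  # assumed dataset shape
    model="gpt-4",                                       # assumed model identifier
    metrics=["accuracy"],                                # assumed metric name
)
```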
Parameters vary by model. See Models for provider-specific options.
Retrieving Runs
Get a specific run:
List all runs:
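The two retrieval calls might look like the following. The method names get_run and list_runs are hypothetical stand-ins, since this guide does not show the real ones, and the stub client only makes the sketch runnable.

```python
# Hypothetical sketch: get_run/list_runs are stand-in names, not confirmed
# Quotient SDK methods. The stub below replaces the real client.
class QuotientStub:
    def __init__(self):
        self._runs = {"run_123": {"id": "run_123", "status": "completed"}}

    def get_run(self, run_id):
        # Get a specific run by its id.
        return self._runs[run_id]

    def list_runs(self):
        # List all runs.
        return list(self._runs.values())

quotient = QuotientStub()
run = quotient.get_run("run_123")
all_runs = quotient.list_runs()
```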
Run Summary
Generate performance summaries using the summarize() method:
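This guide does not show summarize()'s output shape, so the following is only a sketch of the aggregate statistics it describes (average, standard deviation, best and worst examples), computed over assumed per-example scores.

```python
import statistics

# Assumed per-example metric scores; in practice these come from a Run's results.
scores = [0.82, 0.91, 0.74]

# Sketch of the aggregates the summary is described as containing.
summary = {
    "average": statistics.mean(scores),   # aggregate metric: average
    "std_dev": statistics.stdev(scores),  # aggregate metric: standard deviation
    "best": max(scores),                  # best performing example's score
    "worst": min(scores),                 # worst performing example's score
}
```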
The summary includes:
- Aggregate metrics (average, standard deviation)
- Best / worst performing examples
- Run metadata (model, parameters, timestamps)
CLI Usage
You can run evaluations via the Quotient CLI, so long as your file contains the phrase evaluate:
Runs execute asynchronously and may take time for large datasets. Monitor progress with the CLI or SDK.
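Since runs execute asynchronously, SDK-side monitoring could be a simple polling loop like this sketch. Here get_run is a hypothetical method name, and the stub client (whose run finishes after one poll) exists only to make the example runnable.

```python
import time

# Stub client standing in for the real SDK; its run completes after one poll.
class QuotientStub:
    def __init__(self):
        self._polls = 0

    def get_run(self, run_id):  # hypothetical method name
        self._polls += 1
        status = "completed" if self._polls > 1 else "running"
        return {"id": run_id, "status": status}

def wait_for_run(client, run_id, poll_seconds=0.01):
    # Poll until the run reaches a terminal status (completed or failed).
    while True:
        run = client.get_run(run_id)
        if run["status"] in ("completed", "failed"):
            return run
        time.sleep(poll_seconds)

run = wait_for_run(QuotientStub(), "run_123")
```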
See also: