Evaluations

Evaluations in Quotient track model performance through Run objects. Each Run combines a prompt, dataset, model, and metrics to produce quantitative results.

Runs & Results

A Run contains:

class Run:
    id: str                     # Unique identifier
    prompt: str                 # Reference to prompt ID
    dataset: str                # Reference to dataset ID
    model: str                  # Reference to model ID
    parameters: dict            # Model configuration
    metrics: List[str]          # Metrics to compute
    status: str                 # Current run status
    results: List[RunResult]    # Results per dataset row
    created_at: datetime        # Creation timestamp
    finished_at: datetime       # Completion timestamp

The status field is one of not-started, running, completed, or failed.

Each result within a Run is represented by:

class RunResult:
    id: str           # Result identifier
    input: str        # Input text
    output: str       # Model output
    values: dict      # Metric scores
    context: str      # Optional context
    expected: str     # Optional expected output
    created_at: datetime    # Creation timestamp
    created_by: str         # Creator of the result
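
Once a run has finished, each result can be inspected directly. A minimal sketch, assuming run is a completed Run object and that values holds one score per requested metric:

for result in run.results:
    # values maps each requested metric to its score for this row,
    # e.g. {"bertscore": 0.91, "exactmatch": 0.0}
    print(result.input)
    print(result.output)
    print(result.values)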

Creating a Run

To create a run, call the quotient.evaluate() method with a prompt, dataset, model, model parameters, and the metrics to compute:

run = quotient.evaluate(
    prompt=prompt,      # Prompt object
    dataset=dataset,    # Dataset object
    model=model,        # Model object
    parameters={
        "temperature": 0.7,
        "max_tokens": 100
    },
    metrics=['bertscore', 'exactmatch']
)

Parameters vary by model. See Models for provider-specific options.

Retrieving Runs

Get a specific run:

run = quotient.runs.get(run_id)

List all runs:

runs = quotient.runs.list()
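
A small sketch of filtering that list by status, assuming list() returns Run objects:

# Keep only runs that finished successfully.
completed_runs = [r for r in quotient.runs.list() if r.status == "completed"]

for r in completed_runs:
    print(r.id, r.model, r.finished_at)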

Run Summary

Generate performance summaries using the summarize() method:

summary = run.summarize(
    best_n=3,    # Top performing examples
    worst_n=3    # Lowest performing examples
)

The summary includes:

  • Aggregate metrics (average, standard deviation)
  • Best / worst performing examples
  • Run metadata (model, parameters, timestamps)
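
If you want the aggregates outside of summarize(), they can be recomputed from run.results. A minimal sketch using Python's statistics module; the metric key is whatever you passed when creating the run:

from statistics import mean, stdev

# Collect the per-row scores for one of the requested metrics.
scores = [r.values["bertscore"] for r in run.results if "bertscore" in r.values]

print("average:", mean(scores))
print("std dev:", stdev(scores) if len(scores) > 1 else 0.0)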

CLI Usage

You can run evaluations via the Quotient CLI, as long as your file contains the word evaluate:

quotient run simple_evaluate.py

Runs execute asynchronously and may take time for large datasets. Monitor progress with the CLI or SDK.
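
A minimal polling sketch, assuming you hold the run's id and want to block until it reaches a terminal status:

import time

# Poll until the run has either completed or failed.
while True:
    run = quotient.runs.get(run_id)
    if run.status in ("completed", "failed"):
        break
    time.sleep(10)  # re-check every 10 seconds

print("Final status:", run.status)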
