You can use Quotient’s SDK to automatically detect hallucinations and other reliability issues in your AI outputs.

Initialize the Logger with Detections

from quotientai import QuotientAI

quotient = QuotientAI(api_key="your-quotient-api-key")

logger = quotient.logger.init(
    app_name="my-first-app",
    environment="dev",
    sample_rate=1.0,
    # Run hallucination detection on 100% of logged model outputs, checked against the documents you provide
    hallucination_detection=True,
    hallucination_detection_sample_rate=1.0,
)

Send Logs with Detections Enabled

log_id = logger.log(
    user_query="What is the capital of France?",
    model_response="The path to greatness is through hard work and dedication.",
    documents=[
        "France is a country in Western Europe.",
        "Paris is the capital of France.",
    ],
)

Poll for Detections

Poll synchronously for detection results using the logger:

detection = logger.poll_for_detection(log_id=log_id)

Parameters:

  • log_id (string, required): The ID of the log to poll for detections.
  • timeout (int, default: 300): The maximum time to wait for a response, in seconds.
  • poll_interval (float, default: 2.0): The interval between checks, in seconds.

Returns:

  • detection (object): The detection results.

Detections Dashboard

Go to the Detections Dashboard to see your logs and any detected hallucinations.

Hallucinations

How do we define hallucinations?

The hallucination rate measures how often a model generates information that cannot be found in its provided inputs, such as retrieved documents, user messages, or system prompts.

Quotient reports an extrinsic hallucination rate: we determine whether the model’s output is externally unsupported by the context it was given.

How do we detect hallucinations?

  1. Break the output into individual claims or sentences
  2. Compare each claim to available context, including:
    • user_query (what the user asked)
    • documents (retrieved evidence)
    • message_history (prior turns in the conversation)
  3. Flag claims that lack support in any of the above inputs as hallucinations

If a sentence cannot be traced back to any context, it’s counted as a hallucination.
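The three steps above can be sketched as follows. This is a toy illustration, not Quotient's detector: the sentence splitter is a simple regex, and the support check is a word-overlap heuristic standing in for whatever judgment the real system applies. The function names and the 0.5 threshold are assumptions made for the example.

```python
import re
from typing import List, Sequence, Set


def split_into_claims(output: str) -> List[str]:
    """Step 1: naive split into sentences on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", output.strip())
    return [p for p in parts if p]


def _content_words(text: str) -> Set[str]:
    # Lowercased words longer than three characters, as a crude stopword filter.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def is_supported(claim: str, context: Sequence[str], threshold: float = 0.5) -> bool:
    """Step 2 (stand-in): a claim counts as supported if enough of its
    content words appear in at least one context item."""
    claim_words = _content_words(claim)
    if not claim_words:
        return True  # nothing substantive to check
    return any(
        len(claim_words & _content_words(item)) / len(claim_words) >= threshold
        for item in context
    )


def find_hallucinations(
    model_response: str,
    user_query: str,
    documents: Sequence[str],
    message_history: Sequence[str] = (),
) -> List[str]:
    """Step 3: return the claims that no context input supports."""
    context = [user_query, *documents, *message_history]
    return [
        claim
        for claim in split_into_claims(model_response)
        if not is_supported(claim, context)
    ]


docs = [
    "France is a country in Western Europe.",
    "Paris is the capital of France.",
]
query = "What is the capital of France?"

flagged = find_hallucinations(
    "The path to greatness is through hard work and dedication.", query, docs
)
grounded = find_hallucinations("Paris is the capital of France.", query, docs)
```

With the example log from earlier on this page, the off-topic response is flagged because none of its content words appear in the query or the documents, while a response that restates a document is not.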

Why is it important to monitor your AI system for hallucinations?

Extrinsic hallucinations are the primary failure mode in augmented AI systems. Even when retrieval succeeds, generation can drift. This metric helps teams:

  • Catch hallucinations early in development
  • Monitor output quality post-deployment
  • Guide prompt iteration and model fine-tuning

Well-grounded systems typically show a hallucination rate below 5%. If yours is higher, it often signals that your data ingestion, retrieval pipeline, or prompting needs improvement.

Document Relevance

How do we define document relevance?

Document Relevance measures how well your retrieval or search system finds context that’s actually useful for answering the user’s query. Specifically, it quantifies how relevant the retrieved documents (or chunks) are to what the user asked.

A document is considered relevant if it contains information that addresses at least one part of the query. If it does not address any part, it is marked as irrelevant.

The Document Relevance score is calculated as the fraction of documents that are relevant to at least one part of the user query.

How do we measure document relevance?

  1. Compare each document (or chunk) against the full user_query.
  2. Determine whether the document contains information relevant to any part of the query:
    • If it does, mark it as relevant
    • If it doesn’t, mark it as irrelevant
  3. Compute the overall document relevance score as: relevant_documents / total_documents.
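The scoring arithmetic in these steps can be sketched as below. Only the final ratio, relevant_documents / total_documents, comes from this page; the relevance check itself is a hypothetical keyword-overlap heuristic used as a placeholder, and returning 0.0 for an empty document list is an assumption.

```python
import re
from typing import List, Set


def _terms(text: str) -> Set[str]:
    # Lowercased words longer than three characters, as a crude stopword filter.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def is_relevant(document: str, user_query: str) -> bool:
    """Placeholder check: relevant if the document shares any term with the query."""
    return bool(_terms(document) & _terms(user_query))


def document_relevance(documents: List[str], user_query: str) -> float:
    """Fraction of documents marked relevant: relevant_documents / total_documents."""
    if not documents:
        return 0.0  # assumption: no documents means a score of 0
    relevant_documents = sum(is_relevant(doc, user_query) for doc in documents)
    return relevant_documents / len(documents)


score = document_relevance(
    [
        "France is a country in Western Europe.",
        "Paris is the capital of France.",
        "Bananas are rich in potassium.",
    ],
    "What is the capital of France?",
)
```

Here two of the three documents mention a term from the query, so the score is two thirds; the unrelated third document lowers it.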

Why is it important to monitor document relevance?

Document Relevance is a core metric for evaluating search- and retrieval-augmented systems. Even if the AI model generates well, weak retrieval can negatively impact the quality of the response. This metric helps teams:

  • Assess whether retrieval is surfacing useful context
  • Debug cases where generation fails despite successful prompting
  • Improve recall/precision of retrieved results
  • Monitor drift after retriever or data changes

High-performing systems typically show document relevance above 75%. Lower scores may signal ambiguous user queries, incorrect retrieval, or noisy source data.