Reinforcement Fine-Tuning (RFT) lets you train custom models tailored to your application using data from your production traces. By applying reinforcement learning techniques, RFT produces models that better understand your use case and perform better on your specific tasks.

Why Use RFT?

RFT is designed for teams who want to:
  • Improve accuracy: Train models that perform better on your specific domain and use cases
  • Reduce costs: Use smaller, fine-tuned models that outperform larger general-purpose models
  • Customize behavior: Create models that follow your application’s patterns and requirements
  • Leverage your data: Turn your production traces into training data automatically
RFT Overview

Prerequisites

Before starting an RFT training job, ensure you have:
  1. An active Quotient account
  2. Trace data from your application using the Quotient Python SDK
  3. Sufficient traces in your chosen app and environment (minimum of 50 traces required)
RFT uses all traces sent to Quotient for the selected app and environment. Our system automatically filters and augments your traces to create high-quality training data that best represents your desired use case.
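
If you have not instrumented your application yet, tracing typically looks something like the sketch below. The specific names (QuotientAI, tracer.init, @quotient.trace) are assumptions here; consult the Quotient Python SDK documentation for the current API.

quotient_tracing.py
# A minimal sketch; exact SDK names are assumptions, check the
# Quotient Python SDK docs for the current API.
from quotientai import QuotientAI

quotient = QuotientAI()  # assumes the API key is read from the environment

# Initialize tracing for the same app and environment you will later
# select when creating the RFT training job
quotient.tracer.init(
    app_name="my-app",
    environment="production",
)

@quotient.trace()  # hypothetical decorator; wraps calls so they are sent as traces
def answer_question(query: str) -> str:
    # your LLM call goes here
    ...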

Creating a Training Job

1. Navigate to RFT

Go to the Steering section in the sidebar and click on RFT to access the training interface.
2. Select Your App

Choose the application you want to train a model for from the App dropdown. This determines which trace data will be used for training.
3. Select Environment

Select the environment (e.g., production, staging, dev) from which to extract training data. We recommend using production data for the best results.
4. Choose a Base Model

Select the foundation model to fine-tune. Available options include:
  • Qwen 3 14B - A powerful open-source model ideal for complex reasoning tasks
  • OpenAI o4-mini - A fast, efficient model from OpenAI
Choose a base model that aligns with your latency and accuracy requirements. Smaller models are faster but may sacrifice some capability.
5. Start Training

Click Start Training to begin the fine-tuning process. You’ll be redirected to the training run view where you can monitor progress.

Training Workflow

Once you start a training job, it progresses through several stages:

1. Data Extraction

The system extracts and prepares training data from your traces. During this phase:
  • All traces for the selected app and environment are collected
  • Our system filters and augments your traces to create high-quality training examples
  • A baseline performance metric is calculated
Status indicators:
  • Pending Extraction - Preparing to extract data
  • Extracting Data - Actively processing your traces

2. Training

After data extraction, the actual model training begins:
  • The base model is fine-tuned using reinforcement learning techniques
  • The model learns from your traced interactions to better match your use case
  • Progress is tracked and displayed as a percentage
Status indicator: Training in Progress

3. Completion

When training finishes:
  • Final performance metrics are calculated
  • Your fine-tuned model is deployed and ready to use
  • An inference endpoint is generated for API access
Status indicator: Training Complete

Monitoring Your Training

Progress Tracking

The training interface provides real-time updates on your job:
  • Progress bar: Shows completion percentage (0-100%)
  • Status badge: Displays current training stage
  • Auto-refresh: Updates automatically every few seconds while training

Training Metrics

For supported model types, you can view detailed training metrics:
  • Training reward: Shows how the model’s performance improves over training steps
  • Step progress: Displays the current training step out of total steps
  • Performance graph: Visualizes reward progression throughout training
The training graph shows how your model improves over time. A healthy training run typically shows:
  • Upward trend: Reward increases as training progresses
  • Stabilization: Metrics level off as the model converges
  • Consistent improvement: Steady gains without major fluctuations
If you see erratic behavior or decreasing rewards, your training data may need improvement.
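
The Quotient UI displays these metrics directly; if you export per-step rewards yourself (the export path is hypothetical), a quick trend check might look like this sketch:

reward_check.py
# A rough heuristic sketch; assumes you have per-step rewards as a list
def summarize_rewards(rewards: list[float], window: int = 10) -> None:
    """Compare early vs. late smoothed reward to spot an upward trend."""
    if len(rewards) < 2 * window:
        print("Not enough steps to judge a trend yet.")
        return
    early = sum(rewards[:window]) / window   # smoothed start-of-run reward
    late = sum(rewards[-window:]) / window   # smoothed end-of-run reward
    print(f"early avg: {early:.3f}, late avg: {late:.3f}")
    if late > early:
        print("Upward trend: reward improved over training.")
    else:
        print("Flat or decreasing reward: training data may need improvement.")

summarize_rewards([0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 0.62, 0.61,
                   0.63, 0.64, 0.66, 0.65, 0.67, 0.68, 0.68, 0.69, 0.70, 0.70])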

Using Your Fine-Tuned Model

Once training is complete, you can use your custom model via the inference endpoint provided.

Get Your Model Endpoint

After training completes, your model’s inference endpoint is displayed on the training run page. Copy this endpoint to use in your application.

Make Inference Calls

Your fine-tuned model is compatible with the OpenAI API format. You can use either the standard OpenAI client or the OpenAI Agents SDK.

OpenAI Client

Use the OpenAI Python client with your custom model endpoint:
openai_client.py
from openai import OpenAI

# Point the client at your fine-tuned model's OpenAI-compatible endpoint
client = OpenAI(
    base_url="YOUR_MODEL_ENDPOINT/v1",
    api_key="YOUR_API_KEY",
)

# Call the fine-tuned model exactly as you would any OpenAI chat model
response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

OpenAI Agents SDK

You can also use the OpenAI Agents SDK for more advanced agent workflows:
openai_agents.py
from openai import AsyncOpenAI
from agents import Agent, OpenAIChatCompletionsModel, Runner

# Route the Agents SDK to your fine-tuned model's endpoint via an
# OpenAI-compatible client
client = AsyncOpenAI(
    base_url="YOUR_MODEL_ENDPOINT/v1",
    api_key="YOUR_API_KEY",
)

agent = Agent(
    name="Assistant",
    model=OpenAIChatCompletionsModel(
        model="your-model-name",
        openai_client=client,
    ),
)

result = Runner.run_sync(agent, "Your task here")
print(result.final_output)
Replace YOUR_MODEL_ENDPOINT with the inference endpoint shown on your completed training run page, and YOUR_API_KEY with your Quotient API key.

Performance Comparison

After training completes, the interface displays:
  • Before Training: Baseline performance on your test data
  • After Training: Performance of your fine-tuned model
  • Improvement: The percentage improvement achieved through fine-tuning
Use these metrics to validate that your fine-tuned model meets your requirements before deploying to production.
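
As a worked example of the arithmetic, the sketch below computes a relative improvement figure; whether Quotient reports relative or absolute improvement is an assumption here.

improvement.py
# Example baseline and post-training scores (placeholder values)
before, after = 0.62, 0.78

# Relative improvement over the baseline
relative = (after - before) / before * 100
print(f"Relative improvement: {relative:.1f}%")  # ~25.8%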

Training Run History

All your training runs are saved and accessible from the main RFT page. The training run table shows:
  • App Name: The application the model was trained for
  • Environment: The environment used for training data
  • Base Model: The foundation model that was fine-tuned
  • Status: Current status of the training job
  • Started At: When the training job was created
  • Completed: When training finished (if applicable)
  • Improvement: Performance improvement percentage
Click View on any run to see detailed information and metrics.

Best Practices

More Data, Better Results

Send all your traces to Quotient. Our system automatically filters and augments your data to create high-quality training examples.

Use Production Data

Train on production environment data when possible, as it best represents real-world usage patterns.

Monitor Progress

Keep an eye on training metrics to ensure your model is learning effectively.

Test Before Deploy

Compare before/after performance metrics and test your model thoroughly before production use.
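
As a starting point, a minimal pre-deploy smoke test might replay a few representative prompts against the fine-tuned endpoint using the OpenAI-compatible API shown earlier; the prompts and names below are placeholders.

smoke_test.py
from openai import OpenAI

client = OpenAI(base_url="YOUR_MODEL_ENDPOINT/v1", api_key="YOUR_API_KEY")

# Use prompts that resemble your traced production interactions
test_prompts = [
    "Summarize this support ticket: ...",
    "Classify the following request: ...",
]

for prompt in test_prompts:
    response = client.chat.completions.create(
        model="your-model-name",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {prompt[:40]}\n{response.choices[0].message.content}\n")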

Troubleshooting

Training Fails During Data Extraction

This usually indicates insufficient trace data. Ensure you have:
  • At least 50 traces for the selected app and environment
  • Traces that include user queries and model outputs
Try collecting more traces and starting a new training run.

Training Fails After Data Extraction

If training fails after data extraction:
  • Check whether the error message provides specific guidance
  • Ensure your data doesn’t contain problematic content
  • Try again with a different base model
Contact support if the issue persists.

Model Shows Minimal Improvement

If your model shows minimal improvement:
  • Ensure your traces contain diverse examples of your desired use case
  • Consider collecting more traces before retraining
  • Try a different base model that may be better suited to your use case

Model Behaves Unexpectedly

If your fine-tuned model doesn’t match expectations:
  • Verify you’re using the correct inference endpoint
  • Review your training data for patterns that might cause unexpected behavior
  • Test with prompts similar to your traced interactions
  • Consider additional training with more targeted examples

Getting Help

If you encounter issues or have questions about RFT, contact Quotient support.