# Changelog

All notable changes to this project will be documented in this file.

## Weekly Changelog (2024-12-20)
### 🚀 New Features

- 🧑‍⚖️ **AutoJudge for Custom LLM Evaluation**: We've introduced `AutoJudge` to the `judges` library! Given a labeled dataset with feedback and a natural-language description of an evaluation task, `AutoJudge` creates custom, task-specific LLM judges. Check out the README!
- **Expanded Model Support in `judges`**: Added support for open-source models via LiteLLM, enabling evaluation with LLMs on third-party inference providers. View configuration options in the LiteLLM integration guide.
- **Onboarding Flow**: A streamlined onboarding experience is now live, making it easier to get started and navigate the platform. Follow the suggested steps to generate, improve, and version your prompts with Quotient IQ.
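The `AutoJudge` workflow above can be sketched as follows. This is a minimal sketch, assuming the import path, `from_dataset` constructor, and `judge` method suggested by the `judges` README; those names, and the shape of the labeled dataset, are assumptions — check the README and the LiteLLM integration guide for the exact API and model-string format.

```python
# Hypothetical sketch of building a task-specific judge with AutoJudge.
# The import path and method names are assumptions based on this changelog
# and the judges README, not a verified API reference.

# A natural-language description of the evaluation task.
task = "Decide whether the answer is grounded in the provided context."

# A labeled dataset with feedback, as the AutoJudge workflow expects
# (field names here are illustrative).
dataset = [
    {"input": "What is the capital of France?", "output": "Paris",
     "label": 1, "feedback": "Correct and grounded."},
    {"input": "What is the capital of France?", "output": "Lyon",
     "label": 0, "feedback": "Wrong city."},
]

try:
    from judges.classifiers.auto import AutoJudge  # assumed import path

    # Derive a custom judge from the labeled data and task description.
    autojudge = AutoJudge.from_dataset(dataset=dataset, task=task)
    verdict = autojudge.judge(input=dataset[0]["input"],
                              output=dataset[0]["output"])
    print(verdict)
except ImportError:
    print("Install the library first: pip install judges")
```

With the new LiteLLM support, the same judge should be able to target open-source models on third-party providers by passing a LiteLLM-style model string where the judge's model is configured; see the integration guide for the supported providers.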
### ✨ Improvements
- Increased Input Size Limits: The input character limit for prompts has been increased from 8,000 to 40,000 characters (approximately 30,000 tokens) in PromptLab.
- **Error Messaging for Exceeded Context Window**: We added detailed error messages when prompt input exceeds the supported context window of a selected model, so you know how many tokens are available for a given model.
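As a rough guide to the new limit: 40,000 characters maps to roughly 30,000 tokens, i.e. about 0.75 tokens per character. The sketch below is a client-side pre-check using that heuristic — the 40,000-character constant comes from this changelog, but the token estimate is an approximation, not the model's real tokenizer:

```python
# Heuristic pre-check against PromptLab's raised input limit.
PROMPT_CHAR_LIMIT = 40_000  # raised from 8,000 per this changelog

def check_prompt_size(prompt: str) -> dict:
    """Return the prompt's size and whether it fits under the limit.

    Tokens are estimated with the rough 0.75-tokens-per-character ratio
    implied by the changelog (40,000 chars ~= 30,000 tokens); the actual
    count depends on the selected model's tokenizer.
    """
    chars = len(prompt)
    return {
        "chars": chars,
        "estimated_tokens": int(chars * 0.75),
        "within_limit": chars <= PROMPT_CHAR_LIMIT,
    }

print(check_prompt_size("x" * 10_000))
# A 10,000-character prompt fits comfortably; a 50,000-character one would not.
```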
### 🐛 Bug Fixes
- Prompt Evaluation Persistence: Fixed a bug where prompts were not saved correctly in PromptLab, resulting in missing versions.
- **Python SDK Updates**: Fixed a bug in `datasets.list()` that prevented rows from being displayed; you can now use the `include_rows=True` parameter to view all dataset rows.
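The fixed call can be sketched as below. Only the `datasets.list(include_rows=True)` signature comes from this changelog; the client import name, constructor, and the `name`/`rows` attributes are assumptions — consult the SDK docs for the real entry point.

```python
# Hypothetical usage of the fixed SDK call. The client name and the
# dataset attributes below are assumptions, not verified SDK docs;
# only the include_rows=True parameter is stated in this changelog.
try:
    from quotientai import QuotientAI  # assumed client entry point

    quotient = QuotientAI()  # assumed to read the API key from the environment
    datasets = quotient.datasets.list(include_rows=True)  # now returns rows
    for ds in datasets:
        print(ds.name, len(ds.rows))  # attribute names are illustrative
    message = f"{len(datasets)} datasets fetched"
except ImportError:
    message = "Install the SDK first: pip install -U quotient-python"

print(message)
```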
### 📣 Notes
- Ensure your SDK is updated to the latest version (`pip install -U quotient-python`) to leverage the new features and fixes.
- Working with LLM-as-a-Judge for domain-specific tasks? We'd love to collaborate on research & prompts to include in `judges`!