We host the application evals so that you can run it on our online workstations, either through Wine or directly.
Quick description of evals:
The openai/evals repository is a framework and registry for evaluating large language models and systems built with LLMs. It is designed to let you define “evals” (evaluation tasks) in a structured way and run them against different models or agents, then score, compare, and analyze the results. The framework supports templated YAML eval definitions, solver-based evaluations, custom metrics, and composition of multi-step evaluations. It includes utilities and APIs to plug in completion functions, manage prompts, wrap retry and error handling, and register new evaluation types. It also maintains a growing registry of standard benchmarks, or “evals”, that users can reuse (for example, tasks measuring reasoning, factual accuracy, or chain-of-thought capabilities). The design is modular, so you can extend or compose new evals, integrate your own model APIs, and capture rich metadata about each run (prompt, responses, metrics).
Features:
- Definition of evals via YAML plus helper classes and templated scaffolding (see the registry sketch after this list)
- Support for “solver” style evaluations (models that solve tasks) as a subclass of evals
- APIs and wrappers to plug in completion functions, prompt templates, and retry logic (see the completion-function sketch after this list)
- A registry of benchmark tasks and reuse of shared evals across models
- Modular, extensible architecture for adding new evaluation types
- Metadata tracking (prompts, metrics, responses, diagnostics) for analysis
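To illustrate the YAML-based definitions, a minimal registry entry might look like the sketch below. The names `my-match-eval` and `my_match_eval/samples.jsonl` are hypothetical; `Match` is one of the basic exact-match eval templates shipped in the repository.

```yaml
# Hypothetical registry entry (e.g. evals/registry/evals/my-match-eval.yaml).
# The top-level key names the eval; the id points at a concrete variant.
my-match-eval:
  id: my-match-eval.dev.v0
  description: Checks that the sampled text exactly matches the expected answer.
  metrics: [accuracy]

my-match-eval.dev.v0:
  # Reuses the built-in exact-match eval template.
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: my_match_eval/samples.jsonl  # hypothetical dataset path
```

Each line of the samples file is a JSON object with an `input` (chat-style messages) and an `ideal` answer; once registered, the eval can be run with the repository's CLI, e.g. `oaieval gpt-3.5-turbo my-match-eval`.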
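To show how completion functions plug in, here is a minimal sketch of a custom completion function, modeled on the pattern in the repository's documentation. The `EchoCompletionFn` name and its echo behavior are hypothetical stand-ins for a call to your own model API.

```python
from typing import Any, Union

from evals.api import CompletionFn, CompletionResult


class EchoCompletionResult(CompletionResult):
    """Wraps a raw model response so the framework can read completions uniformly."""

    def __init__(self, response: str) -> None:
        self.response = response

    def get_completions(self) -> list[str]:
        return [self.response.strip()]


class EchoCompletionFn(CompletionFn):
    """Hypothetical completion function that echoes the last user message.

    A real implementation would call your model API here and return
    its output wrapped in a CompletionResult.
    """

    def __call__(self, prompt: Union[str, list], **kwargs: Any) -> EchoCompletionResult:
        if isinstance(prompt, list):  # chat-style prompt: list of message dicts
            text = prompt[-1].get("content", "")
        else:
            text = str(prompt)
        return EchoCompletionResult(text)
```

Custom completion functions like this are registered through the registry's completion-fn YAML files (pointing `class:` at the module) and can then be passed to `oaieval` in place of a model name.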
Programming Language: Python.