Configuration reference
Key components for configuring experiments
Overview
This page is a complete reference for the configuration options used to set up your experiments.
- Supported Parameters: Customize every detail of your experiment to align with your goals, from parameter tweaks to complex configs.
- Evaluation Metrics: Accurately measure your model's performance with a variety of pre-defined metrics available in Velvet experiments.
- Code Examples: Discover ready-to-use configuration code that can be easily copied into your own experiments.
Potential Use Cases
- Evaluating Factual Accuracy: Assess the accuracy and truthfulness of model outputs to ensure they provide reliable information.
- Evaluating JSON Outputs: Assess the structure of JSON outputs generated by models to ensure they meet the required format and standards.
- Evaluating RAG Pipelines: Test retrieval-augmented generation (RAG) pipelines for effectiveness in integrating external knowledge into model outputs.
- Evaluating OpenAI Assistants: Test the performance and effectiveness of OpenAI's assistant models in various tasks and applications.
- Preventing Hallucinations: Test strategies to reduce or eliminate hallucinations in model outputs to improve reliability.
- Ensuring Safety in LLM Applications: Conduct sandboxed evaluations for large language model applications to identify potential vulnerabilities.
- Benchmarking Language Models: Benchmark various language models on dimensions such as latency and cost to determine their strengths, weaknesses, and optimal use cases.
- Comparing Model Configurations: Determine the most suitable model for specific applications and optimize model output quality by selecting the appropriate temperature settings.
Configuration
A configuration represents an experiment that is run.
Property | Type | Required | Description
---|---|---|---
name | string | Yes | Name of your experiment
description | string | No | Description of your experiment
providers | string[] \| Provider[] | Yes | One or more LLMs to use
prompts | string[] | Yes | One or more prompts to load
tests | string \| Evaluation[] | Yes | List of LLM inputs and evaluation metrics, OR a path to a Google Sheet share link
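The table maps to a plain object. As an illustrative sketch, a configuration that fills in every required property might look like the following; the model names, prompt path, test variable, and assertion are placeholders, not required values.

```ts
// Minimal experiment configuration sketch. The models, prompt path,
// test variable, and assertion below are illustrative placeholders.
const config = {
  name: 'factual-accuracy-check',
  description: 'Compare two OpenAI models on a simple Q&A prompt',
  providers: ['openai:gpt-4o-mini', 'openai:gpt-4o'],
  prompts: ['prompts/qa.txt'],
  tests: [
    {
      vars: { question: 'In what year did Apollo 11 land on the moon?' },
      assert: [{ type: 'contains', value: '1969' }],
    },
  ],
};
```

Each entry in `tests` is an Evaluation, as described further down this page.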
Provider
Provider is an object that includes the `id` of the provider and an optional `config` object that can be used to pass provider-specific configurations.
interface Provider {
id?: ProviderId; // e.g. "openai:gpt-4o-mini"
config?: ProviderConfig;
}
Velvet supports the following models:
- `openai:<model name>` - uses a specific model name (mapped automatically to the chat or completion endpoint)
- `openai:embeddings:<model name>` - uses any model name against the `/v1/embeddings` endpoint
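For example, both forms are plain provider id strings; the model names below are examples only.

```ts
// One chat/completion model and one embeddings model, referenced by id string.
const providers = ['openai:gpt-4o-mini', 'openai:embeddings:text-embedding-3-small'];
```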
Here are the optional `config` parameters:
interface ProviderConfig {
// Completion parameters
temperature?: number;
max_tokens?: number;
top_p?: number;
frequency_penalty?: number;
presence_penalty?: number;
best_of?: number;
functions?: OpenAiFunction[];
function_call?: 'none' | 'auto' | { name: string };
tools?: OpenAiTool[];
tool_choice?: 'none' | 'auto' | 'required' | { type: 'function'; function?: { name: string } };
response_format?: { type: 'json_object' | 'json_schema'; json_schema?: object };
stop?: string[];
seed?: number;
passthrough?: object;
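
  // Optional callbacks keyed by function name, invoked with the
  // model's function-call arguments (a string)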
functionToolCallbacks?: Record<
OpenAI.FunctionDefinition['name'],
(arg: string) => Promise<string>
>;
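
  // API credentials and endpoint settings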
apiKey?: string;
apiKeyEnvar?: string;
apiHost?: string;
apiBaseUrl?: string;
organization?: string;
headers?: { [key: string]: string };
}
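As a sketch, a provider entry that pins sampling parameters, requests JSON output, and reads its API key from a custom environment variable might look like the following; the model name, parameter values, and environment variable name are placeholders.

```ts
// Example Provider with an inline config. The model, parameter values,
// and environment variable name are illustrative.
const provider = {
  id: 'openai:gpt-4o-mini',
  config: {
    temperature: 0.2,
    max_tokens: 512,
    response_format: { type: 'json_object' },
    // Read the API key from a custom environment variable rather than
    // hard-coding it in the config.
    apiKeyEnvar: 'MY_OPENAI_API_KEY',
  },
};
```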
Evaluation
An evaluation represents a single set of inputs and evaluation metrics that is fed into all prompts and providers.
Property | Type | Required | Description
---|---|---|---
vars | Record<string, string \| string[] \| any> \| string | No | Key-value pairs to substitute in the prompt. If vars is a plain string, it can be used to load vars from a SQL query to your Velvet DB
assert | Assertion[] | No | List of evaluation checks to run on the LLM output
threshold | number | No | Test will fail if the combined score of assertions is less than this number
options.transform | string | No | A JavaScript snippet that runs on the LLM output before any assertions
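For instance, a single evaluation might bind prompt variables, post-process the output before checks run, and require a minimum combined score. This sketch assumes the transform snippet sees the raw output as `output`; the variable names and values are placeholders.

```ts
// Example Evaluation: substitutes vars into the prompt, trims the raw
// output before assertions run (assuming the snippet receives it as
// `output`), and fails if the combined assertion score is below 0.8.
const evaluation = {
  vars: { topic: 'solar power', audience: 'beginners' },
  options: {
    transform: 'output.trim()',
  },
  assert: [{ type: 'contains', value: 'solar' }],
  threshold: 0.8,
};
```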
Assertion
An assertion is an evaluation that compares the LLM output against expected values or conditions. Different types of assertions can be used to validate the output in various ways, such as checking for equality, similarity, or custom functions.
Property | Type | Required | Description
---|---|---|---
type | string | Yes | Type of assertion
value | string | No | The expected value, if applicable
threshold | number | No | The threshold value, applicable only to certain types such as `similar`, `cost`, `javascript`
metric | string | No | The label for this result. Assertions with the same metric will be aggregated together
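As an illustration, assertions that share a `metric` label are aggregated under that metric. The assertion types and values below are examples only; see the references linked next for the supported types.

```ts
// Two example assertions that both roll up under the 'accuracy' metric.
const assertions = [
  { type: 'contains', value: '1969', metric: 'accuracy' },
  {
    type: 'similar',
    value: 'Apollo 11 landed on the moon in 1969.',
    threshold: 0.85, // threshold applies to types such as similar
    metric: 'accuracy',
  },
];
```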
See examples of Deterministic evaluation assertions and LLM-based evaluation assertions.