Configuration reference

Key components for configuring experiments

Overview

This page is a complete reference for the configuration options used to set up your experiments.

  • Supported Parameters: Customize every detail of your experiment to align with your goals, from parameter tweaks to complex configs.
  • Evaluation Metrics: Accurately measure your model's performance with a variety of pre-defined metrics available in Velvet experiments.
  • Code Examples: Discover ready-to-use configuration code that can be easily copied into your own experiments.

Potential Use Cases

  • Evaluating Factual Accuracy: Assess the accuracy and truthfulness of model outputs to ensure they provide reliable information.
  • Evaluating JSON Outputs: Assess the structure of JSON outputs generated by models to ensure they meet the required format and standards.
  • Evaluating RAG Pipelines: Test retrieval-augmented generation (RAG) pipelines for effectiveness in integrating external knowledge into model outputs.
  • Evaluating OpenAI Assistants: Test the performance and effectiveness of OpenAI's assistant models in various tasks and applications.
  • Preventing Hallucinations: Test strategies to reduce or eliminate hallucinations in model outputs to improve reliability.
  • Ensuring Safety in LLM Applications: Conduct sandboxed evaluations for large language model applications to identify potential vulnerabilities.
  • Benchmarking Language Models: Conduct performance benchmarks for various language models, including latency and cost, to determine their strengths, weaknesses, and optimal use cases.
  • Comparing Model Configurations: Determine the most suitable model for specific applications and optimize model output quality by selecting the appropriate temperature settings.

Configuration

A configuration represents a single experiment run.

Property     Type                    Required  Description
name         string                  Yes       Name of your experiment
description  string                  No        Description of your experiment
providers    string[] | Provider[]   Yes       One or more LLMs to use
prompts      string[]                Yes       One or more prompts to load
tests        string | Evaluation[]   Yes       List of LLM inputs and evaluation metrics, or the path to a Google Sheet share link
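For example, a complete configuration matching the table above might look like the following sketch. The prompt text, the {{question}} substitution syntax, and the assertion values are illustrative only, not part of the reference.

const config = {
  name: 'factual-accuracy-check',
  description: 'Compare models on a short factual QA prompt',
  providers: ['openai:gpt-4o-mini'],            // string[] or Provider[]
  prompts: ['Answer concisely: {{question}}'],  // placeholder syntax shown for illustration
  tests: [
    {
      vars: { question: 'In what year did the first crewed moon landing take place?' },
      assert: [
        { type: 'similar', value: 'The first crewed moon landing took place in 1969.', threshold: 0.8 },
      ],
    },
  ],
};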

Provider

Provider is an object that includes the id of the provider and an optional config object that can be used to pass provider-specific configurations.

interface Provider {
  id?: ProviderId; // e.g. "openai:gpt-4o-mini"
  config?: ProviderConfig; 
}

Velvet supports the following models:

  • openai:<model name> - uses a specific model name (mapped automatically to chat or completion endpoint)
  • openai:embeddings:<model name> - uses any model name against the /v1/embeddings endpoint
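For example, both forms can appear side by side in a providers array (the specific model names below are placeholders):

const providers = [
  'openai:gpt-4o-mini',                        // routed to the chat or completion endpoint
  'openai:embeddings:text-embedding-3-small',  // routed to the /v1/embeddings endpoint
];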

Here are the optional config parameters:

interface ProviderConfig {
  // Completion parameters
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  best_of?: number;
  functions?: OpenAiFunction[];
  function_call?: 'none' | 'auto' | { name: string };
  tools?: OpenAiTool[];
  tool_choice?: 'none' | 'auto' | 'required' | { type: 'function'; function?: { name: string } };
  response_format?: { type: 'json_object' | 'json_schema'; json_schema?: object };
  stop?: string[];
  seed?: number;
  passthrough?: object;
  functionToolCallbacks?: Record<
    OpenAI.FunctionDefinition['name'],
    (arg: string) => Promise<string>
  >;
  apiKey?: string;
  apiKeyEnvar?: string;
  apiHost?: string;
  apiBaseUrl?: string;
  organization?: string;
  headers?: { [key: string]: string };
}
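As a sketch, a Provider entry that combines an id with a few of these parameters might look like this (all values are illustrative):

const provider: Provider = {
  id: 'openai:gpt-4o-mini',
  config: {
    temperature: 0.2,                          // lower temperature for more deterministic output
    max_tokens: 512,
    response_format: { type: 'json_object' },  // request JSON output from the model
    apiKeyEnvar: 'OPENAI_API_KEY',             // read the API key from this environment variable
  },
};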

Evaluation

An evaluation represents a single set of inputs and evaluation metrics that is fed into all prompts and providers.

Property           Type                                               Required  Description
vars               Record<string, string | string[] | any> | string   No        Key-value pairs to substitute in the prompt. If vars is a plain string, it can be used to load vars from a SQL query to your Velvet DB.
assert             Assertion[]                                        No        List of evaluation checks to run on the LLM output
threshold          number                                             No        The test will fail if the combined score of the assertions is less than this number
options.transform  string                                             No        A JavaScript snippet that runs on the LLM output before any assertions
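Putting these fields together, a single evaluation entry might look like the following sketch, typed against the Evaluation shape above. The variables, assertion value, and transform snippet are illustrative, and the exact variables available inside a transform snippet are not specified here.

const test: Evaluation = {
  vars: { question: 'What is the boiling point of water at sea level?' },
  assert: [
    { type: 'similar', value: 'Water boils at 100 degrees Celsius at sea level.', threshold: 0.75 },
  ],
  threshold: 0.5,                           // fail if the combined assertion score falls below 0.5
  options: { transform: 'output.trim()' },  // illustrative snippet applied to the LLM output before assertions
};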

Assertion

An assertion is an evaluation that compares the LLM output against expected values or conditions. Different types of assertions can be used to validate the output in various ways, such as checking for equality, similarity, or custom functions.

Property   Type    Required  Description
type       string  Yes       Type of assertion
value      string  No        The expected value, if applicable
threshold  number  No        The threshold value; applies only to certain types such as similar, cost, and javascript
metric     string  No        The label for this result. Assertions with the same metric are aggregated together
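For example, two assertions that share a metric label are aggregated into a single reported score. The values and the javascript expression below are illustrative, and the array is typed against the Assertion shape described in the table.

const assertions: Assertion[] = [
  { type: 'similar', value: 'The capital of France is Paris.', threshold: 0.8, metric: 'accuracy' },
  { type: 'javascript', value: "output.includes('Paris')", metric: 'accuracy' },  // both roll up into the "accuracy" metric
];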

See examples of Deterministic evaluation assertions and LLM-based evaluation assertions.