Evaluations

Run experiments and continuous testing on logs

LLMs are inherently unpredictable, which can make feature development challenging. With Velvet Evaluations, you can feel confident that your LLM-powered features work the way you expect them to.

Test your inputs against models, settings, and metrics.


Use cases

Experiments can be set up via the UI or the API. Read the configuration docs for further instructions.

  • Run experiments on historical subsets of logs
  • Run continuous testing on future samples of logs
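As a rough illustration of the kind of configuration an experiment involves, the sketch below builds a JSON payload for an experiment over a historical subset of logs. The field names, values, and schema here are illustrative assumptions, not Velvet's documented API; consult the configuration docs for the actual format.

```python
import json

# Hypothetical experiment configuration. Every field name below is an
# illustrative assumption, not Velvet's documented schema.
experiment = {
    "name": "model-comparison-on-logs",
    "dataset": {
        "source": "logs",  # evaluate against previously logged requests
        "filter": {"since": "2024-01-01", "sample_rate": 0.1},
    },
    "models": ["gpt-4o", "gpt-3.5-turbo"],     # models to test inputs against
    "settings": {"temperature": 0.2, "max_tokens": 512},
    "metrics": ["exact_match", "latency_ms"],  # metrics to score each run
}

# Serialize for submission (e.g., as the body of an API request).
payload = json.dumps(experiment, indent=2)
print(payload)
```

A continuous-testing setup would look similar, but with the dataset filter pointed at future samples of logs rather than a historical window.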

Set up an evaluation
