Caching

The examples below show how to query cached logs. Run them in Velvet's AI SQL editor or in any other SQL tool you prefer.

Cached logs

Caching unlocks additional metadata stored with each log. If a request sets the velvet-cache-enabled header, the gateway responds with a velvet-cache-status header.

The velvet-cache-status value is one of HIT, MISS, or NONE/UNKNOWN.
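A client can branch on this header after each request. The sketch below only reads the response. It assumes the headers are available as a plain mapping, which depends on your HTTP client, and it does not show the value to send in velvet-cache-enabled. The function name `read_cache_status` is hypothetical. Any value other than HIT or MISS, including a missing header, is reported as NONE/UNKNOWN.

```python
from collections.abc import Mapping

# The two definite values of the velvet-cache-status response header.
KNOWN_STATUSES = {"HIT", "MISS"}


def read_cache_status(headers: Mapping[str, str]) -> str:
    """Return HIT, MISS, or NONE/UNKNOWN from a gateway response's headers.

    Header names are compared case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("velvet-cache-status", "").strip().upper()
    # Anything else (NONE, UNKNOWN, or no header at all) collapses to NONE/UNKNOWN.
    return status if status in KNOWN_STATUSES else "NONE/UNKNOWN"


print(read_cache_status({"Velvet-Cache-Status": "HIT"}))  # HIT
print(read_cache_status({"content-type": "application/json"}))  # NONE/UNKNOWN
```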

Example SQL queries

Compare total costs for requests with caching enabled and disabled.

SELECT 
  COALESCE((metadata->'cache'->>'enabled')::boolean, false) AS cache_enabled,  -- logs without cache metadata count as not cached
  SUM((metadata->'cost'->>'input_cost')::numeric) AS total_input_cost,
  SUM((metadata->'cost'->>'output_cost')::numeric) AS total_output_cost,
  SUM((metadata->'cost'->>'total_cost')::numeric) AS total_cost
FROM llm_logs
GROUP BY cache_enabled
ORDER BY cache_enabled DESC;
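If you work with exported logs rather than SQL, the same grouping can be sketched in Python. This assumes each log's metadata has already been parsed into a dict shaped like the example under Log metadata. `cost_by_cache_enabled` is a hypothetical helper, and it counts a log with no cache.enabled field as not cached.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Iterable

COST_FIELDS = ("input_cost", "output_cost", "total_cost")


def cost_by_cache_enabled(logs: Iterable[dict[str, Any]]) -> dict[bool, dict[str, Decimal]]:
    """Sum input, output, and total cost per value of cache.enabled."""
    totals: dict[bool, dict[str, Decimal]] = defaultdict(
        lambda: {field: Decimal(0) for field in COST_FIELDS}
    )
    for metadata in logs:
        # Assumption: a log with no cache block (or no enabled flag) was not cached.
        enabled = bool((metadata.get("cache") or {}).get("enabled", False))
        cost = metadata.get("cost") or {}
        for field in COST_FIELDS:
            value = cost.get(field)
            if value is not None:
                # Decimal(str(...)) avoids float rounding, much like ::numeric in SQL.
                totals[enabled][field] += Decimal(str(value))
    return dict(totals)


# Illustrative input: one cache hit and one uncached request (made-up values).
logs = [
    {"cache": {"enabled": True, "status": "HIT"},
     "cost": {"input_cost": 0, "output_cost": 0, "total_cost": 0}},
    {"cost": {"input_cost": 0.00585, "output_cost": 0.00084, "total_cost": 0.00669}},
]
result = cost_by_cache_enabled(logs)
print(result[True]["total_cost"], result[False]["total_cost"])  # 0 0.00669
```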

Compare actual and expected costs per model. For a cache hit, expected_cost holds what the request would have cost without the cache.

SELECT
  COALESCE(metadata->'usage'->>'model', 'unknown') AS model,
  SUM((metadata->'cost'->>'total_cost')::numeric) AS actual_total_cost,
  SUM((metadata->'expected_cost'->>'total_cost')::numeric) AS expected_total_cost
FROM llm_logs
WHERE metadata->'usage'->>'model' IS NOT NULL
GROUP BY model
ORDER BY model ASC;
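A rough Python equivalent over parsed metadata dicts is sketched below, with an extra column for expected minus actual cost. The helper names are hypothetical. One deliberate difference from the SQL: a missing expected_cost counts as zero here, whereas SQL's SUM skips NULLs, so for logs that lack expected_cost the difference column is not meaningful.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Iterable


def _total(section: Any) -> Decimal:
    """Read total_cost from a cost block, treating a missing value as zero."""
    value = (section or {}).get("total_cost")
    return Decimal(str(value)) if value is not None else Decimal(0)


def cost_vs_expected_by_model(
    logs: Iterable[dict[str, Any]],
) -> list[tuple[str, Decimal, Decimal, Decimal]]:
    """Return (model, actual, expected, expected - actual) rows sorted by model."""
    actual: dict[str, Decimal] = defaultdict(Decimal)
    expected: dict[str, Decimal] = defaultdict(Decimal)
    for metadata in logs:
        model = (metadata.get("usage") or {}).get("model")
        if model is None:  # same filter as WHERE ... IS NOT NULL
            continue
        actual[model] += _total(metadata.get("cost"))
        expected[model] += _total(metadata.get("expected_cost"))
    return [
        (model, actual[model], expected[model], expected[model] - actual[model])
        for model in sorted(actual)
    ]


logs = [
    {"usage": {"model": "gpt-4o-2024-05-13"},
     "cost": {"total_cost": 0},
     "expected_cost": {"total_cost": 0.00669}},
    {"cost": {"total_cost": 0.001}},  # no usage.model, so it is skipped
]
for model, actual_cost, expected_cost, difference in cost_vs_expected_by_model(logs):
    print(model, actual_cost, expected_cost, difference)
# gpt-4o-2024-05-13 0 0.00669 0.00669
```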

Log metadata

Each cached log stores metadata like the example below. Refer to it when querying cached requests.

{
  "cache": {
    "key": "4b2af868add63c97308b3133062aed384afb1be7fd81f225da3b8d113d8af086",
    "value": "log_gz42yh5ecgd2e22q",
    "status": "HIT",
    "enabled": true
  },
  "model": "gpt-4o-2024-05-13",
  "stream": false,
  "cost": {
    "input_cost": 0,
    "total_cost": 0,
    "output_cost": 0,
    "input_cost_cents": 0,
    "total_cost_cents": 0,
    "output_cost_cents": 0
  },
  "usage": {
    "model": "gpt-4o-2024-05-13",
    "total_tokens": 0,
    "calculated_by": "js-tiktoken",
    "prompt_tokens": 0,
    "completion_tokens": 0
  },
  "expected_cost": {
    "input_cost": 0.00585,
    "total_cost": 0.00669,
    "output_cost": 0.00084,
    "input_cost_cents": 0.585,
    "total_cost_cents": 0.669,
    "output_cost_cents": 0.084
  },
  "expected_usage": {
    "model": "gpt-4o-2024-05-13",
    "total_tokens": 1226,
    "calculated_by": "openai",
    "prompt_tokens": 1170,
    "completion_tokens": 56
  }
}
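To read a single log, the sketch below parses a trimmed copy of the example and compares its expected and actual figures. Reading expected minus actual as what the cache hit avoided is an interpretation based on this example (cost is 0 while expected_cost is not). `summarize_cached_log` and the output keys are hypothetical names.

```python
import json
from decimal import Decimal
from typing import Any

# Trimmed copy of the example above, keeping only the fields used below.
EXAMPLE = """
{
  "cache": {"status": "HIT", "enabled": true},
  "cost": {"total_cost_cents": 0},
  "usage": {"total_tokens": 0},
  "expected_cost": {"total_cost_cents": 0.669},
  "expected_usage": {"total_tokens": 1226}
}
"""


def summarize_cached_log(metadata: dict[str, Any]) -> dict[str, Any]:
    """Compare a log's actual tokens and cost with its expected values."""
    def section(name: str) -> dict[str, Any]:
        # Missing or null sections behave like empty objects.
        return metadata.get(name) or {}

    return {
        "status": section("cache").get("status", "NONE/UNKNOWN"),
        "tokens_avoided": section("expected_usage").get("total_tokens", 0)
        - section("usage").get("total_tokens", 0),
        "cents_avoided": section("expected_cost").get("total_cost_cents", 0)
        - section("cost").get("total_cost_cents", 0),
    }


# parse_float=Decimal keeps cent values exact instead of binary floats.
summary = summarize_cached_log(json.loads(EXAMPLE, parse_float=Decimal))
print(summary)
# {'status': 'HIT', 'tokens_avoided': 1226, 'cents_avoided': Decimal('0.669')}
```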