Generate datasets
Export logs for fine-tuning or batch jobs.
Create a dataset
OpenAI announced the ability to fine-tune gpt-4o-mini. Use our AI SQL editor to create a dataset you can use for fine-tuning OpenAI models. Use a similar process to generate batch datasets for evaluations, classification, or embeddings.
Example use cases:
- Export datasets for fine-tuning models
- Select records for evaluations of different models
- Identify example sets for any batch process
Query your data set
Navigate to the AI SQL editor in your Velvet app. Write SQL to gather the relevant dataset.
Here's an example SQL query that gathers each log's messages into a single JSON column.
-- CTE to gather all messages with appropriate roles and contents
WITH messages AS (
  SELECT
    id,
    created_at,
    1 AS message_order,
    jsonb_build_object('role', 'system', 'content', request->'body'->'messages'->0->>'content') AS message
  FROM llm_logs
  UNION ALL
  SELECT
    id,
    created_at,
    2 AS message_order,
    jsonb_build_object('role', 'user', 'content', request->'body'->'messages'->1->>'content') AS message
  FROM llm_logs
  UNION ALL
  SELECT
    id,
    created_at,
    3 AS message_order,
    jsonb_build_object('role', 'assistant', 'content', response->'body'->'choices'->0->'message'->>'content') AS message
  FROM llm_logs
)
-- Aggregate messages for each log entry, preserving system/user/assistant order
SELECT
  jsonb_build_object(
    'messages', jsonb_agg(message ORDER BY message_order)
  ) AS transformed_log
FROM messages
GROUP BY id, created_at
ORDER BY created_at DESC
LIMIT 10; -- modify as needed
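Each row in the result set is one JSON object in OpenAI's chat fine-tuning format. As a rough illustration (the content values here are invented), a single transformed_log row corresponds to:

```python
import json

# A hypothetical example of one transformed_log row produced by the query above.
row = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# Each row becomes one line of the JSONL export.
line = json.dumps(row)
print(line)
```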
This variation filters logs to a specific endpoint before transforming them:
-- CTE to create a common filter
WITH filtered_logs AS (
  SELECT *
  FROM llm_logs
  WHERE request->>'url' = '/chat/completions'
),
-- CTE to gather all messages with appropriate roles and contents
messages AS (
  SELECT
    id,
    created_at,
    1 AS message_order,
    jsonb_build_object('role', 'system', 'content', request->'body'->'messages'->0->>'content') AS message
  FROM filtered_logs
  UNION ALL
  SELECT
    id,
    created_at,
    2 AS message_order,
    jsonb_build_object('role', 'user', 'content', request->'body'->'messages'->1->>'content') AS message
  FROM filtered_logs
  UNION ALL
  SELECT
    id,
    created_at,
    3 AS message_order,
    jsonb_build_object('role', 'assistant', 'content', response->'body'->'choices'->0->'message'->>'content') AS message
  FROM filtered_logs
)
-- Aggregate messages for each log entry, preserving system/user/assistant order
SELECT
  jsonb_build_object(
    'messages', jsonb_agg(message ORDER BY message_order)
  ) AS transformed_log
FROM messages
GROUP BY id, created_at
ORDER BY created_at DESC
LIMIT 10; -- modify as needed
Export as JSONL
OpenAI's fine-tuning and batch endpoints need the dataset formatted as JSONL.
Once you've run the SQL query and have a result set you want to use, click the Export button in the data table. Export just the selected rows, or the entire result set. Click JSONL for the export type.
You'll now have a .jsonl file to upload to OpenAI.
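Before uploading, it can be worth sanity-checking the export. A minimal stdlib sketch (the filename and helper name are assumptions, not part of the Velvet export) that verifies each line parses and carries role/content pairs:

```python
import json

def validate_jsonl(path):
    """Check that every line of a JSONL file is valid JSON with a well-formed messages list."""
    errors = []
    with open(path) as f:
        for n, line in enumerate(f, start=1):
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {n}: not valid JSON")
                continue
            messages = row.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"line {n}: missing 'messages' list")
                continue
            for m in messages:
                if "role" not in m or "content" not in m:
                    errors.append(f"line {n}: message missing 'role' or 'content'")
    return errors
```

Run it against the exported file; an empty list means every line is structurally valid.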
Summary of the process
Refer to the OpenAI fine-tuning and batch guides for full requirements.
- Query logs using the AI SQL editor.
- Export the dataset as JSONL (or CSV if you want to modify it further).
- To make manual modifications, open the CSV in a spreadsheet application. Make adjustments and export as JSONL once complete.
- Upload the file to OpenAI for fine-tuning or batching.
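If you take the CSV route for manual edits, the spreadsheet export can be converted back to JSONL with a short stdlib script. A sketch, assuming a single CSV column named transformed_log that holds each JSON object (adjust the column name to match your export):

```python
import csv
import json

def csv_to_jsonl(csv_path, jsonl_path, column="transformed_log"):
    """Re-serialize a one-column CSV of JSON objects into a JSONL file."""
    with open(csv_path, newline="") as src, open(jsonl_path, "w") as dst:
        for row in csv.DictReader(src):
            # Re-parse each cell so edits that broke the JSON fail loudly here,
            # rather than during the OpenAI upload.
            obj = json.loads(row[column])
            dst.write(json.dumps(obj) + "\n")
```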
Updated 2 months ago