Fine-tuning

For conceptual background on fine-tuning methods, training requirements, and when to use each approach, see Fine-tuning.

Projects

Fine-tuning work is organized into projects. Each project groups related fine-tuning jobs under a shared goal, such as training a customer support model or a compliance classifier.

From the Fine-tuning page, you can create new projects, view recent activity, and access the full project directory.

Create a fine-tuning job

Each project contains one or more fine-tuning jobs. The job creation wizard walks through the following steps:

Job details — Name and describe the job.
Select data — Choose an existing training dataset or upload a new file.
Select model — Choose a base model for fine-tuning.
Hardware configuration — Select compute resources.
Hyperparameters — Configure training parameters including epochs, batch size, learning rate, and sequence length.
Review and start — Confirm settings and launch the job.
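
The wizard steps above map naturally onto a single configuration record. The following is a minimal Python sketch of how such a record might be modeled and sanity-checked before launch. Every class, field name, and value in it (JobConfig, Hyperparameters, validate, "support-dataset.jsonl", and so on) is a hypothetical illustration, not the product's API.

```python
from dataclasses import dataclass


@dataclass
class Hyperparameters:
    # The four training parameters named in the Hyperparameters step.
    epochs: int
    batch_size: int
    learning_rate: float
    sequence_length: int


@dataclass
class JobConfig:
    # One field per wizard step; all names are illustrative assumptions.
    name: str
    description: str
    training_file: str
    base_model: str
    hardware: str
    hyperparameters: Hyperparameters


def validate(config: JobConfig) -> list[str]:
    """Return a list of problems; an empty list means the job is ready to review."""
    problems = []
    if not config.name.strip():
        problems.append("job name is empty")
    hp = config.hyperparameters
    if hp.epochs < 1:
        problems.append("epochs must be at least 1")
    if hp.batch_size < 1:
        problems.append("batch size must be at least 1")
    if hp.learning_rate <= 0:
        problems.append("learning rate must be positive")
    if hp.sequence_length < 1:
        problems.append("sequence length must be at least 1")
    return problems


# Placeholder values only; none of these identifiers come from the product.
config = JobConfig(
    name="support-assistant-v1",
    description="Instruction tuning on support tickets",
    training_file="support-dataset.jsonl",
    base_model="example-base-model",
    hardware="example-gpu-profile",
    hyperparameters=Hyperparameters(
        epochs=3, batch_size=8, learning_rate=1e-4, sequence_length=2048
    ),
)
print(validate(config))  # prints []
```

Collecting all problems in one pass, rather than stopping at the first, mirrors a review step that shows everything needing attention before launch.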

Fine-tuning methods

The job creation wizard supports the following fine-tuning methods:

Instruction tuning — Train on question-and-answer pairs. Upload datasets with the fine-tune file purpose.
Reinforcement tuning (GRPO) — Train with reward-based optimization against reference answers. Upload datasets with the reinforcement-fine-tune file purpose.

The data selection step filters available files based on the selected fine-tuning method.
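
The filtering rule can be sketched in a few lines of Python. The two purpose strings come from the method descriptions above. The method keys, the file records, and the function name selectable_files are assumptions made for illustration and are not taken from the product.

```python
# Purpose strings as named in the method list; the dictionary keys are hypothetical.
PURPOSE_BY_METHOD = {
    "instruction": "fine-tune",
    "reinforcement": "reinforcement-fine-tune",
}


def selectable_files(files, method):
    """Keep only the files whose purpose matches the chosen fine-tuning method."""
    purpose = PURPOSE_BY_METHOD[method]
    return [f for f in files if f["purpose"] == purpose]


# Hypothetical uploaded files.
files = [
    {"name": "support_qa.jsonl", "purpose": "fine-tune"},
    {"name": "math_rewards.jsonl", "purpose": "reinforcement-fine-tune"},
    {"name": "faq_pairs.jsonl", "purpose": "fine-tune"},
]

print([f["name"] for f in selectable_files(files, "reinforcement")])
# prints ['math_rewards.jsonl']
```

In practice this means a file uploaded with the wrong purpose will not appear in the data selection step for the method you picked.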

Monitor training progress

Each job has a summary page that displays:

Training loss chart — Tracks loss over steps and epochs to visualize learning progress.
Job details — Status, timestamps, and configuration summary.
Data sources — Links to the training file and training prompt.
Event timeline — Real-time feed of job events (queued, running, completed).
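
When reading the loss chart, it can help to relate step counts to epochs. The sketch below uses the common convention that one step processes one batch and that a final partial batch still counts as a step. This is an assumption: the platform may count steps differently, for example with gradient accumulation. The function names are illustrative.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of steps needed to see every training example once."""
    return math.ceil(num_examples / batch_size)


def epoch_of_step(step: int, num_examples: int, batch_size: int) -> int:
    """Return the 1-based epoch that a 1-based step falls in."""
    return (step - 1) // steps_per_epoch(num_examples, batch_size) + 1


# 1,000 examples at batch size 8 gives 125 steps per epoch.
print(steps_per_epoch(1000, 8))     # prints 125
print(epoch_of_step(125, 1000, 8))  # prints 1
print(epoch_of_step(126, 1000, 8))  # prints 2
```

Under these assumptions, a job with 3 epochs over that dataset would run for 375 steps in total.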

Deploy a fine-tuned model

After a job completes, deploy the resulting model for inference directly from the job summary page. Deployment details, including status, deployment ID, and history, appear in the job summary.

Demote a deployment

Demote (undeploy) an active deployment when it is no longer needed to free its infrastructure resources.
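
The deploy and demote rules described above can be summarized as a small state model. The Python sketch below is a hypothetical illustration only: the class names, the history entries, and the error messages are assumptions, and the job statuses reuse the event names from the event timeline (queued, running, completed).

```python
from enum import Enum


class JobStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"


class Deployment:
    """Tracks whether a fine-tuned model is deployed, plus a simple history."""

    def __init__(self, job_status: JobStatus):
        # Deployment is only offered after the job completes.
        if job_status is not JobStatus.COMPLETED:
            raise ValueError("only completed jobs can be deployed")
        self.active = True
        self.history = ["deployed"]

    def demote(self) -> None:
        # Only an active deployment can be demoted.
        if not self.active:
            raise ValueError("deployment is already demoted")
        self.active = False
        self.history.append("demoted")


deployment = Deployment(JobStatus.COMPLETED)
deployment.demote()
print(deployment.active, deployment.history)
# prints False ['deployed', 'demoted']
```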