Preference tuning
Fine-tune models on preference data using direct preference optimization (DPO) with the SeekrFlow Python SDK.
For conceptual background on preference tuning, including when to use it and how it compares to other methods, see Preference tuning (DPO).
Upload a preference dataset
Preference datasets must follow a specific schema with a prompt, a chosen response, and a rejected response. SeekrFlow's data engine does not currently generate preference datasets — you must prepare them externally.
Dataset schema
{
  "prompt": [
    {
      "role": "system",
      "content": "You are an assistant helping a financial analyst understand investment risk."
    },
    {
      "role": "user",
      "content": "Explain whether investing in a single startup is high or low risk."
    }
  ],
  "chosen": [
    {
      "role": "assistant",
      "content": "Investing in a single startup is generally considered high risk..."
    }
  ],
  "rejected": [
    {
      "role": "assistant",
      "content": "It depends on the startup..."
    }
  ]
}
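Because the data engine does not generate preference data for you, the file usually has to be built from your own records. A minimal sketch of serializing records in this schema to JSONL with the Python standard library (the records list and file name are illustrative):

import json

# Each record pairs one prompt with a preferred ("chosen") and a dispreferred ("rejected") response.
records = [
    {
        "prompt": [
            {"role": "system", "content": "You are an assistant helping a financial analyst understand investment risk."},
            {"role": "user", "content": "Explain whether investing in a single startup is high or low risk."},
        ],
        "chosen": [
            {"role": "assistant", "content": "Investing in a single startup is generally considered high risk..."}
        ],
        "rejected": [
            {"role": "assistant", "content": "It depends on the startup..."}
        ],
    },
]

# JSONL: one JSON object per line.
with open("dpo-dataset.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

The same records can also be written to Parquet (for example with pandas) if you prefer that format.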
Upload with the preference-fine-tune file purpose. Datasets must be in JSONL or Parquet format.

from seekrai import SeekrFlow

client = SeekrFlow()

# Single file upload
upload_resp = client.files.upload(
    "dpo-dataset.parquet",
    purpose="preference-fine-tune",
)

# Bulk file upload
bulk_resp = client.files.bulk_upload(
    ["dpo-dataset1.parquet", "dpo-dataset2.parquet"],
    purpose="preference-fine-tune",
)

SeekrFlow validates the schema on upload. Uploads fail if the file format is not JSONL or Parquet, or if the schema does not match the expected preference dataset structure.
Create a preference tuning job
Set fine_tune_type to FineTuneType.PREFERENCE in the training configuration. Preference tuning supports an optional beta parameter that controls the KL-divergence penalty — how far the tuned model can deviate from the base model. Values range from 0.0 to 1.0, with 0.0 as the default.
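For reference, in the standard DPO objective (general DPO notation, not SeekrFlow-specific), beta scales the log-probability ratios against the frozen base model, so larger values penalize deviation more strongly:

\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]

where y_w is the chosen response, y_l the rejected response, \pi_\theta the model being tuned, and \pi_{\mathrm{ref}} the frozen base model.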
from seekrai.types.finetune import FineTuneType
from seekrai.types import TrainingConfig, InfrastructureConfig
from seekrai import SeekrFlow

client = SeekrFlow()

training_config = TrainingConfig(
    training_files=["<your-preference-fine-tuning-file-id>"],
    model="meta-llama/Llama-3.2-1B",
    n_epochs=1,
    n_checkpoints=1,
    batch_size=8,
    learning_rate=1e-5,
    experiment_name="dpo-fine-tune-job",
    fine_tune_type=FineTuneType.PREFERENCE,
    beta=0.5,
)

infrastructure_config = InfrastructureConfig(
    accel_type="MI300X",
    n_accel=8,
)

fine_tune = client.fine_tuning.create(
    training_config=training_config,
    infrastructure_config=infrastructure_config,
    project_id=123,
)

print(fine_tune.id)

Preference tuning works with all base models in SeekrFlow. The remaining steps for monitoring, deployment, and inference are the same as other fine-tuning methods. See Create a fine-tuning job for the full workflow.
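For example, checking the job's status later looks the same as for a standard fine-tuning job. A minimal sketch, assuming the client and fine_tune objects from the example above and the fine_tuning.retrieve call from the standard fine-tuning workflow:

# Retrieve the job to inspect its current status (the retrieve call is assumed from the
# standard fine-tuning workflow; see that guide for the exact methods).
job = client.fine_tuning.retrieve(fine_tune.id)
print(job.status)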
