
# Preference tuning

> Fine-tune models on preference data using direct preference optimization (DPO) with the SeekrFlow Python SDK.

For conceptual background on preference tuning, including when to use it and how it compares to other methods, see [Preference tuning (DPO)](/flow/components/fine-tuning/preference-tuning).

## Upload a preference dataset

Each record in a preference dataset must contain a prompt, a chosen response, and a rejected response. SeekrFlow's data engine does not currently generate preference datasets, so you must prepare them externally.

### Dataset schema

Each record requires three fields: `messages` (the prompt context), `chosen` (the preferred response), and `rejected` (the dispreferred response). See [Upload file](/flow/reference/file_upload_v1_flow_files_put) for the full schema reference.

<CodeGroup>
  ```json JSON expandable theme={null}
  {
    "messages": [
      {
        "role": "system",
        "content": "You are an assistant helping a financial analyst understand investment risk."
      },
      {
        "role": "user",
        "content": "Explain whether investing in a single startup is high or low risk."
      }
    ],
    "chosen": [
      {
        "role": "assistant",
        "content": "Investing in a single startup is generally considered high risk..."
      }
    ],
    "rejected": [
      {
        "role": "assistant",
        "content": "It depends on the startup..."
      }
    ]
  }
  ```
</CodeGroup>
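Because the data engine does not generate preference data, you will usually assemble the file yourself. The following is a minimal sketch, not part of the SeekrFlow SDK, that writes records in the schema above to a JSONL file using only the Python standard library. The `write_jsonl` helper and the output filename are illustrative assumptions.

```python
import json

# Illustrative records that follow the three-field schema shown above.
records = [
    {
        "messages": [
            {
                "role": "system",
                "content": "You are an assistant helping a financial analyst understand investment risk.",
            },
            {
                "role": "user",
                "content": "Explain whether investing in a single startup is high or low risk.",
            },
        ],
        "chosen": [
            {
                "role": "assistant",
                "content": "Investing in a single startup is generally considered high risk...",
            }
        ],
        "rejected": [
            {"role": "assistant", "content": "It depends on the startup..."}
        ],
    },
]


def write_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper, not an SDK call)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# The filename is an assumption; any .jsonl path works.
write_jsonl(records, "dpo-dataset.jsonl")
```

Each line of the resulting file is one complete record, so the file can be appended to or split without reparsing the whole dataset.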

Upload with the `preference-fine-tune` file purpose. Datasets must be in JSONL or Parquet format.

<CodeGroup>
  ```python Python theme={null}
  from seekrai import SeekrFlow

  client = SeekrFlow()

  # Single file upload
  upload_resp = client.files.upload(
      "dpo-dataset.parquet",
      purpose="preference-fine-tune",
  )

  # Bulk file upload
  bulk_resp = client.files.bulk_upload(
      ["dpo-dataset1.parquet", "dpo-dataset2.parquet"],
      purpose="preference-fine-tune",
  )
  ```
</CodeGroup>

SeekrFlow validates the schema on upload. Uploads fail if the file format is not JSONL or Parquet, or if the schema does not match the expected preference dataset structure.
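To catch schema problems before an upload fails, you can run a local pre-check. The sketch below is an assumption-based approximation, not SeekrFlow's actual validator: it only verifies that each JSONL record has the three fields described above and that every message carries `role` and `content` keys. The function names `check_record` and `check_jsonl` are hypothetical, and the server-side validation may be stricter.

```python
import json

REQUIRED_FIELDS = ("messages", "chosen", "rejected")


def check_record(record):
    """Return a list of problems found in one preference record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        turns = record.get(field)
        if not isinstance(turns, list) or not turns:
            problems.append(f"'{field}' must be a non-empty list of messages")
            continue
        for i, turn in enumerate(turns):
            if not isinstance(turn, dict) or not {"role", "content"}.issubset(turn):
                problems.append(f"'{field}[{i}]' needs 'role' and 'content' keys")
    return problems


def check_jsonl(path):
    """Yield (line_number, problems) for each line that fails the checks."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Ignore blank lines.
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                yield line_number, [f"invalid JSON: {exc.msg}"]
                continue
            if not isinstance(record, dict):
                yield line_number, ["record must be a JSON object"]
                continue
            problems = check_record(record)
            if problems:
                yield line_number, problems
```

Calling `list(check_jsonl("dpo-dataset.jsonl"))` returns an empty list when every line passes these local checks.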

## Create a preference tuning job

Set `fine_tune_type` to `FineTuneType.PREFERENCE` in the training configuration. Preference tuning supports an optional `beta` parameter that controls the strength of the KL-divergence penalty, which limits how far the tuned model can deviate from the base model. A higher `beta` applies a stronger penalty and keeps the tuned model closer to the base model. Values range from `0.0` to `1.0`, with `0.0` as the default.

<CodeGroup>
  ```python Python expandable theme={null}
  from seekrai.types.finetune import FineTuneType
  from seekrai.types import TrainingConfig, InfrastructureConfig
  from seekrai import SeekrFlow

  client = SeekrFlow()

  training_config = TrainingConfig(
      training_files=["<your-preference-fine-tuning-file-id>"],
      model="meta-llama/Llama-3.2-1B",
      n_epochs=1,
      n_checkpoints=1,
      batch_size=8,
      learning_rate=1e-5,
      experiment_name="dpo-fine-tune-job",
      fine_tune_type=FineTuneType.PREFERENCE,
      beta=0.5,
  )

  infrastructure_config = InfrastructureConfig(
      accel_type="MI300X",
      n_accel=8,
  )

  fine_tune = client.fine_tuning.create(
      training_config=training_config,
      infrastructure_config=infrastructure_config,
      project_id=123,  # Replace with your own project ID
  )

  print(fine_tune.id)
  ```
</CodeGroup>
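For intuition about what `beta` does, the sketch below computes the loss for a single preference pair in the standard DPO formulation. It assumes SeekrFlow's training follows that formulation, which this page does not confirm. The `dpo_loss` function and the log-probability values are hypothetical and are not part of the SDK.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Standard DPO loss for one pair, given summed log-probabilities.

    "policy" is the model being tuned; "ref" is the frozen base model.
    """
    # How much more the tuned model favors each response than the base model does.
    chosen_shift = policy_chosen - ref_chosen
    rejected_shift = policy_rejected - ref_rejected
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for one (chosen, rejected) pair.
example = dict(
    policy_chosen=-12.0,
    policy_rejected=-15.0,
    ref_chosen=-13.0,
    ref_rejected=-14.0,
)

for beta in (0.1, 0.5):
    print(beta, dpo_loss(**example, beta=beta))
```

In this formulation, `beta` scales the gap between the tuned model's and the base model's preferences, so the same shift away from the base model lowers the loss more at `beta=0.5` than at `beta=0.1`. The tuned model therefore needs to move less to satisfy the preference when `beta` is larger.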

Preference tuning works with all base models in SeekrFlow. The remaining steps for monitoring, deployment, and inference are the same as for other fine-tuning methods. See [Create a fine-tuning job](/flow/sdk/fine-tuning/create-fine-tuning-job) for the full workflow.
