
# Vision language tuning

> Fine-tune vision-language models on image-text datasets using the SeekrFlow Python SDK.

For conceptual background on vision language tuning, including supported models and when to use it, see [Vision language tuning](/flow/components/fine-tuning/vision-language-tuning).

## Prepare a vision language dataset

Prepare a dataset that follows the vision-language message schema. Each training example is a single-turn conversation: a user message that contains both image and text content, followed by an assistant message with the expected response.

### Example dataset schema

<CodeGroup>
  ```json JSON theme={null}
  {
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
          {"type": "text", "text": "What product is this?"}
        ]
      },
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "This is the ACME Widget Pro X7, a second-generation industrial sensor unit. It features the distinctive blue housing and triple-port connector array."}
        ]
      }
    ]
  }
  ```
</CodeGroup>
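
As a rough illustration of how a record like the one above could be assembled, the sketch below encodes a local image as a base64 data URL and writes one example per line to a `.jsonl` file. It uses only the Python standard library. The helper names (`image_to_data_url`, `build_example`, `write_jsonl`) are hypothetical and not part of the SeekrFlow SDK, and the sketch assumes images are embedded as data URLs, as in the example schema.

```python
import base64
import json
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL: data:<mime>;base64,<payload>."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    payload = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def build_example(image_path: str, question: str, answer: str) -> dict:
    """Build one single-turn example in the message schema shown above."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
                    {"type": "text", "text": question},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": answer}],
            },
        ]
    }


def write_jsonl(examples: list[dict], output_path: str) -> None:
    """Write one JSON object per line (hypothetical helper, not an SDK call)."""
    with open(output_path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")
```

A call such as `write_jsonl([build_example("widget.png", "What product is this?", "...")], "vlm-dataset.jsonl")` would then produce a file in the shape this page uploads in the next step.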

### Dataset validation

SeekrFlow validates the dataset on upload and rejects datasets with:

* Malformed message content or missing required fields
* Unsupported image formats
* Schema violations against the expected multimodal structure
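
Catching these problems locally can save an upload round trip. The sketch below is a minimal pre-check that loosely mirrors the categories listed above. It is not SeekrFlow's validator, and passing it does not guarantee that an upload will be accepted. The function names are hypothetical, and the set of accepted image MIME types is an assumption made purely for illustration.

```python
import json

# Assumption: illustrative only. Check the SeekrFlow documentation for the
# image formats the platform actually accepts.
ASSUMED_IMAGE_MIME_TYPES = {"image/png", "image/jpeg"}


def check_example(example, allowed_mime_types=ASSUMED_IMAGE_MIME_TYPES) -> list[str]:
    """Return the problems found in one training example (empty list if none)."""
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        if not isinstance(message.get("role"), str):
            problems.append(f"messages[{i}] is missing a string 'role' field")
        content = message.get("content")
        if not isinstance(content, list) or not content:
            problems.append(f"messages[{i}].content must be a non-empty list")
            continue
        for j, part in enumerate(content):
            where = f"messages[{i}].content[{j}]"
            part_type = part.get("type") if isinstance(part, dict) else None
            if part_type == "text":
                if not isinstance(part.get("text"), str):
                    problems.append(f"{where} is missing a string 'text' field")
            elif part_type == "image_url":
                image_url = part.get("image_url")
                url = image_url.get("url") if isinstance(image_url, dict) else None
                if not isinstance(url, str):
                    problems.append(f"{where} is missing 'image_url.url'")
                elif url.startswith("data:"):
                    # The MIME type sits between "data:" and the first ";".
                    mime_type = url[len("data:"):].split(";", 1)[0]
                    if mime_type not in allowed_mime_types:
                        problems.append(f"{where} uses image type {mime_type!r}")
            else:
                problems.append(f"{where} has unexpected type {part_type!r}")
    return problems


def check_jsonl(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems; an empty dict means nothing was flagged."""
    report = {}
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            try:
                example = json.loads(line)
            except json.JSONDecodeError as exc:
                report[line_number] = [f"invalid JSON: {exc.msg}"]
                continue
            problems = check_example(example)
            if problems:
                report[line_number] = problems
    return report
```

Running `check_jsonl("vlm-dataset.jsonl")` before uploading returns a report keyed by line number, which makes it easier to locate the offending record.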

## Upload a vision language dataset

Upload your dataset with `purpose="fine-tune"`. See [Upload file](/flow/reference/file_upload_v1_flow_files_put) for the full schema reference.

<CodeGroup>
  ```python Python theme={null}
  from seekrai import SeekrFlow

  client = SeekrFlow()

  response = client.files.upload("vlm-dataset.jsonl", purpose="fine-tune")
  print(f"Uploaded file ID: {response.id}")
  ```
</CodeGroup>

## Create a vision language fine-tuning job

<CodeGroup>
  ```python Python expandable theme={null}
  from seekrai.types import TrainingConfig, InfrastructureConfig
  from seekrai import SeekrFlow

  client = SeekrFlow()

  training_config = TrainingConfig(
      training_files=[
          "file-830e9be3-25tt-13y1-0298-3a035e73o90"  # Vision-language dataset file ID
      ],
      model="meta-llama/Llama-3.2-11B-Vision-Instruct",
      n_epochs=1,
      n_checkpoints=1,
      batch_size=4,
      learning_rate=1e-5,
      experiment_name="vlm_helperbot_v1",
  )

  infrastructure_config = InfrastructureConfig(
      n_accel=8,
      accel_type="MI300X",
  )

  fine_tune = client.fine_tuning.create(
      training_config=training_config,
      infrastructure_config=infrastructure_config,
      project_id=123,  # Replace with your project ID
  )

  print(fine_tune.id)
  ```
</CodeGroup>

All other `TrainingConfig` parameters behave the same as in text-only instruction fine-tuning. See [Create a fine-tuning job](/flow/sdk/fine-tuning/create-fine-tuning-job) for the full workflow, including project setup, file retrieval, and monitoring.
