# Create context-grounded fine-tuning data

> Generate structured context data for RAG-enhanced fine-tuning using `context_grounded_files` or `context_grounded_vector_db` data jobs.

Context-grounded data jobs ground model responses in an existing knowledge source. Two job types are available:

* **`context_grounded_files`** — aligns against uploaded and ingested documents. Follows the same file preparation and ingestion steps as `principle_files`, but no system prompt is required.
* **`context_grounded_vector_db`** — aligns against a pre-built SeekrFlow vector database. No file upload or ingestion is needed.

Neither type requires a system prompt to start. Not sure which approach fits your use case? See [Fine-tuning](/flow/components/fine-tuning).

## context\_grounded\_files

### Step 1: Create the job

**Endpoint:** `POST /v1/flow/data-jobs` [Submit data job](/flow/reference/submit_data_job_v1_flow_data_jobs_post)

<CodeGroup>
  ```python Python theme={null}
  from seekrai import SeekrFlow

  client = SeekrFlow()

  data_job = client.data_jobs.create(
      name="Customer support context grounding",
      description="Ground responses in ingested support docs",
      job_type="context_grounded_files",
  )
  data_job_id = data_job.id
  print("Data job ID:", data_job_id)
  ```
</CodeGroup>

### Step 2: Upload and attach files

Upload your source files using the [Files API](/flow/sdk/data-engine/file-ingestion), then attach them to the job. Ingestion runs automatically for non-Markdown files. Markdown files are alignment-ready immediately.

**Endpoint:** `POST /v1/flow/data-jobs/{id}/add-files` [Add files to data job](/flow/reference/add_files_to_data_job_v1_flow_data_jobs__data_job_id__add_files_post)

<CodeGroup>
  ```python Python theme={null}
  data_job = client.data_jobs.add_files(
      data_job_id,
      file_ids=[
          "file-25e34f96-2130-11f0-9236-3e11346bffff",
          "file-efd0b334-2130-11f0-9236-3e11346bffff",
      ],
      method="accuracy-optimized",
  )
  print("Status:", data_job.status)
  ```
</CodeGroup>

Poll `GET /v1/flow/data-jobs/{id}` until `status` is `ready_to_start`. See [Monitor ingestion](/flow/sdk/data-engine/monitor-ingestion) for details on job states and error handling.
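
The polling step can be sketched as a small helper. This is a minimal sketch, not part of the SDK: `wait_for_status`, its parameters, and the default interval and timeout are hypothetical names and values chosen for illustration. It takes any callable that returns the current status string, so it does not depend on a particular client.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "ready_to_start",
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Call fetch_status() until it returns `target` or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"last status {status!r}, expected {target!r}")
        time.sleep(interval_s)


# Usage with the client from Step 1 (assumes retrieve() exposes `.status`,
# as in the monitoring example on this page):
# wait_for_status(lambda: client.data_jobs.retrieve(data_job_id).status)
```

The helper only stops on the target status or on timeout. It does not recognize failure states, so a failed job would keep polling until the timeout; add a check for the error states listed in Monitor ingestion if you need to exit early. The same helper could wait for `completed` during alignment monitoring by passing `target="completed"`.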

### Step 3: Start alignment

No system prompt is required. Call `/start` once `status` is `ready_to_start`.

**Endpoint:** `POST /v1/flow/data-jobs/{id}/start` [Start data job alignment](/flow/reference/start_data_job_alignment_v1_flow_data_jobs__data_job_id__start_post)

<CodeGroup>
  ```python Python theme={null}
  detail = client.data_jobs.start(data_job_id)
  print("Alignment job:", detail.alignment_job.id, detail.alignment_job.status)
  ```
</CodeGroup>

**Prerequisites:**

* `status` must be `ready_to_start`
* At least one Markdown file must be attached, either produced by ingestion or uploaded directly

### Step 4: Monitor alignment

<CodeGroup>
  ```python Python theme={null}
  detail = client.data_jobs.retrieve(data_job_id)
  print("Status:", detail.status)
  if detail.alignment_job:
      print("Alignment job:", detail.alignment_job.id)
  ```
</CodeGroup>

Once `status` shows `completed`, retrieve the output files using the endpoint in Step 5.

### Step 5: Retrieve output files

**Endpoint:** `GET /v1/flow/alignment/{job_id}/outputs`

<CodeGroup>
  ```python Python theme={null}
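  import os
  import requests

  # `detail` is the response retrieved in Step 4
  alignment_job_id = detail.alignment_job.id
  headers = {"Authorization": os.environ["SEEKR_API_KEY"]}
  response = requests.get(
      f"https://flow.seekr.com/v1/flow/alignment/{alignment_job_id}/outputs",
      headers=headers,
      timeout=30,
  )
  response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error body
  outputs = response.json()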
  ```
</CodeGroup>

In the response, find the output file with `"purpose": "fine-tune"` (the `.parquet` file). Use its file ID when creating a fine-tuning job.
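
Selecting that file can be sketched as a filter on the `purpose` field. This is a minimal sketch under stated assumptions: the helper name `pick_fine_tune_file` is hypothetical, and it assumes the output entries are available as a list of dictionaries that each carry `purpose` and `id` keys. The actual response shape is not shown on this page, so adapt the lookup to what the endpoint returns.

```python
from typing import Any


def pick_fine_tune_file(files: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the single entry whose purpose is 'fine-tune'."""
    matches = [f for f in files if f.get("purpose") == "fine-tune"]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one fine-tune file, found {len(matches)}")
    return matches[0]


# Hypothetical sample data; IDs, filenames, and the second purpose value are made up.
sample = [
    {"id": "file-aaa", "purpose": "alignment", "filename": "qa_pairs.jsonl"},
    {"id": "file-bbb", "purpose": "fine-tune", "filename": "train.parquet"},
]
print(pick_fine_tune_file(sample)["id"])  # file-bbb
```

Raising when zero or several entries match keeps an ambiguous result from silently feeding the wrong file into a fine-tuning job.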

## context\_grounded\_vector\_db

Use this job type when your knowledge source is already indexed in a SeekrFlow vector database. No file upload or ingestion is needed — attach the vector database ID and start.

### Step 1: Create the job and attach the vector database

**Endpoints:** `POST /v1/flow/data-jobs` [Submit data job](/flow/reference/submit_data_job_v1_flow_data_jobs_post) · `PATCH /v1/flow/data-jobs/{id}` [Update data job](/flow/reference/patch_data_job_v1_flow_data_jobs__data_job_id__patch)

Create the job, then set `vector_database_id` with an update call before starting alignment.

<CodeGroup>
  ```python Python theme={null}
  data_job = client.data_jobs.create(
      name="Customer support refresh",
      description="Align against existing vector store",
      job_type="context_grounded_vector_db",
  )
  data_job_id = data_job.id

  data_job = client.data_jobs.update(
      data_job_id,
      vector_database_id="9aab2ceb-7c7b-4c07-8ff4-5416dfad221a",
  )
  print("Vector DB set:", data_job.vector_database_id)
  ```
</CodeGroup>

### Step 2: Start alignment

**Endpoint:** `POST /v1/flow/data-jobs/{id}/start` [Start data job alignment](/flow/reference/start_data_job_alignment_v1_flow_data_jobs__data_job_id__start_post)

<CodeGroup>
  ```python Python theme={null}
  detail = client.data_jobs.start(data_job_id)
  print("Alignment job:", detail.alignment_job.id, detail.alignment_job.status)
  ```
</CodeGroup>

**Prerequisites:**

* `status` must be `ready_to_start`
* `vector_database_id` must be set

No system prompt is required.
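
The prerequisites above can be checked locally before calling `/start`. This is a minimal sketch: `start_blockers` is a hypothetical helper, and it assumes the object returned by `client.data_jobs.retrieve(data_job_id)` exposes `status` and `vector_database_id` attributes, as the create and update examples on this page suggest.

```python
from types import SimpleNamespace


def start_blockers(job) -> list[str]:
    """List the unmet prerequisites for starting a context_grounded_vector_db job."""
    blockers = []
    status = getattr(job, "status", None)
    if status != "ready_to_start":
        blockers.append(f"status is {status!r}, expected 'ready_to_start'")
    if not getattr(job, "vector_database_id", None):
        blockers.append("vector_database_id is not set")
    return blockers


# Stand-in for the object returned by client.data_jobs.retrieve(data_job_id)
job = SimpleNamespace(status="ready_to_start", vector_database_id=None)
print(start_blockers(job))  # ['vector_database_id is not set']
```

An empty list means both conditions hold and the start call can proceed.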

### Step 3: Monitor alignment

<CodeGroup>
  ```python Python theme={null}
  detail = client.data_jobs.retrieve(data_job_id)
  print("Status:", detail.status)
  if detail.alignment_job:
      print("Alignment job:", detail.alignment_job.id)
  ```
</CodeGroup>

Once `status` shows `completed`, retrieve the output files using the endpoint in Step 4.

### Step 4: Retrieve output files

**Endpoint:** `GET /v1/flow/alignment/{job_id}/outputs`

<CodeGroup>
  ```python Python theme={null}
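  import os
  import requests

  # `detail` is the response retrieved in Step 3
  alignment_job_id = detail.alignment_job.id
  headers = {"Authorization": os.environ["SEEKR_API_KEY"]}
  response = requests.get(
      f"https://flow.seekr.com/v1/flow/alignment/{alignment_job_id}/outputs",
      headers=headers,
      timeout=30,
  )
  response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error body
  outputs = response.json()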
  ```
</CodeGroup>

In the response, find the output file with `"purpose": "fine-tune"` (the `.parquet` file). Use its file ID when creating a fine-tuning job.
