Create context-grounded fine-tuning data

Generate structured context data for RAG-enhanced fine-tuning using context_grounded_files or context_grounded_vector_db data jobs.

Context-grounded data jobs ground model responses in an existing knowledge source. Two job types are available:

  • context_grounded_files — aligns against uploaded and ingested documents. Follows the same file preparation and ingestion steps as principle_files, but no system prompt is required.
  • context_grounded_vector_db — aligns against a pre-built SeekrFlow vector database. No file upload or ingestion is needed.

Neither type requires a system prompt to start. Not sure which approach fits your use case? See Fine-tuning.


context_grounded_files

Step 1: Create the job

Endpoint: POST /v1/flow/data-jobs Submit data job

from seekrai import SeekrFlow

client = SeekrFlow()

data_job = client.data_jobs.create(
    name="Customer support context grounding",
    description="Ground responses in ingested support docs",
    job_type="context_grounded_files",
)
data_job_id = data_job.id
print("Data job ID:", data_job_id)

Step 2: Upload and attach files

Upload your source files using the Files API, then attach them to the job. Ingestion runs automatically for non-Markdown files. Markdown files are alignment-ready immediately.

Endpoint: POST /v1/flow/data-jobs/{id}/add-files Add files to data job

data_job = client.data_jobs.add_files(
    data_job_id,
    file_ids=[
        "file-25e34f96-2130-11f0-9236-3e11346bffff",
        "file-efd0b334-2130-11f0-9236-3e11346bffff",
    ],
    method="accuracy-optimized",
)
print("Status:", data_job.status)

Poll GET /v1/flow/data-jobs/{id} until status is ready_to_start. See Monitor ingestion for details on job states and error handling.
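The polling step can be sketched as a small helper; the interval and timeout values below are illustrative defaults, not SeekrFlow requirements:

```python
import time

def wait_for_status(fetch_status, target, interval=10, timeout=1800):
    """Poll fetch_status() until it returns `target`; raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == target:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job did not reach {target!r} within {timeout}s")

# Usage with the client from Step 1:
# wait_for_status(lambda: client.data_jobs.retrieve(data_job_id).status,
#                 "ready_to_start")
```

Passing a callable keeps the helper reusable for the later alignment-monitoring steps as well.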

Step 3: Start alignment

No system prompt is required. Call /start once status is ready_to_start.

Endpoint: POST /v1/flow/data-jobs/{id}/start Start data job alignment

detail = client.data_jobs.start(data_job_id)
print("Alignment job:", detail.alignment_job.id, detail.alignment_job.status)

Prerequisites:

  • status must be ready_to_start
  • At least one processed Markdown file attached (ingested or uploaded)

Step 4: Monitor alignment

detail = client.data_jobs.retrieve(data_job_id)
print("Status:", detail.status)
if detail.alignment_job:
    print("Alignment job:", detail.alignment_job.id)

Once status shows completed, retrieve the output files using the endpoint in Step 5.

Step 5: Retrieve output files

Endpoint: GET /v1/flow/alignment/{job_id}/outputs

import os
import requests

# Assumes the API key is stored in the SEEKR_API_KEY environment variable;
# adjust the variable name to match your setup.
headers = {"Authorization": f"Bearer {os.environ['SEEKR_API_KEY']}"}

alignment_job_id = detail.alignment_job.id
response = requests.get(
    f"https://flow.seekr.com/v1/flow/alignment/{alignment_job_id}/outputs",
    headers=headers,
)
response.raise_for_status()
outputs = response.json()

The output file whose "purpose" is "fine-tune" (the .parquet) is the one whose file ID you pass when creating a fine-tuning job.
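To select that file programmatically, a small helper like the following works. It assumes the /outputs payload is a JSON object with a "files" list whose entries carry "id" and "purpose" fields; adjust the keys to the actual response shape:

```python
def fine_tune_file_id(outputs):
    """Return the ID of the output file whose purpose is "fine-tune"."""
    for f in outputs.get("files", []):
        if f.get("purpose") == "fine-tune":
            return f["id"]
    raise ValueError('no output file with purpose "fine-tune"')

# Usage: fine_tune_file_id(outputs)
```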


context_grounded_vector_db

Use this job type when your knowledge source is already indexed in a SeekrFlow vector database. No file upload or ingestion is needed — attach the vector database ID and start.

Step 1: Create the job and attach the vector database

Endpoints: POST /v1/flow/data-jobs Submit data job · PATCH /v1/flow/data-jobs/{id} Update data job

Create the job, then attach the knowledge source by setting vector_database_id in an update call before starting.

data_job = client.data_jobs.create(
    name="Customer support refresh",
    description="Align against existing vector store",
    job_type="context_grounded_vector_db",
)
data_job_id = data_job.id

data_job = client.data_jobs.update(
    data_job_id,
    vector_database_id="9aab2ceb-7c7b-4c07-8ff4-5416dfad221a",
)
print("Vector DB set:", data_job.vector_database_id)

Step 2: Start alignment

Endpoint: POST /v1/flow/data-jobs/{id}/start Start data job alignment

detail = client.data_jobs.start(data_job_id)
print("Alignment job:", detail.alignment_job.id, detail.alignment_job.status)

Prerequisites:

  • status must be ready_to_start
  • vector_database_id must be set

As with context_grounded_files, no system prompt is required.
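The prerequisites above can be checked with a trivial guard before calling /start:

```python
def can_start_vector_db_job(status, vector_database_id):
    """Return True when a context_grounded_vector_db job is ready for /start."""
    return status == "ready_to_start" and vector_database_id is not None

# Usage: can_start_vector_db_job(data_job.status, data_job.vector_database_id)
```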

Step 3: Monitor alignment

detail = client.data_jobs.retrieve(data_job_id)
print("Status:", detail.status)
if detail.alignment_job:
    print("Alignment job:", detail.alignment_job.id)

Once status shows completed, retrieve the output files using the endpoint in Step 4.

Step 4: Retrieve output files

Endpoint: GET /v1/flow/alignment/{job_id}/outputs

import os
import requests

# Assumes the API key is stored in the SEEKR_API_KEY environment variable;
# adjust the variable name to match your setup.
headers = {"Authorization": f"Bearer {os.environ['SEEKR_API_KEY']}"}

alignment_job_id = detail.alignment_job.id
response = requests.get(
    f"https://flow.seekr.com/v1/flow/alignment/{alignment_job_id}/outputs",
    headers=headers,
)
response.raise_for_status()
outputs = response.json()

The output file whose "purpose" is "fine-tune" (the .parquet) is the one whose file ID you pass when creating a fine-tuning job.