Create context-grounded fine-tuning data
Generate structured context data for RAG-enhanced fine-tuning using context_grounded_files or context_grounded_vector_db data jobs.
Context-grounded data jobs ground model responses in an existing knowledge source. Two job types are available:
- context_grounded_files — aligns against uploaded and ingested documents. Follows the same file preparation and ingestion steps as principle_files, but no system prompt is required.
- context_grounded_vector_db — aligns against a pre-built SeekrFlow vector database. No file upload or ingestion is needed.
Neither type requires a system prompt to start. Not sure which approach fits your use case? See Fine-tuning.
context_grounded_files
Step 1: Create the job
Endpoint: POST /v1/flow/data-jobs Submit data job
```python
from seekrai import SeekrFlow

client = SeekrFlow()

data_job = client.data_jobs.create(
    name="Customer support context grounding",
    description="Ground responses in ingested support docs",
    job_type="context_grounded_files",
)
data_job_id = data_job.id
print("Data job ID:", data_job_id)
```

Step 2: Upload and attach files
Upload your source files using the Files API, then attach them to the job. Ingestion runs automatically for non-Markdown files. Markdown files are alignment-ready immediately.
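Because ingestion runs automatically only for non-Markdown files, it can be handy to know up front which of your local files will need ingestion before becoming alignment-ready. A minimal sketch; splitting on the .md extension is an assumption based on the behavior described above:

```python
from pathlib import Path


def split_by_ingestion_need(paths):
    """Partition local file paths into Markdown files (alignment-ready
    immediately) and everything else (ingested automatically on attach)."""
    ready, needs_ingestion = [], []
    for p in map(Path, paths):
        (ready if p.suffix.lower() == ".md" else needs_ingestion).append(str(p))
    return ready, needs_ingestion


ready, pending = split_by_ingestion_need(
    ["faq.md", "handbook.pdf", "policies.MD", "tickets.docx"]
)
print(ready)    # ['faq.md', 'policies.MD']
print(pending)  # ['handbook.pdf', 'tickets.docx']
```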
Endpoint: POST /v1/flow/data-jobs/{id}/add-files Add files to data job
```python
data_job = client.data_jobs.add_files(
    data_job_id,
    file_ids=[
        "file-25e34f96-2130-11f0-9236-3e11346bffff",
        "file-efd0b334-2130-11f0-9236-3e11346bffff",
    ],
    method="accuracy-optimized",
)
print("Status:", data_job.status)
```

Poll GET /v1/flow/data-jobs/{id} until status is ready_to_start. See Monitor ingestion for details on job states and error handling.
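The polling step can be sketched as a small helper. The retrieve call and the ready_to_start status come from this page; the loop, its timeout, and the "failed" terminal status name are assumptions — adjust to your error-handling needs:

```python
import time


def wait_for_status(fetch_status, target, *, poll_secs=10, timeout_secs=1800):
    """Call fetch_status() repeatedly until it returns `target`.

    fetch_status: zero-arg callable, e.g.
        lambda: client.data_jobs.retrieve(data_job_id).status
    Raises RuntimeError on a "failed" status (assumed terminal name)
    and TimeoutError if `target` is not reached within `timeout_secs`.
    """
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == target:
            return status
        if status == "failed":
            raise RuntimeError("data job failed")
        time.sleep(poll_secs)
    raise TimeoutError(f"job did not reach {target!r} in {timeout_secs}s")
```

For example: `wait_for_status(lambda: client.data_jobs.retrieve(data_job_id).status, "ready_to_start")`.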
Step 3: Start alignment
No system prompt is required. Call /start once status is ready_to_start.
Endpoint: POST /v1/flow/data-jobs/{id}/start Start data job alignment
```python
detail = client.data_jobs.start(data_job_id)
print("Alignment job:", detail.alignment_job.id, detail.alignment_job.status)
```

Prerequisites:

- status must be ready_to_start
- At least one processed Markdown file attached (ingested or uploaded)
Step 4: Monitor alignment
```python
detail = client.data_jobs.retrieve(data_job_id)
print("Status:", detail.status)
if detail.alignment_job:
    print("Alignment job:", detail.alignment_job.id)
```

Once status shows completed, retrieve the output files using the endpoint in Step 5.
Step 5: Retrieve output files
Endpoint: GET /v1/flow/alignment/{job_id}/outputs
```python
import os

import requests

# Build the auth header; the Bearer scheme and the SEEKR_API_KEY
# environment variable name are assumptions — use your own key storage.
headers = {"Authorization": f"Bearer {os.environ['SEEKR_API_KEY']}"}

alignment_job_id = detail.alignment_job.id
response = requests.get(
    f"https://flow.seekr.com/v1/flow/alignment/{alignment_job_id}/outputs",
    headers=headers,
)
outputs = response.json()
```

The output file with "purpose": "fine-tune" (the .parquet) is the file ID to use when creating a fine-tuning job.
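Picking the fine-tune file ID out of the outputs response can be done with a small filter. The exact response shape isn't shown above, so this sketch assumes the payload contains a list of file entries with "id" and "purpose" keys — check the real /outputs response and adjust:

```python
def find_finetune_file_id(entries):
    """Return the ID of the first output entry marked for fine-tuning.

    `entries`: list of dicts with at least "id" and "purpose" keys
    (an assumed shape; verify against the actual /outputs payload).
    """
    for entry in entries:
        if entry.get("purpose") == "fine-tune":
            return entry["id"]
    raise ValueError("no output file with purpose 'fine-tune' found")


# Example with a made-up payload:
sample = [
    {"id": "file-aaa", "purpose": "eval"},
    {"id": "file-bbb", "purpose": "fine-tune"},  # the .parquet training file
]
print(find_finetune_file_id(sample))  # file-bbb
```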
context_grounded_vector_db
Use this job type when your knowledge source is already indexed in a SeekrFlow vector database. No file upload or ingestion is needed — attach the vector database ID and start.
Step 1: Create the job and attach the vector database
Endpoints: POST /v1/flow/data-jobs Submit data job · PATCH /v1/flow/data-jobs/{id} Update data job
Create the job and set vector_database_id before starting. No file upload or ingestion is needed.
```python
data_job = client.data_jobs.create(
    name="Customer support refresh",
    description="Align against existing vector store",
    job_type="context_grounded_vector_db",
)
data_job_id = data_job.id

data_job = client.data_jobs.update(
    data_job_id,
    vector_database_id="9aab2ceb-7c7b-4c07-8ff4-5416dfad221a",
)
print("Vector DB set:", data_job.vector_database_id)
```

Step 2: Start alignment
Endpoint: POST /v1/flow/data-jobs/{id}/start Start data job alignment
```python
detail = client.data_jobs.start(data_job_id)
print("Alignment job:", detail.alignment_job.id, detail.alignment_job.status)
```

Prerequisites:

- status must be ready_to_start
- vector_database_id must be set
- No system prompt required
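You can assert these prerequisites in code before calling /start. A sketch, assuming the job object exposes .status and .vector_database_id attributes as shown elsewhere on this page:

```python
def check_vector_db_prereqs(job):
    """Raise if the data job isn't ready to start alignment.

    `job` needs `.status` and `.vector_database_id` attributes, as on
    the objects returned by client.data_jobs.retrieve(...).
    """
    if job.status != "ready_to_start":
        raise RuntimeError(f"job status is {job.status!r}, not 'ready_to_start'")
    if not job.vector_database_id:
        raise RuntimeError("vector_database_id is not set on the data job")


# Example with a stand-in object (hypothetical values):
from types import SimpleNamespace

ok = SimpleNamespace(status="ready_to_start", vector_database_id="9aab2ceb-7c7b-4c07-8ff4-5416dfad221a")
check_vector_db_prereqs(ok)  # passes silently
```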
Step 3: Monitor alignment
```python
detail = client.data_jobs.retrieve(data_job_id)
print("Status:", detail.status)
if detail.alignment_job:
    print("Alignment job:", detail.alignment_job.id)
```

Once status shows completed, retrieve the output files using the endpoint in Step 4.
Step 4: Retrieve output files
Endpoint: GET /v1/flow/alignment/{job_id}/outputs
```python
import os

import requests

# Build the auth header; the Bearer scheme and the SEEKR_API_KEY
# environment variable name are assumptions — use your own key storage.
headers = {"Authorization": f"Bearer {os.environ['SEEKR_API_KEY']}"}

alignment_job_id = detail.alignment_job.id
response = requests.get(
    f"https://flow.seekr.com/v1/flow/alignment/{alignment_job_id}/outputs",
    headers=headers,
)
outputs = response.json()
```

The output file with "purpose": "fine-tune" (the .parquet) is the file ID to use when creating a fine-tuning job.
