Monitor ingestion
Track ingestion progress, interpret per-file statuses, and resolve errors through the data job detail endpoint.
Ingestion status is surfaced through the data job that triggered it. `GET /v1/flow/data-jobs/{id}` returns a canonical view of your job, including nested ingestion jobs, per-file records, timeline events, and the derived status that tells you whether you're ready to start alignment.
Understand data job status
While ingestion is in progress, the data job moves through these states:
| Status | Description |
|---|---|
| `file_processing` | At least one ingestion job is queued or running. |
| `needs_review` | Manual action required: failed ingestion records, missing files, or a missing system prompt. |
| `ready_to_start` | All ingestion completed successfully and the prerequisites for alignment are met. |
Once you call `/start`, the status mirrors the alignment job (`running`, `completed`, `failed`, etc.).
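If you're gating alignment on this status, a small polling loop is enough. A minimal sketch (the retrieve call matches the snippets below; the loop structure and interval are illustrative choices, not SDK requirements):

```python
import time

from seekrai import SeekrFlow

client = SeekrFlow()

# Poll until the job leaves file_processing. The 10-second
# interval is an arbitrary choice, not an SDK requirement.
while True:
    detail = client.data_jobs.retrieve("dj-1234567890")
    if detail.status != "file_processing":
        break
    time.sleep(10)

if detail.status == "ready_to_start":
    print("All files ingested; alignment can start.")
elif detail.status == "needs_review":
    print("Manual action required; inspect the failed records.")
```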
Check job status
List all data jobs:
Endpoint: `GET /v1/flow/data-jobs` (List data jobs)
```python
from seekrai import SeekrFlow

client = SeekrFlow()

jobs = client.data_jobs.list()
for job in jobs.data:
    print(job.id, job.status, job.created_at)
```

Retrieve a specific data job:
Endpoint: `GET /v1/flow/data-jobs/{id}` (Get data job)
```python
detail = client.data_jobs.retrieve("dj-1234567890")
print("Job ID:", detail.id)
print("Status:", detail.status)
```

Sample response:
```json
{
  "id": "dj-1b75f4d5-5c9e-4d33-b164-a2393bc5ab6d",
  "name": "Customer support refresh",
  "description": "Prep PDFs and Markdown for Q4 fine-tuning",
  "job_type": "principle_files",
  "status": "ready_to_start",
  "created_at": "2025-11-14T07:35:21.700005Z",
  "updated_at": "2025-11-14T07:37:52.464993Z",
  "ingestion_jobs": [...],
  "files": [...],
  "timeline": [...]
}
```

Inspect ingestion jobs and file records
The `ingestion_jobs` array contains one entry per ingestion run. Each entry includes a `records` array with an independent status and timestamps for every file processed.
```python
detail = client.data_jobs.retrieve("dj-1234567890")

for ingestion_job in detail.ingestion_jobs:
    print(f"Ingestion job: {ingestion_job.id} — {ingestion_job.status}")
    for record in ingestion_job.records:
        print(f"  {record.filename}: {record.status}")
        if record.processing_at:
            print(f"    Started: {record.processing_at}")
        if record.completed_at:
            print(f"    Finished: {record.completed_at}")
        if record.status == "failed":
            print(f"    Error: {record.error_message}")
            print(f"    Fix: {record.suggested_fix}")
```

File record fields
| Field | Description |
|---|---|
| `record_id` | Unique identifier for the file record |
| `filename` | Source filename |
| `status` | Per-file processing state |
| `method` | Ingestion method used (speed-optimized or accuracy-optimized) |
| `queue_position` | Position in the queue while the status is `queued` |
| `error_message` | Plain-language description of what went wrong |
| `suggested_fix` | Recommended action to resolve the error |
| `created_at` | When the file record was created |
| `processing_at` | When the file entered the `running` state |
| `completed_at` | When the file entered the `completed` state |
| `failed_at` | When the file entered the `failed` state |
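For a quick overview of where every file stands, you can tally these fields across all records. A sketch using only the fields from the table above (the aggregation itself is illustrative):

```python
from collections import Counter

from seekrai import SeekrFlow

client = SeekrFlow()
detail = client.data_jobs.retrieve("dj-1234567890")

# Count per-file statuses across every ingestion run.
counts = Counter(
    record.status
    for ingestion_job in detail.ingestion_jobs
    for record in ingestion_job.records
)
print(dict(counts))  # e.g. {"completed": 4, "queued": 1}

# queue_position is only meaningful while a record is queued.
for ingestion_job in detail.ingestion_jobs:
    for record in ingestion_job.records:
        if record.status == "queued":
            print(f"{record.filename}: position {record.queue_position} in queue")
```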
File list
The `files` array in the data job detail provides a unified view of ingestion outputs and manually uploaded Markdown files:
- Entries with a `record_id` came from ingestion and include per-file processing metadata.
- Markdown uploads have `record_id: null` because they skip ingestion and are immediately alignment-ready (the sketch below separates the two).
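A minimal sketch of that split, assuming each entry in the `files` array exposes `record_id` and `filename` as attributes (the attribute access mirrors the SDK snippets above):

```python
from seekrai import SeekrFlow

client = SeekrFlow()
detail = client.data_jobs.retrieve("dj-1234567890")

# Assumption: each files[] entry carries record_id and filename,
# mirroring the file record fields documented above.
ingested = [f for f in detail.files if f.record_id is not None]
manual_markdown = [f for f in detail.files if f.record_id is None]

print("From ingestion:", [f.filename for f in ingested])
print("Manually uploaded Markdown:", [f.filename for f in manual_markdown])
```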
Read the timeline
The `timeline` array contains ordered milestone events for the job lifecycle.
```json
[
  {
    "timestamp": "2025-11-14T07:35:21.700005Z",
    "event_type": "Created",
    "message": "Data job created.",
    "metadata": {}
  },
  {
    "timestamp": "2025-11-14T07:35:22.483459Z",
    "event_type": "File Processing Started",
    "message": "Started processing files.",
    "metadata": {
      "ingestion_job_id": "ij-a19a1923-1d18-4fc7-8365-96d12ea734ce",
      "status": "running",
      "record_count": 5
    }
  },
  {
    "timestamp": "2025-11-14T07:37:52.435803Z",
    "event_type": "File Processing Completed",
    "message": "Finished processing files.",
    "metadata": {
      "ingestion_job_id": "ij-a19a1923-1d18-4fc7-8365-96d12ea734ce",
      "status": "completed",
      "record_count": 5
    }
  }
]
```

Events are pre-sorted by timestamp.
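To render the timeline as a readable log, you can print the events in order (no sorting needed, since they arrive pre-sorted). A sketch, assuming timeline entries expose the fields shown above as attributes:

```python
from seekrai import SeekrFlow

client = SeekrFlow()
detail = client.data_jobs.retrieve("dj-1234567890")

# Events arrive pre-sorted by timestamp, so no sorting is needed.
for event in detail.timeline:
    print(f"[{event.timestamp}] {event.event_type}: {event.message}")
    if event.metadata:
        print(f"  metadata: {event.metadata}")
```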
Resolve ingestion failures
When a file fails, its record includes `error_message` and `suggested_fix`. The data job remains in `needs_review` until every failed record is resolved, either fixed and retried or removed.
```python
detail = client.data_jobs.retrieve("dj-1234567890")

for ingestion_job in detail.ingestion_jobs:
    for record in ingestion_job.records:
        if record.status == "failed":
            print(f"File: {record.filename}")
            print(f"Error: {record.error_message}")
            print(f"Fix: {record.suggested_fix}")
```

To retry, re-upload the corrected file and attach it to the job again via `POST /v1/flow/data-jobs/{id}/add-files`. To skip the file, remove it via `POST /v1/flow/data-jobs/{id}/remove-files`. At least one viable file must remain before alignment can start.
Troubleshoot common errors
| Error | Suggested fix |
|---|---|
| The file appears to be empty. | Upload a file with content. |
| The PDF may be corrupted, password-protected, or in an unsupported format. | Upload a valid, unprotected PDF. |
| The PDF contains pages that exceed the maximum supported size. | Re-export the PDF with smaller page dimensions. |
| The file was not found or is not owned by the current user. | Re-upload the file or verify the correct `file_id`. |
| Service temporarily unavailable. | Retry the job after a brief wait. |
| Internal processing failure. | If the issue persists, contact support. |
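For the transient cases ("Service temporarily unavailable"), wrapping calls in a short exponential backoff usually suffices. A sketch with arbitrary retry counts and delays; narrow the `except` clause to the SDK's actual error type if you know it:

```python
import time

from seekrai import SeekrFlow

client = SeekrFlow()


def retrieve_with_backoff(job_id: str, attempts: int = 4):
    """Retry transient failures with exponential backoff (2s, 4s, 8s...)."""
    delay = 2.0
    for attempt in range(attempts):
        try:
            return client.data_jobs.retrieve(job_id)
        except Exception as exc:  # assumption: replace with the SDK's error type
            if attempt == attempts - 1:
                raise
            print(f"Transient error ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay *= 2


detail = retrieve_with_backoff("dj-1234567890")
```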