Monitor ingestion

Track ingestion progress, interpret per-file statuses, and resolve errors through the data job detail endpoint.

Ingestion status is surfaced through the data job that triggered it. GET /v1/flow/data-jobs/{id} returns a canonical view of your job — including nested ingestion jobs, per-file records, timeline events, and the derived status that tells you whether you're ready to start alignment.

Understand data job status

While ingestion is in progress, the data job moves through these states:

Status            Description
file_processing   At least one ingestion job is queued or running.
needs_review      Manual action required — failed ingestion records, missing files, or missing system prompt.
ready_to_start    All ingestion completed successfully and prerequisites for alignment are met.

Once you call /start, the status mirrors the alignment job (running, completed, failed, etc.).
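
To wait for ingestion to finish, poll the job until it leaves file_processing. A minimal sketch, using only the retrieve call covered in the next section; the 10-second interval is illustrative:

import time

from seekrai import SeekrFlow

client = SeekrFlow()

# Poll until ingestion is no longer in progress.
while True:
    detail = client.data_jobs.retrieve("dj-1234567890")
    if detail.status != "file_processing":
        break
    time.sleep(10)

if detail.status == "ready_to_start":
    print("Ingestion finished; ready to start alignment.")
elif detail.status == "needs_review":
    print("Manual action required; inspect the failed records.")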

Check job status

List all data jobs:

Endpoint: GET /v1/flow/data-jobs (List data jobs)

from seekrai import SeekrFlow

client = SeekrFlow()

jobs = client.data_jobs.list()
for job in jobs.data:
    print(job.id, job.status, job.created_at)

Retrieve a specific data job:

Endpoint: GET /v1/flow/data-jobs/{id} (Get data job)

detail = client.data_jobs.retrieve("dj-1234567890")
print("Job ID:", detail.id)
print("Status:", detail.status)

Sample response:

{
    "id": "dj-1b75f4d5-5c9e-4d33-b164-a2393bc5ab6d",
    "name": "Customer support refresh",
    "description": "Prep PDFs and Markdown for Q4 fine-tuning",
    "job_type": "principle_files",
    "status": "ready_to_start",
    "created_at": "2025-11-14T07:35:21.700005Z",
    "updated_at": "2025-11-14T07:37:52.464993Z",
    "ingestion_jobs": [...],
    "files": [...],
    "timeline": [...]
}

Inspect ingestion jobs and file records

The ingestion_jobs array contains one entry per ingestion run. Each entry includes a records array with an independent status and set of timestamps for every file processed.

detail = client.data_jobs.retrieve("dj-1234567890")

for ingestion_job in detail.ingestion_jobs:
    print(f"Ingestion job: {ingestion_job.id} — {ingestion_job.status}")
    for record in ingestion_job.records:
        print(f"  {record.filename}: {record.status}")
        if record.processing_at:
            print(f"    Started: {record.processing_at}")
        if record.completed_at:
            print(f"    Finished: {record.completed_at}")
        if record.status == "failed":
            print(f"    Error: {record.error_message}")
            print(f"    Fix: {record.suggested_fix}")

File record fields

Field            Description
record_id        Unique identifier for the file record
filename         Source filename
status           Per-file processing state
method           Ingestion method used (speed-optimized or accuracy-optimized)
queue_position   Position in queue when status is queued
error_message    Plain-language description of what went wrong
suggested_fix    Recommended action to resolve the error
created_at       When the file record was created
processing_at    When the file entered the running state
completed_at     When the file entered the completed state
failed_at        When the file entered the failed state
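
The timestamp fields make it straightforward to measure how long each file took to process. A sketch, assuming the timestamps arrive as ISO 8601 strings like those in the sample response:

from datetime import datetime

def processing_duration(record):
    """Return seconds spent in the running state, or None if unfinished."""
    if not (record.processing_at and record.completed_at):
        return None
    # Normalize the trailing "Z" so datetime.fromisoformat accepts it
    # on older Python versions.
    started = datetime.fromisoformat(record.processing_at.replace("Z", "+00:00"))
    finished = datetime.fromisoformat(record.completed_at.replace("Z", "+00:00"))
    return (finished - started).total_seconds()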

File list

The files array in the data job detail provides a unified view of ingestion outputs and manually uploaded Markdown files:

  • Entries with a record_id came from ingestion and include per-file processing metadata.
  • Markdown uploads have record_id: null because they skip ingestion and are immediately alignment-ready (see the sketch below).
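
To separate the two kinds of entries in code, branch on record_id. A sketch, assuming each entry also exposes a filename field:

detail = client.data_jobs.retrieve("dj-1234567890")

for entry in detail.files:
    if entry.record_id is None:
        print(f"{entry.filename}: Markdown upload, already alignment-ready")
    else:
        print(f"{entry.filename}: produced by ingestion record {entry.record_id}")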

Read the timeline

The timeline array contains ordered milestone events for the job lifecycle.

[
    {
        "timestamp": "2025-11-14T07:35:21.700005Z",
        "event_type": "Created",
        "message": "Data job created.",
        "metadata": {}
    },
    {
        "timestamp": "2025-11-14T07:35:22.483459Z",
        "event_type": "File Processing Started",
        "message": "Started processing files.",
        "metadata": {
            "ingestion_job_id": "ij-a19a1923-1d18-4fc7-8365-96d12ea734ce",
            "status": "running",
            "record_count": 5
        }
    },
    {
        "timestamp": "2025-11-14T07:37:52.435803Z",
        "event_type": "File Processing Completed",
        "message": "Finished processing files.",
        "metadata": {
            "ingestion_job_id": "ij-a19a1923-1d18-4fc7-8365-96d12ea734ce",
            "status": "completed",
            "record_count": 5
        }
    }
]

Events are pre-sorted by timestamp.
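
Printing the timeline is a quick way to audit a job's history. A sketch, assuming timeline entries expose the fields from the JSON above via attribute access, like the other SDK objects:

detail = client.data_jobs.retrieve("dj-1234567890")

for event in detail.timeline:
    print(f"{event.timestamp}  {event.event_type}: {event.message}")
    if event.metadata:
        print(f"  metadata: {event.metadata}")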

Resolve ingestion failures

When a file fails, its record includes error_message and suggested_fix. The data job remains in needs_review until every failed record is resolved — either fixed and retried, or removed.

detail = client.data_jobs.retrieve("dj-1234567890")

for ingestion_job in detail.ingestion_jobs:
    for record in ingestion_job.records:
        if record.status == "failed":
            print(f"File: {record.filename}")
            print(f"Error: {record.error_message}")
            print(f"Fix: {record.suggested_fix}")

To retry, re-upload the corrected file and attach it to the job again via POST /v1/flow/data-jobs/{id}/add-files. To skip the file, remove it via POST /v1/flow/data-jobs/{id}/remove-files. At least one viable file must remain before alignment can start.
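
In Python, the flow might look like the sketch below. The files.upload, add_files, and remove_files method names are assumptions that mirror the REST paths, not confirmed SDK methods, and the file IDs are placeholders:

# Re-upload the corrected file and attach it to the job again.
uploaded = client.files.upload("./corrected-report.pdf")  # assumed helper
client.data_jobs.add_files("dj-1234567890", file_ids=[uploaded.id])  # assumed method

# Or drop the failed file so it no longer blocks review.
client.data_jobs.remove_files("dj-1234567890", file_ids=["file-abc123"])  # assumed method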

Troubleshoot common errors

Error                                                                       Suggested fix
The file appears to be empty.                                               Upload a file with content.
The PDF may be corrupted, password-protected, or in an unsupported format.  Upload a valid, unprotected PDF.
The PDF contains pages that exceed the maximum supported size.              Re-export the PDF with smaller page dimensions.
The file was not found or is not owned by the current user.                 Re-upload the file or verify the correct file_id.
Service temporarily unavailable.                                            Retry the job after a brief wait.
Internal processing failure.                                                If the issue persists, contact support.
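
When triaging failures programmatically, you can split records into transient errors worth retrying and errors that need a corrected file. A sketch; matching on the exact error strings above is illustrative, since the wording may vary:

# Error messages from the table above that usually resolve on retry.
TRANSIENT_ERRORS = (
    "Service temporarily unavailable.",
    "Internal processing failure.",
)

detail = client.data_jobs.retrieve("dj-1234567890")

for ingestion_job in detail.ingestion_jobs:
    for record in ingestion_job.records:
        if record.status != "failed":
            continue
        if record.error_message in TRANSIENT_ERRORS:
            print(f"{record.filename}: transient, retry after a wait")
        else:
            print(f"{record.filename}: needs a fixed file. {record.suggested_fix}")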