# Monitor ingestion

> Track ingestion progress, interpret per-file statuses, and resolve errors through the data job detail endpoint.

Ingestion status is surfaced through the data job that triggered it. `GET /v1/flow/data-jobs/{id}` returns a canonical view of your job — including nested ingestion jobs, per-file records, timeline events, and the derived `status` that tells you whether you're ready to start alignment.

## Understand data job status

Until alignment starts, the data job is in one of these states:

| Status            | Description                                                                                 |
| ----------------- | ------------------------------------------------------------------------------------------- |
| `file_processing` | At least one ingestion job is queued or running.                                            |
| `needs_review`    | Manual action required — failed ingestion records, missing files, or missing system prompt. |
| `ready_to_start`  | All ingestion completed successfully and prerequisites for alignment are met.               |

Once you call `/start`, the status mirrors the alignment job (`running`, `completed`, `failed`, etc.).
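The states above lend themselves to a simple polling loop. The following is a minimal sketch rather than an SDK feature: `wait_for_ingestion` and its `fetch_status` argument are hypothetical names, and the poll interval and timeout are arbitrary defaults you should tune.

```python
import time
from typing import Callable

# Status documented above as "at least one ingestion job is queued or running".
IN_PROGRESS = "file_processing"


def wait_for_ingestion(
    fetch_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the data job leaves `file_processing`, then return its status.

    `fetch_status` is a caller-supplied (hypothetical) callable that returns the
    current data job status string. `sleep` is injectable so the loop can be
    exercised without real delays.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status != IN_PROGRESS:
            # Typically `needs_review` or `ready_to_start` at this point.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("data job is still in file_processing")
        sleep(poll_seconds)
```

With the SDK client shown later on this page, a call might look like `wait_for_ingestion(lambda: client.data_jobs.retrieve("dj-1234567890").status)`. A returned `needs_review` means manual action is required before alignment can start.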

## Check job status

List all data jobs:

**Endpoint:** `GET /v1/flow/data-jobs` [List data jobs](/flow/reference/get_data_jobs_v1_flow_data_jobs_get)

<CodeGroup>
  ```python Python theme={null}
  from seekrai import SeekrFlow

  client = SeekrFlow()

  jobs = client.data_jobs.list()
  for job in jobs.data:
      print(job.id, job.status, job.created_at)
  ```
</CodeGroup>

Retrieve a specific data job:

**Endpoint:** `GET /v1/flow/data-jobs/{id}` [Get data job](/flow/reference/get_data_job_v1_flow_data_jobs__data_job_id__get)

<CodeGroup>
  ```python Python theme={null}
  # Reuses the `client` created in the previous example.
  detail = client.data_jobs.retrieve("dj-1234567890")
  print("Job ID:", detail.id)
  print("Status:", detail.status)
  ```
</CodeGroup>

**Sample response:**

<CodeGroup>
  ```json JSON theme={null}
  {
      "id": "dj-1b75f4d5-5c9e-4d33-b164-a2393bc5ab6d",
      "name": "Customer support refresh",
      "description": "Prep PDFs and Markdown for Q4 fine-tuning",
      "job_type": "principle_files",
      "status": "ready_to_start",
      "created_at": "2025-11-14T07:35:21.700005Z",
      "updated_at": "2025-11-14T07:37:52.464993Z",
      "ingestion_jobs": [...],
      "files": [...],
      "timeline": [...]
  }
  ```
</CodeGroup>

## Inspect ingestion jobs and file records

The `ingestion_jobs` array contains one entry per ingestion run. Each entry includes a `records` array with independent status and timestamps for every file processed.

<CodeGroup>
  ```python Python theme={null}
  detail = client.data_jobs.retrieve("dj-1234567890")

  for ingestion_job in detail.ingestion_jobs:
      print(f"Ingestion job: {ingestion_job.id} — {ingestion_job.status}")
      for record in ingestion_job.records:
          print(f"  {record.filename}: {record.status}")
          if record.processing_at:
              print(f"    Started: {record.processing_at}")
          if record.completed_at:
              print(f"    Finished: {record.completed_at}")
          if record.status == "failed":
              print(f"    Error: {record.error_message}")
              print(f"    Fix: {record.suggested_fix}")
  ```
</CodeGroup>

### File record fields

| Field            | Description                                                       |
| ---------------- | ----------------------------------------------------------------- |
| `record_id`      | Unique identifier for the file record                             |
| `filename`       | Source filename                                                   |
| `status`         | Per-file processing state                                         |
| `method`         | Ingestion method used (`speed-optimized` or `accuracy-optimized`) |
| `queue_position` | Position in queue when `status` is `queued`                       |
| `error_message`  | Plain-language description of what went wrong                     |
| `suggested_fix`  | Recommended action to resolve the error                           |
| `created_at`     | When the file record was created                                  |
| `processing_at`  | When the file entered the `running` state                         |
| `completed_at`   | When the file entered the `completed` state                       |
| `failed_at`      | When the file entered the `failed` state                          |

### File list

The `files` array in the data job detail provides a unified view of ingestion outputs and manually uploaded Markdown files:

* Entries with a `record_id` came from ingestion and include per-file processing metadata.
* Markdown uploads have `record_id: null` because they skip ingestion and are immediately alignment-ready.
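The distinction above can be applied in code by checking `record_id`. This is a minimal sketch that assumes `files` has been parsed into plain dictionaries (for example from the raw JSON response); `split_files` is a hypothetical helper, and with SDK objects you would read the attribute instead of using `.get()`.

```python
from typing import Any, Dict, List, Tuple

FileEntry = Dict[str, Any]


def split_files(files: List[FileEntry]) -> Tuple[List[FileEntry], List[FileEntry]]:
    """Separate ingestion outputs from directly uploaded Markdown files.

    Entries with a non-null `record_id` came from ingestion; entries whose
    `record_id` is null (or absent, which this sketch treats the same way)
    are Markdown uploads that skipped ingestion.
    """
    ingested = [f for f in files if f.get("record_id") is not None]
    uploads = [f for f in files if f.get("record_id") is None]
    return ingested, uploads
```

The first list holds the entries that carry per-file processing metadata; the second holds the uploads that are already alignment-ready.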

## Read the timeline

The `timeline` array contains ordered milestone events for the job lifecycle.

<CodeGroup>
  ```json JSON expandable theme={null}
  [
      {
          "timestamp": "2025-11-14T07:35:21.700005Z",
          "event_type": "Created",
          "message": "Data job created.",
          "metadata": {}
      },
      {
          "timestamp": "2025-11-14T07:35:22.483459Z",
          "event_type": "File Processing Started",
          "message": "Started processing files.",
          "metadata": {
              "ingestion_job_id": "ij-a19a1923-1d18-4fc7-8365-96d12ea734ce",
              "status": "running",
              "record_count": 5
          }
      },
      {
          "timestamp": "2025-11-14T07:37:52.435803Z",
          "event_type": "File Processing Completed",
          "message": "Finished processing files.",
          "metadata": {
              "ingestion_job_id": "ij-a19a1923-1d18-4fc7-8365-96d12ea734ce",
              "status": "completed",
              "record_count": 5
          }
      }
  ]
  ```
</CodeGroup>

Events are pre-sorted by `timestamp`, oldest first, so you can read the array from top to bottom without re-sorting it.
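Because each event carries an ISO 8601 timestamp, the timeline can be used to measure how long file processing took. The sketch below assumes the timeline has been parsed into dictionaries shaped like the sample above; `parse_timestamp` and `processing_seconds` are hypothetical helpers, not SDK functions.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2025-11-14T07:35:21.700005Z.

    `datetime.fromisoformat` rejects a trailing "Z" before Python 3.11,
    so it is rewritten as an explicit UTC offset first.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(timeline: List[Dict[str, Any]]) -> Optional[float]:
    """Return seconds between the first "File Processing Started" event and
    the last "File Processing Completed" event, or None if either is missing.
    """
    started = None
    completed = None
    for event in timeline:
        if event["event_type"] == "File Processing Started":
            if started is None:
                started = parse_timestamp(event["timestamp"])
        elif event["event_type"] == "File Processing Completed":
            completed = parse_timestamp(event["timestamp"])
    if started is None or completed is None:
        return None
    return (completed - started).total_seconds()
```

If a job has several ingestion runs, this measures the span from the first start to the last completion; group events by `metadata["ingestion_job_id"]` if you need per-run durations.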

## Resolve ingestion failures

When a file fails, its record includes `error_message` and `suggested_fix`. The data job remains in `needs_review` until every failed record is resolved — either fixed and retried, or removed.

<CodeGroup>
  ```python Python theme={null}
  detail = client.data_jobs.retrieve("dj-1234567890")

  for ingestion_job in detail.ingestion_jobs:
      for record in ingestion_job.records:
          if record.status == "failed":
              print(f"File: {record.filename}")
              print(f"Error: {record.error_message}")
              print(f"Fix: {record.suggested_fix}")
  ```
</CodeGroup>

To retry, re-upload the corrected file and attach it to the job again via `POST /v1/flow/data-jobs/{id}/add-files`. To skip the file, remove it via `POST /v1/flow/data-jobs/{id}/remove-files`. At least one viable file must remain before alignment can start.

## Troubleshoot common errors

| Error                                                                      | Suggested fix                                       |
| -------------------------------------------------------------------------- | --------------------------------------------------- |
| The file appears to be empty.                                              | Upload a file with content.                         |
| The PDF may be corrupted, password-protected, or in an unsupported format. | Upload a valid, unprotected PDF.                    |
| The PDF contains pages that exceed the maximum supported size.             | Re-export the PDF with smaller page dimensions.     |
| The file was not found or is not owned by the current user.                | Re-upload the file or verify the correct `file_id`. |
| Service temporarily unavailable.                                           | Retry the job after a brief wait.                   |
| Internal processing failure.                                               | If the issue persists, contact support.             |
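When triaging many failed records, it can help to map each `error_message` to a next step. The following is a rough sketch based only on the table above: the substring matching, the action labels, and the `next_action` helper are all assumptions, and the exact strings returned by the API may differ from the wording shown here. Prefer the record's own `suggested_fix` when it is present.

```python
from typing import Dict

# Fragments of the error messages listed in the table above, mapped to
# hypothetical action labels. Substring matching is an assumption.
ACTIONS: Dict[str, str] = {
    "appears to be empty": "replace_file",
    "corrupted, password-protected": "replace_file",
    "exceed the maximum supported size": "replace_file",
    "not found or is not owned": "reupload_or_check_file_id",
    "temporarily unavailable": "retry_later",
    "internal processing failure": "contact_support_if_persistent",
}


def next_action(error_message: str) -> str:
    """Return a coarse action label for an ingestion error message."""
    lowered = error_message.lower()
    for fragment, action in ACTIONS.items():
        if fragment in lowered:
            return action
    # Unrecognized message: fall back to reading `suggested_fix` manually.
    return "unknown"
```

Only `retry_later` suggests resubmitting the same file unchanged; the other labels call for a corrected file, a different `file_id`, or a support request.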
