# Source tracing

> Trace model output back to the original source document using provenance metadata captured at ingestion.

Source tracing lets you prove that any model or agent answer came directly from the exact files you uploaded. Every chunk retrieved from a vector database carries a traceable lineage: model output → retrieved chunk → Markdown location → original uploaded file. This lineage supports auditability, compliance, and operational decision-making in environments where document-level proof of origin is required.

## How source tracing works

When a file is ingested into a vector database, the pipeline captures provenance metadata for every chunk — line range, character offsets, heading hierarchy, and source page number — and stores it alongside the embedding. No additional configuration is required.

## Retrieve source tracing fields from a run

After a run completes, list the thread's messages to access the assistant response. If the agent invoked the file search tool during the run, each retrieved chunk's source tracing fields are appended to the message content.

**Endpoint:** `GET /v1/threads/{thread_id}/messages` [List messages](/flow/reference/list_messages_endpoint_v1_threads__thread_id__messages_get)

Each chunk in the response includes:

| Field      | Description                                                                                       |
| ---------- | ------------------------------------------------------------------------------------------------- |
| `chunk_id` | Unique identifier for this chunk. Use it to call the chunk endpoint for full Markdown provenance. |
| `page`     | Page number in the source document (`null` for native Markdown and JSON).                         |
| `lines`    | Line range in the ingested Markdown.                                                              |
| `section`  | Heading hierarchy path from the document root to this chunk.                                      |
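
As a rough illustration of how these four fields might be turned into a readable citation, the sketch below formats one chunk reference. It assumes the chunk has already been parsed into a plain dict keyed by the field names in the table. How the fields are serialized inside the message content is not specified here, so the parsing step is left out. `format_citation` is a hypothetical helper, not part of the SDK, and the example values are made up.

```python
def format_citation(chunk: dict) -> str:
    """Build a short citation string from a chunk's source tracing fields.

    Assumption: `chunk` is a dict keyed by the documented field names
    (chunk_id, page, lines, section). Adapt this to however you parse
    the message content.
    """
    parts = [f"chunk {chunk['chunk_id']}"]
    # `page` is null (None in Python) for native Markdown and JSON uploads.
    if chunk.get("page") is not None:
        parts.append(f"page {chunk['page']}")
    if chunk.get("lines") is not None:
        parts.append(f"lines {chunk['lines']}")
    if chunk.get("section"):
        parts.append(f"section {chunk['section']}")
    return ", ".join(parts)


# Made-up values for illustration only.
example = {
    "chunk_id": "abc123",
    "page": None,
    "lines": "10-14",
    "section": "Guide > Setup",
}
print(format_citation(example))
```

Skipping `page` when it is `None` keeps citations for Markdown and JSON sources from showing an empty page reference.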

## Retrieve chunk provenance

Use `chunk_id` with the chunk endpoint to retrieve the full provenance record, including the `file_id` of the original uploaded file.

**Endpoint:** `GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id}` [Get chunk](/flow/reference/get_vector_database_chunk_v1_flow_vectordb__database_id__chunk__chunk_id__get)

The response includes:

| Field       | Description                                                                                                 |
| ----------- | ----------------------------------------------------------------------------------------------------------- |
| `chunk_id`  | The chunk's unique identifier.                                                                              |
| `file_id`   | ID of the original uploaded file. Use this with the file download endpoint to retrieve the source document. |
| `text`      | The full text of the chunk as indexed.                                                                      |
| `locations` | List of Markdown location objects (see below).                                                              |

**Location object:**

| Field               | Description                                                                          |
| ------------------- | ------------------------------------------------------------------------------------ |
| `line_number_start` | Starting line in the ingested Markdown.                                              |
| `line_number_end`   | Ending line in the ingested Markdown.                                                |
| `char_start`        | Character offset within the section at the starting line.                            |
| `char_end`          | Character offset within the section at the ending line.                              |
| `hierarchy`         | Ordered list of headings from the document root to this chunk.                       |
| `page_number`       | Source document page number, 1-indexed. `null` for native Markdown and JSON uploads. |
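
If you have the ingested Markdown text available, a location object can be used to pull out the lines a chunk was drawn from. The following is a minimal sketch under two assumptions that are not confirmed above: line numbers are 1-indexed, and `line_number_end` is inclusive. `extract_lines` is a hypothetical helper name.

```python
def extract_lines(markdown: str, location: dict) -> str:
    """Return the Markdown lines covered by a location object.

    Assumptions: line numbers are 1-indexed and the end line is
    inclusive. Adjust the slice if your data shows otherwise.
    """
    lines = markdown.splitlines()
    start = location["line_number_start"]
    end = location["line_number_end"]
    # Convert the 1-indexed inclusive range to a Python slice.
    return "\n".join(lines[start - 1:end])


# Made-up document and location for illustration only.
markdown = "# Title\n\nFirst\nSecond\nThird"
location = {"line_number_start": 3, "line_number_end": 4}
print(extract_lines(markdown, location))
```

Comparing the extracted lines against the chunk's `text` field is one way to check whether the indexing assumptions hold for your data.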

Retrieve a chunk with the `seekrai` SDK using `retrieve_chunk`. The returned object exposes the same fields as the REST response, so you can read `file_id` and walk each location directly. Requires `seekrai` 0.20.0 or later.

<CodeGroup>
  ```python Python
  from seekrai import SeekrFlow

  client = SeekrFlow()

  chunk = client.vector_database.retrieve_chunk(
      database_id="<database-id>",
      chunk_id="<chunk-id>",  # from the file search result
  )

  print(chunk.file_id)   # original uploaded file
  print(chunk.text)      # chunk text as indexed

  for location in chunk.locations:
      print(location["page_number"], location["hierarchy"])
  ```
</CodeGroup>

## Supported file types

| File type | `page_number` | `hierarchy` | `lines` | `char_start` / `char_end` |
| --------- | ------------- | ----------- | ------- | ------------------------- |
| PDF       | ✓             | ✓           | ✓       | ✓                         |
| DOCX      | ✓             | ✓           | ✓       | ✓                         |
| PPTX      | ✓             | ✓           | ✓       | ✓                         |
| Markdown  | —             | ✓           | ✓       | ✓                         |
| JSON      | —             | ✓           | ✓       | ✓                         |

## Complete workflow

The full source tracing workflow follows this sequence:

```
POST /v1/threads/{thread_id}/runs
  └── GET /v1/threads/{thread_id}/messages        (chunk_id, page, lines, section)
        └── GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id}    (file_id, Markdown location)
              └── GET /v1/flow/files/{file_id}/content              (original file)
```

1. Run the agent against a thread (**Endpoint:** `POST /v1/threads/{thread_id}/runs` [Run agent](/flow/reference/run_agent_v1_threads__thread_id__runs_post)). Once the run completes, list the messages to find the assistant response. Each retrieved chunk includes `chunk_id`, `page`, `lines`, and `section`.
2. Call the chunk endpoint with `chunk_id` to get `file_id` and confirm the exact Markdown lines the answer was drawn from.
3. Use `file_id` with the file download endpoint (**Endpoint:** `GET /v1/flow/files/{file_id}/content` [Download file](/flow/reference/file_download_content_v1_flow_files__file_id__content_get)) to retrieve the original uploaded file and complete the chain of custody.
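
Steps 2 and 3 can be chained over plain HTTP. The sketch below uses only the Python standard library and starts from a `chunk_id` you already have. The endpoint paths come from this page, but the rest is assumed: `BASE_URL` is a placeholder host, the API key is assumed to be sent in an `Authorization` header, and the chunk endpoint is assumed to return JSON. `trace_chunk` and the other helpers are hypothetical names, not SDK functions.

```python
import json
import urllib.request

# Placeholder host (assumption); replace with your SeekrFlow API host.
BASE_URL = "https://api.example.com"


def chunk_url(base_url: str, database_id: str, chunk_id: str) -> str:
    return f"{base_url}/v1/flow/vectordb/{database_id}/chunk/{chunk_id}"


def file_content_url(base_url: str, file_id: str) -> str:
    return f"{base_url}/v1/flow/files/{file_id}/content"


def http_get(url: str, api_key: str) -> bytes:
    # Assumption: the API key is passed in the Authorization header.
    request = urllib.request.Request(url, headers={"Authorization": api_key})
    with urllib.request.urlopen(request) as response:
        return response.read()


def trace_chunk(database_id, chunk_id, api_key, base_url=BASE_URL, get=http_get):
    """Follow a chunk_id to its provenance record and the original file bytes."""
    # Step 2: fetch the provenance record, which carries file_id and locations.
    chunk = json.loads(get(chunk_url(base_url, database_id, chunk_id), api_key))
    # Step 3: download the original uploaded file.
    original = get(file_content_url(base_url, chunk["file_id"]), api_key)
    return chunk, original
```

The `get` parameter lets you swap in a stub for testing or a different HTTP client without changing the tracing logic.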
