Source tracing
Source tracing lets you prove that any model or agent answer came directly from the exact files you uploaded. Every chunk retrieved from a vector database carries a traceable lineage: model output → retrieved chunk → Markdown location → original uploaded file. This is designed to support auditability, compliance, and operational decision-making in environments where document-level proof of origin is required.
How source tracing works
When a file is ingested into a vector database, the pipeline captures provenance metadata for every chunk — line range, character offsets, heading hierarchy, and source page number — and stores it alongside the embedding. No additional configuration is required.
Retrieve source tracing fields from a run
After a run completes, list the thread's messages to access the assistant response. When the agent invoked the file search tool, each retrieved chunk's source tracing fields are appended to the message content.
Endpoint: GET /v1/threads/{thread_id}/messages List messages
Each chunk in the response includes:
| Field | Description |
|---|---|
chunk_id | Unique identifier for this chunk. Use it to call the chunk endpoint for full Markdown provenance. |
page | Page number in the source document (null for native Markdown and JSON). |
lines | Line range in the ingested Markdown. |
section | Heading hierarchy path from the document root to this chunk. |
Retrieve chunk provenance
Use chunk_id with the chunk endpoint to retrieve the full provenance record, including the file_id of the original uploaded file.
Endpoint: GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id} Get chunk
The response includes:
| Field | Description |
|---|---|
chunk_id | The chunk's unique identifier. |
file_id | ID of the original uploaded file. Use this with the file download endpoint to retrieve the source document. |
chunk_text | The full text of the chunk as indexed. |
locations | List of Markdown location objects (see below). |
Location object:
| Field | Description |
|---|---|
line_number_start | Starting line in the ingested Markdown. |
line_number_end | Ending line in the ingested Markdown. |
char_start | Character offset within the section at the starting line. |
char_end | Character offset within the section at the ending line. |
hierarchy | Ordered list of headings from the document root to this chunk. |
page_number | Source document page number, 1-indexed. null for native Markdown and JSON uploads. |
Supported file types
| File type | page_number | hierarchy | lines | char_start / char_end |
|---|---|---|---|---|
| ✓ | ✓ | ✓ | ✓ | |
| DOCX | ✓ | ✓ | ✓ | ✓ |
| PPTX | ✓ | ✓ | ✓ | ✓ |
| Markdown | — | ✓ | ✓ | ✓ |
| JSON | — | ✓ | ✓ | ✓ |
Complete workflow
The full source tracing workflow follows this sequence:
POST /v1/threads/{thread_id}/runs
└── GET /v1/threads/{thread_id}/messages (chunk_id, page, section)
└── GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id} (file_id, Markdown location)
└── GET /v1/flow/files/{file_id}/content (original file)
- Run the agent against a thread (Endpoint:
POST /v1/threads/{thread_id}/runsRun agent). Once the run completes, list the messages to find the assistant response — each retrieved chunk includeschunk_id,page, andsection. - Call the chunk endpoint with
chunk_idto getfile_idand confirm the exact Markdown lines the answer was drawn from. - Use
file_idwith the file download endpoint (Endpoint:GET /v1/flow/files/{file_id}/contentDownload file) to retrieve the original uploaded file and complete the chain of custody.
