Source tracing lets you prove that any model or agent answer came directly from the exact files you uploaded. Every chunk retrieved from a vector database carries a traceable lineage: model output → retrieved chunk → Markdown location → original uploaded file. This is designed to support auditability, compliance, and operational decision-making in environments where document-level proof of origin is required.

How source tracing works

When a file is ingested into a vector database, the pipeline captures provenance metadata for every chunk — line range, character offsets, heading hierarchy, and source page number — and stores it alongside the embedding. No additional configuration is required.
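
To make the idea of per-chunk provenance concrete, the following is a minimal sketch of how a line range and heading hierarchy could be recorded while splitting Markdown into paragraph-sized chunks. It is not the pipeline's actual chunking logic, which the documentation does not describe; the function name `chunk_markdown` is hypothetical, and character offsets and page numbers are left out for brevity. Only the field names `chunk_text`, `line_number_start`, `line_number_end`, and `hierarchy` come from this page.

```python
def chunk_markdown(text: str) -> list:
    """Split Markdown into paragraph chunks, recording provenance for each.

    Simplification: any line starting with '#' is treated as a heading,
    so '#' comments inside code fences would be misread.
    """
    lines = text.splitlines()
    chunks = []
    hierarchy = []   # current heading path, e.g. ["Guide", "Setup"]
    buffer = []      # lines of the paragraph being collected
    first_line = 0   # 1-indexed line where the current paragraph began

    def flush(end_line: int) -> None:
        # Emit the buffered paragraph (if any) together with its provenance.
        if buffer:
            chunks.append({
                "chunk_text": "\n".join(buffer),
                "line_number_start": first_line,
                "line_number_end": end_line,
                "hierarchy": list(hierarchy),
            })
            buffer.clear()

    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            flush(number - 1)
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped.lstrip("#").strip()
            del hierarchy[level - 1:]   # drop headings at this depth or deeper
            hierarchy.append(title)
        elif not stripped:
            flush(number - 1)           # a blank line ends the paragraph
        else:
            if not buffer:
                first_line = number
            buffer.append(line)

    flush(len(lines))
    return chunks
```

Storing the heading path as a copied list per chunk means each record stays valid even after later headings change the running hierarchy.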

Retrieve source tracing fields from a run

After a run completes, list the thread's messages to access the assistant response. When the agent invoked the file search tool, each retrieved chunk's source tracing fields are appended to the message content.

Endpoint: GET /v1/threads/{thread_id}/messages (List messages)

Each chunk in the response includes:

Field	Description
`chunk_id`	Unique identifier for this chunk. Use it to call the chunk endpoint for full Markdown provenance.
`page`	Page number in the source document (`null` for native Markdown and JSON).
`lines`	Line range in the ingested Markdown.
`section`	Heading hierarchy path from the document root to this chunk.
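
The fields above can be collected from the messages response along these lines. This is a sketch under several assumptions: the host in `BASE_URL` is a placeholder, bearer-token authentication is assumed, and because this page does not specify where in the message content the chunk entries sit, the hypothetical helper `find_chunks` simply walks the whole JSON payload and collects every object that carries a `chunk_id` key.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def list_messages(thread_id: str, api_key: str) -> dict:
    """Call GET /v1/threads/{thread_id}/messages and return the parsed JSON."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/threads/{thread_id}/messages",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_chunks(node) -> list:
    """Recursively collect every JSON object that has a chunk_id key.

    Walking the full payload avoids depending on the exact nesting of
    chunk entries, which is not documented here.
    """
    found = []
    if isinstance(node, dict):
        if "chunk_id" in node:
            found.append(node)
        for value in node.values():
            found.extend(find_chunks(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_chunks(item))
    return found
```

A typical call would be `find_chunks(list_messages(thread_id, api_key))`, after which each entry's `chunk_id` can be passed to the chunk endpoint.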

Retrieve chunk provenance

Use `chunk_id` with the chunk endpoint to retrieve the full provenance record, including the `file_id` of the original uploaded file.

Endpoint: GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id} (Get chunk)

The response includes:

Field	Description
`chunk_id`	The chunk's unique identifier.
`file_id`	ID of the original uploaded file. Use this with the file download endpoint to retrieve the source document.
`chunk_text`	The full text of the chunk as indexed.
`locations`	List of Markdown location objects (see below).

Location object:

Field	Description
`line_number_start`	Starting line in the ingested Markdown.
`line_number_end`	Ending line in the ingested Markdown.
`char_start`	Character offset within the section at the starting line.
`char_end`	Character offset within the section at the ending line.
`hierarchy`	Ordered list of headings from the document root to this chunk.
`page_number`	Source document page number, 1-indexed. `null` for native Markdown and JSON uploads.
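
One way to work with the provenance record in client code is to parse it into typed containers. The sketch below uses the field names from the two tables above; the Python types (integers for line numbers and offsets, a list of strings for `hierarchy`) are assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    line_number_start: int
    line_number_end: int
    char_start: int
    char_end: int
    hierarchy: List[str]
    page_number: Optional[int]  # None for native Markdown and JSON uploads


@dataclass
class ChunkProvenance:
    chunk_id: str
    file_id: str
    chunk_text: str
    locations: List[Location]


def parse_chunk(record: dict) -> ChunkProvenance:
    """Convert a decoded chunk-endpoint response into dataclasses."""
    locations = [
        Location(
            line_number_start=loc["line_number_start"],
            line_number_end=loc["line_number_end"],
            char_start=loc["char_start"],
            char_end=loc["char_end"],
            hierarchy=list(loc.get("hierarchy") or []),
            page_number=loc.get("page_number"),  # tolerate null or missing
        )
        for loc in record.get("locations", [])
    ]
    return ChunkProvenance(
        chunk_id=record["chunk_id"],
        file_id=record["file_id"],
        chunk_text=record["chunk_text"],
        locations=locations,
    )
```

Keeping `page_number` optional forces callers to handle the Markdown and JSON case explicitly instead of assuming a page is always present.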

Supported file types

File type	`page_number`	`hierarchy`	`lines`	`char_start` / `char_end`
PDF	✓	✓	✓	✓
DOCX	✓	✓	✓	✓
PPTX	✓	✓	✓	✓
Markdown	—	✓	✓	✓
JSON	—	✓	✓	✓
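
Because `page_number` is unavailable for Markdown and JSON uploads, code that renders citations has to handle both cases. The following sketch shows one possible approach; the function name `format_citation` and the citation layout are illustrative choices, not part of the API.

```python
def format_citation(location: dict) -> str:
    """Build a readable citation from a location object.

    The page is included only when page_number is present and not null.
    """
    parts = []
    hierarchy = location.get("hierarchy") or []
    if hierarchy:
        parts.append(" > ".join(hierarchy))

    start = location["line_number_start"]
    end = location["line_number_end"]
    parts.append(f"line {start}" if start == end else f"lines {start}-{end}")

    page = location.get("page_number")
    if page is not None:
        parts.append(f"page {page}")
    return ", ".join(parts)


pdf_location = {
    "hierarchy": ["Policy", "Refunds"],
    "line_number_start": 40,
    "line_number_end": 52,
    "page_number": 7,
}
markdown_location = dict(pdf_location, page_number=None)

print(format_citation(pdf_location))       # Policy > Refunds, lines 40-52, page 7
print(format_citation(markdown_location))  # Policy > Refunds, lines 40-52
```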

Complete workflow

The full source tracing workflow follows this sequence:

POST /v1/threads/{thread_id}/runs
  └── GET /v1/threads/{thread_id}/messages        (chunk_id, page, section)
        └── GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id}    (file_id, Markdown location)
              └── GET /v1/flow/files/{file_id}/content              (original file)

1. Run the agent against a thread (Endpoint: POST /v1/threads/{thread_id}/runs, Run agent). Once the run completes, list the messages to find the assistant response. Each retrieved chunk includes `chunk_id`, `page`, and `section`.
2. Call the chunk endpoint with `chunk_id` to get `file_id` and confirm the exact Markdown lines the answer was drawn from.
3. Use `file_id` with the file download endpoint (Endpoint: GET /v1/flow/files/{file_id}/content, Download file) to retrieve the original uploaded file and complete the chain of custody.
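
The last two hops of this workflow, from `chunk_id` to the original file, could be chained roughly as follows. This is a sketch, not a reference client: the host in `BASE_URL` and the bearer-token header are assumptions, and `http_get` and `trace_chunk` are hypothetical helper names. The HTTP getter is passed in as a parameter so the tracing logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def http_get(path: str, api_key: str) -> bytes:
    """Perform an authenticated GET and return the raw response body."""
    request = urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def trace_chunk(database_id: str, chunk_id: str,
                get: Callable[[str], bytes]) -> dict:
    """Follow one chunk back to its original uploaded file.

    Step 1: fetch the chunk's provenance record to learn file_id.
    Step 2: download the original file content using that file_id.
    """
    record = json.loads(
        get(f"/v1/flow/vectordb/{database_id}/chunk/{chunk_id}")
    )
    original = get(f"/v1/flow/files/{record['file_id']}/content")
    return {
        "chunk_id": chunk_id,
        "file_id": record["file_id"],
        "locations": record.get("locations", []),
        "original_bytes": original,
    }
```

With real credentials, the call would look like `trace_chunk(database_id, chunk_id, lambda path: http_get(path, api_key))`, using a `chunk_id` taken from the messages response.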