Source tracing

Source tracing lets you trace any model or agent answer back to the exact files you uploaded. Every chunk retrieved from a vector database carries a traceable lineage: model output → retrieved chunk → Markdown location → original uploaded file. This is designed to support auditability, compliance, and operational decision-making in environments that require document-level proof of origin.

How source tracing works

When a file is ingested into a vector database, the pipeline captures provenance metadata for every chunk — line range, character offsets, heading hierarchy, and source page number — and stores it alongside the embedding. No additional configuration is required.

Retrieve source tracing fields from a run

After a run completes, list the thread's messages to get the assistant response. If the agent invoked the file search tool during the run, the source tracing fields for each retrieved chunk are appended to the message content.

Endpoint: GET /v1/threads/{thread_id}/messages

Each chunk in the response includes:

| Field | Description |
|---|---|
| chunk_id | Unique identifier for this chunk. Use it to call the chunk endpoint for the full Markdown provenance. |
| page | Page number in the source document (null for native Markdown and JSON). |
| lines | Line range in the ingested Markdown. |
| section | Heading hierarchy path from the document root to this chunk. |
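A minimal sketch of preparing the list-messages call with Python's standard library. The host `https://api.example.com`, the helper name `build_list_messages_request`, and the bearer-token `Authorization` header are assumptions for illustration, not part of the documented API; substitute your deployment's host and auth scheme.

```python
from urllib.parse import quote
from urllib.request import Request


def build_list_messages_request(base_url: str, thread_id: str, api_key: str) -> Request:
    """Build (without sending) GET /v1/threads/{thread_id}/messages."""
    # Percent-encode the ID so reserved characters cannot alter the path.
    url = f"{base_url.rstrip('/')}/v1/threads/{quote(thread_id, safe='')}/messages"
    # ASSUMPTION: bearer-token auth; substitute your deployment's scheme.
    return Request(url, method="GET", headers={"Authorization": f"Bearer {api_key}"})


# Hypothetical host, thread ID, and key, for illustration only.
req = build_list_messages_request("https://api.example.com", "thread_123", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# GET https://api.example.com/v1/threads/thread_123/messages
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`. The exact way the chunk fields are appended to the message content is not specified here, so inspect a real response before writing a parser for it.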

Retrieve chunk provenance

Use chunk_id with the chunk endpoint to retrieve the full provenance record, including the file_id of the original uploaded file.

Endpoint: GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id}

The response includes:

| Field | Description |
|---|---|
| chunk_id | The chunk's unique identifier. |
| file_id | ID of the original uploaded file. Use this with the file download endpoint to retrieve the source document. |
| chunk_text | The full text of the chunk as indexed. |
| locations | List of Markdown location objects (see below). |
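The sketch below calls the chunk endpoint and maps the documented fields onto a typed record. It assumes the response is a JSON object with `chunk_id`, `file_id`, `chunk_text`, and `locations` as top-level keys, and it uses a hypothetical bearer-token header. The `opener` parameter (default `urlopen`) is an illustrative hook so the call can be exercised without a live server.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import Request, urlopen


@dataclass
class ChunkProvenance:
    chunk_id: str
    file_id: str      # pass this to the file download endpoint
    chunk_text: str
    locations: list   # raw Markdown location objects


def get_chunk(base_url: str, database_id: str, chunk_id: str, api_key: str,
              opener: Callable[[Request], Any] = urlopen) -> ChunkProvenance:
    """Call GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id} and parse it."""
    url = (f"{base_url.rstrip('/')}/v1/flow/vectordb/"
           f"{quote(database_id, safe='')}/chunk/{quote(chunk_id, safe='')}")
    # ASSUMPTION: bearer-token auth and a JSON response body.
    req = Request(url, method="GET", headers={"Authorization": f"Bearer {api_key}"})
    with opener(req) as resp:
        payload = json.load(resp)
    # Indexing (not .get) makes a missing documented field raise KeyError,
    # so a broken provenance chain fails loudly.
    return ChunkProvenance(
        chunk_id=payload["chunk_id"],
        file_id=payload["file_id"],
        chunk_text=payload["chunk_text"],
        locations=list(payload["locations"]),
    )
```

Failing on a missing `file_id` rather than defaulting it is a deliberate choice here: for audit purposes, a chunk that cannot be linked back to its uploaded file should stop the trace.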

Location object:

| Field | Description |
|---|---|
| line_number_start | Starting line in the ingested Markdown. |
| line_number_end | Ending line in the ingested Markdown. |
| char_start | Character offset within the section at the starting line. |
| char_end | Character offset within the section at the ending line. |
| hierarchy | Ordered list of headings from the document root to this chunk. |
| page_number | Source document page number, 1-indexed. null for native Markdown and JSON uploads. |
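When you have the ingested Markdown text at hand (for example, a native Markdown upload, assuming ingestion leaves it unchanged), a location object can be used to pull out the exact lines a chunk came from. This sketch assumes `line_number_start` and `line_number_end` are 1-indexed and inclusive; this page states 1-indexing only for `page_number`, so verify the line convention against a known chunk first. `Location` and `markdown_lines` are hypothetical helper names. The character offsets are stored but not used, because their reference point ("within the section") is not defined precisely enough here to slice on.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Location:
    line_number_start: int
    line_number_end: int
    char_start: int
    char_end: int
    hierarchy: tuple            # headings from the document root to the chunk
    page_number: Optional[int]  # None (null) for native Markdown and JSON uploads

    @classmethod
    def from_json(cls, d: dict[str, Any]) -> "Location":
        return cls(
            line_number_start=d["line_number_start"],
            line_number_end=d["line_number_end"],
            char_start=d["char_start"],
            char_end=d["char_end"],
            hierarchy=tuple(d["hierarchy"]),
            page_number=d.get("page_number"),
        )


def markdown_lines(markdown: str, loc: Location) -> str:
    """Return the Markdown lines covered by loc.

    ASSUMPTION: line numbers are 1-indexed and the end line is inclusive.
    """
    lines = markdown.splitlines()
    return "\n".join(lines[loc.line_number_start - 1 : loc.line_number_end])
```

For a human-readable citation, `" > ".join(loc.hierarchy)` renders the heading path as a breadcrumb.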

Supported file types

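| File type | page_number | hierarchy | lines | char_start / char_end |
|---|---|---|---|---|
| PDF | Yes | Yes | Yes | Yes |
| DOCX | Yes | Yes | Yes | Yes |
| PPTX | Yes | Yes | Yes | Yes |
| Markdown | No (null) | Yes | Yes | Yes |
| JSON | No (null) | Yes | Yes | Yes |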

Complete workflow

The full source tracing workflow follows this sequence:

POST /v1/threads/{thread_id}/runs
  └── GET /v1/threads/{thread_id}/messages        (chunk_id, page, section)
        └── GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id}    (file_id, Markdown location)
              └── GET /v1/flow/files/{file_id}/content              (original file)
  1. Run the agent against a thread (POST /v1/threads/{thread_id}/runs). Once the run completes, list the thread's messages (GET /v1/threads/{thread_id}/messages) to find the assistant response; each retrieved chunk includes chunk_id, page, and section.
  2. Call the chunk endpoint (GET /v1/flow/vectordb/{database_id}/chunk/{chunk_id}) with chunk_id to get file_id and confirm the exact Markdown lines the retrieved chunk came from.
  3. Pass file_id to the file download endpoint (GET /v1/flow/files/{file_id}/content) to retrieve the original uploaded file and complete the chain of custody.
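Steps 2 and 3 can be chained in a short sketch that starts from a chunk_id taken from the assistant message. `BASE_URL`, `http_get`, and `trace_chunk` are hypothetical names; the bearer-token header and the JSON body of the chunk endpoint are assumptions. The HTTP call is passed in as `get` so the chain can be checked offline.

```python
import json
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical base URL; replace with your deployment's API host.
BASE_URL = "https://api.example.com"


def http_get(path: str, api_key: str) -> bytes:
    """Send an authenticated GET and return the raw body.

    ASSUMPTION: bearer-token auth; check your deployment's auth scheme.
    """
    req = Request(BASE_URL + path, method="GET",
                  headers={"Authorization": f"Bearer {api_key}"})
    with urlopen(req) as resp:
        return resp.read()


def trace_chunk(database_id: str, chunk_id: str,
                get: Callable[[str], bytes]) -> dict[str, Any]:
    """Follow chunk_id -> file_id -> original file (steps 2 and 3)."""
    # Step 2: chunk provenance, including file_id and Markdown locations.
    chunk_path = (f"/v1/flow/vectordb/{quote(database_id, safe='')}"
                  f"/chunk/{quote(chunk_id, safe='')}")
    chunk = json.loads(get(chunk_path))
    # Step 3: download the original uploaded file by its file_id.
    file_path = f"/v1/flow/files/{quote(chunk['file_id'], safe='')}/content"
    return {
        "chunk_id": chunk["chunk_id"],
        "file_id": chunk["file_id"],
        "locations": chunk["locations"],
        "original_file": get(file_path),  # raw bytes, e.g. the uploaded PDF
    }
```

Against a live deployment, a call might look like `trace_chunk("db_123", "chunk_456", lambda p: http_get(p, "YOUR_API_KEY"))`, where the IDs are placeholders. Storing the returned record next to the answer keeps the full chain of custody in one place.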