File search

File search, sometimes referred to as agentic RAG, gives your agents access to your business documents within an agentic application. When invoked, the tool retrieves the most relevant document chunks to help your agent complete its task.

When to use file search

Configure a file search tool whenever an agent needs your business documentation to complete a task, such as summarizing documents, answering questions based on company knowledge, or generating reports from internal data.

Prerequisites

File search requires a populated vector database. To create and populate one, see Create and populate a vector database.

Create a file search tool

from seekrai import SeekrFlow
from seekrai.types import CreateFileSearch, FileSearchConfig

client = SeekrFlow()

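# Register a file search tool backed by an existing vector database.
file_search_tool = client.tools.create(
    CreateFileSearch(
        name="doc_search",
        description="Search through company documents, policies, procedures, and knowledge base articles.",
        config=FileSearchConfig(
            file_search_index="<vector-store-id>",  # ID of your populated vector database
            top_k=5,  # maximum number of chunks to return
            score_threshold=0.7  # minimum similarity score for a chunk to be included
        )
    )
)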
print(f"Tool created: {file_search_tool.id}")

Parameters

Parameter	Required	Description
`name`	Yes	A unique name for the tool.
`description`	Yes	Description that helps the agent understand when to use this tool.
`file_search_index`	Yes	ID of your vector database containing the documents.
`top_k`	No	Maximum number of chunks to return from the search.
`score_threshold`	No	Minimum similarity score for a chunk to be included in results.
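
To make the two optional parameters concrete, here is a minimal local sketch of how `top_k` and `score_threshold` could combine. It does not call SeekrFlow. The `ScoredChunk` type and `select_chunks` helper are hypothetical, and the sketch assumes that higher scores mean closer matches and that the threshold is applied before the `top_k` cut; the service itself may behave differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    """Hypothetical stand-in for a retrieved chunk and its similarity score."""
    text: str
    score: float


def select_chunks(candidates, top_k=5, score_threshold=0.7):
    """Keep chunks scoring at or above the threshold, then return the best top_k.

    Assumption: higher score means more similar, and the threshold is applied
    before the top_k cut. The real service may order these steps differently.
    """
    kept = [c for c in candidates if c.score >= score_threshold]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:top_k]


# Made-up candidates for illustration only.
candidates = [
    ScoredChunk("PTO policy overview", 0.91),
    ScoredChunk("Expense report steps", 0.74),
    ScoredChunk("Office parking map", 0.42),
    ScoredChunk("PTO carryover rules", 0.83),
]

selected = select_chunks(candidates, top_k=2, score_threshold=0.7)
for chunk in selected:
    print(f"{chunk.score:.2f}  {chunk.text}")
```

Under these assumptions, `top_k` is an upper bound rather than a guarantee: a strict threshold can leave fewer than `top_k` chunks, or none at all.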

Link to an agent

# Reuses `client` and `file_search_tool` from the previous example.
from seekrai.types import CreateAgentRequest

agent = client.agents.create(
    CreateAgentRequest(
        name="DocBot",
        instructions="You are DocBot, an expert assistant that can search through company documents to answer questions. Always cite the specific documents you reference. Respond only with data returned from the file search tool.",
        model_id="meta-llama/Llama-3.3-70B-Instruct",
        tool_ids=[file_search_tool.id]
    )
)
print(f"Agent ID: {agent.id}")
print(f"Agent status: {agent.status}")

Best practices

Tool description

Write clear, concise descriptions that specify when the tool should be invoked — for example, "Use this tool to search internal company policies when a user asks about HR procedures."
Include example queries or scenarios to help the agent understand the tool's intended use.
Clearly define the scope and limitations of the tool — for example, "This tool only searches technical documentation, not customer support tickets."
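
One way to apply these three guidelines consistently is to assemble the description from its parts. The sketch below is a plain string helper, not part of the SeekrFlow SDK; the function name `build_tool_description` and its parameters are hypothetical.

```python
def build_tool_description(purpose, example_queries, out_of_scope):
    """Compose a description stating when to use the tool, examples, and limits."""
    parts = [purpose.strip()]
    if example_queries:
        quoted = "; ".join(f'"{q}"' for q in example_queries)
        parts.append(f"Example queries: {quoted}.")
    if out_of_scope:
        parts.append(f"This tool does not search {', '.join(out_of_scope)}.")
    return " ".join(parts)


description = build_tool_description(
    purpose="Use this tool to search internal company policies when a user asks about HR procedures.",
    example_queries=[
        "How many vacation days carry over?",
        "What is the parental leave policy?",
    ],
    out_of_scope=["customer support tickets"],
)
print(description)
```

The resulting string can be passed as the `description` argument when creating the tool.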

top_k

Higher values increase the likelihood of including relevant results but may introduce more noise. Use for exploratory queries or when context is broad.
Lower values give more focused results but risk missing relevant information. Use for targeted tasks.
Common settings are between 3 and 10.

score_threshold

Use score_threshold to filter out weak or irrelevant matches, improving overall result quality.
Start with a moderate threshold (such as 0.5–0.7) and adjust based on observed retrieval quality.
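
Adjusting the threshold "based on observed retrieval quality" can be made measurable by hand-labeling a few retrieved chunks as relevant or not and comparing candidate thresholds. The sketch below is a local illustration only: the `sweep_thresholds` helper and the labeled scores are hypothetical, and it assumes that higher scores indicate closer matches.

```python
def sweep_thresholds(scored_labels, thresholds):
    """Return (threshold, precision, recall) for each candidate threshold.

    scored_labels: list of (score, is_relevant) pairs judged by hand.
    """
    total_relevant = sum(1 for _, relevant in scored_labels if relevant)
    rows = []
    for t in thresholds:
        # Relevance labels of the chunks that would survive this threshold.
        kept = [relevant for score, relevant in scored_labels if score >= t]
        hits = sum(kept)
        precision = hits / len(kept) if kept else 0.0
        recall = hits / total_relevant if total_relevant else 0.0
        rows.append((t, precision, recall))
    return rows


# Hypothetical hand-labeled results from a few test queries.
observed = [
    (0.92, True), (0.81, True), (0.66, True),
    (0.61, False), (0.55, True), (0.48, False), (0.31, False),
]

results = sweep_thresholds(observed, [0.5, 0.6, 0.7])
for t, precision, recall in results:
    print(f"threshold={t:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

In this made-up data, raising the threshold from 0.5 to 0.7 removes the irrelevant chunks but also drops half of the relevant ones, which is the trade-off to watch for when tuning.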

Source tracing

When file search retrieves chunks during a run, the assistant message includes source tracing fields for each chunk alongside the retrieved text.

Field	Description
`chunk_id`	Unique identifier for the retrieved chunk. Use it with the chunk endpoint to get the full provenance record, including `file_id` and exact Markdown location.
`chunk_text`	The full text of the chunk as indexed.
`page`	Page number in the source document (`null` for native Markdown and JSON).
`lines`	Line range in the ingested Markdown.
`section`	Heading hierarchy path from the document root to this chunk.

For the full retrieval workflow — from run response to original source file — see Source tracing.
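
As an illustration of consuming these fields, the sketch below turns one source tracing record into a short citation string. It assumes each record is available as a dictionary keyed by the field names in the table; the `SourceTrace` container, the helper functions, and the sample values (including the shapes of `lines` and `section`) are hypothetical and may not match the actual response format.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SourceTrace:
    """Local container mirroring the source tracing fields."""
    chunk_id: str
    chunk_text: str
    page: Optional[int]
    lines: Any    # shape not specified here; kept as-is
    section: Any  # shape not specified here; kept as-is


def parse_trace(record: dict) -> SourceTrace:
    """Build a SourceTrace from a dict; a missing page is treated as None."""
    return SourceTrace(
        chunk_id=record["chunk_id"],
        chunk_text=record["chunk_text"],
        page=record.get("page"),
        lines=record.get("lines"),
        section=record.get("section"),
    )


def format_citation(trace: SourceTrace) -> str:
    """Render a one-line citation, omitting the page when it is null."""
    parts = [f"chunk {trace.chunk_id}"]
    if trace.page is not None:
        parts.append(f"page {trace.page}")
    if trace.section:
        parts.append(f"section {trace.section}")
    return ", ".join(parts)


# Hypothetical record; real values and nesting may differ.
sample = {
    "chunk_id": "chunk-123",
    "chunk_text": "Employees accrue PTO monthly.",
    "page": None,
    "lines": "10-14",
    "section": "HR Handbook > Time Off",
}

citation = format_citation(parse_trace(sample))
print(citation)
```

Because `page` is `null` for native Markdown and JSON sources, the formatter skips it rather than printing an empty page reference.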