When to use file search
Configure a file search tool whenever you want the agent to use your business documentation to complete a task, such as summarizing documents, answering questions based on company knowledge, or generating reports from internal data.
Prerequisites
File search requires a populated vector database. To create and populate one, see Create and populate a vector database.
Create a file search tool
Parameters
| Parameter | Required | Description |
|---|---|---|
| name | Yes | A unique name for the tool. |
| description | Yes | Description that helps the agent understand when to use this tool. |
| file_search_index | Yes | ID of your vector database containing the documents. |
| top_k | No | Maximum number of chunks to return from the search. |
| score_threshold | No | Minimum similarity score for a chunk to be included in results. |
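You can mirror the parameter table on the client side to catch missing values before you create a tool. The following minimal Python sketch assumes you assemble the tool definition as a plain dictionary. FileSearchToolConfig, to_payload, and the index ID vdb-hr-policies are hypothetical names that are not part of any official SDK, and the final request format depends on the API you call.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FileSearchToolConfig:
    """Hypothetical client-side model of the file search tool parameters.

    Field names mirror the parameter table; the class itself is illustrative
    and not part of any official SDK.
    """

    name: str                                # required: unique tool name
    description: str                         # required: tells the agent when to use the tool
    file_search_index: str                   # required: ID of the populated vector database
    top_k: Optional[int] = None              # optional: max number of chunks to return
    score_threshold: Optional[float] = None  # optional: minimum similarity score

    def __post_init__(self) -> None:
        # Required string parameters must be non-empty.
        for field_name in ("name", "description", "file_search_index"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} is required")
        # top_k is a maximum count, so it must be at least 1 when set.
        if self.top_k is not None and self.top_k < 1:
            raise ValueError("top_k must be a positive integer")

    def to_payload(self) -> dict:
        # Drop optional parameters that were not set.
        return {k: v for k, v in asdict(self).items() if v is not None}


tool = FileSearchToolConfig(
    name="hr_policy_search",
    description="Use this tool to search internal company policies when a user asks about HR procedures.",
    file_search_index="vdb-hr-policies",  # hypothetical vector database ID
    top_k=5,
    score_threshold=0.6,
)
print(tool.to_payload())
```

Leaving top_k and score_threshold unset omits them from the payload, so the service's own defaults apply.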
Link to an agent
Best practices
Tool description
- Write clear, concise descriptions that specify when the tool should be invoked — for example, “Use this tool to search internal company policies when a user asks about HR procedures.”
- Include example queries or scenarios to help the agent understand the tool’s intended use.
- Clearly define the scope and limitations of the tool — for example, “This tool only searches technical documentation, not customer support tickets.”
top_k
- Higher values increase the likelihood of including relevant results but may introduce more noise. Use for exploratory queries or when context is broad.
- Lower values give more focused results but risk missing relevant information. Use for targeted tasks.
- Common settings are between 3 and 10.
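To see how top_k trades focus for coverage, the following sketch ranks a set of scored chunks and keeps the first top_k. It assumes that retrieval orders chunks by similarity score (higher means more similar) and returns at most top_k of them. ScoredChunk, top_k_chunks, and the scores are hypothetical and only illustrate the behavior described above.

```python
from typing import NamedTuple


class ScoredChunk(NamedTuple):
    chunk_id: str
    score: float  # hypothetical similarity score; higher means more similar


def top_k_chunks(chunks: list[ScoredChunk], top_k: int) -> list[ScoredChunk]:
    """Return at most top_k chunks, most similar first (assumed ranking rule)."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]


results = [
    ScoredChunk("c1", 0.82),
    ScoredChunk("c2", 0.41),
    ScoredChunk("c3", 0.77),
    ScoredChunk("c4", 0.63),
    ScoredChunk("c5", 0.35),
]

# A targeted task: keep only the strongest matches.
print([c.chunk_id for c in top_k_chunks(results, 2)])  # ['c1', 'c3']
# A broad, exploratory query: more context, including weaker matches.
print([c.chunk_id for c in top_k_chunks(results, 4)])  # ['c1', 'c3', 'c4', 'c2']
```

With top_k set to 2, only the two strongest matches survive. Raising it to 4 pulls in c2, a weaker match that may add useful context or just noise.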
score_threshold
- Use score_threshold to filter out weak or irrelevant matches and improve overall result quality.
- Start with a moderate threshold (such as 0.5–0.7) and adjust it based on observed retrieval quality.
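One way to choose a score_threshold is to run a representative query and check which chunks survive at several thresholds in the 0.5–0.7 range. This Python sketch assumes that scores are comparable floats where higher means more similar, and that a chunk is kept when its score is greater than or equal to the threshold. The chunk IDs, the scores, and the filter_by_threshold helper are hypothetical.

```python
def filter_by_threshold(scores: dict[str, float], threshold: float) -> list[str]:
    """Return IDs of chunks whose score meets the threshold (inclusive, assumed)."""
    return [chunk_id for chunk_id, score in scores.items() if score >= threshold]


# Hypothetical similarity scores returned for one representative query.
scores = {"c1": 0.82, "c2": 0.41, "c3": 0.68, "c4": 0.58, "c5": 0.35}

# Sweep the moderate range and inspect what survives at each setting.
for threshold in (0.5, 0.6, 0.7):
    kept = filter_by_threshold(scores, threshold)
    print(f"{threshold:.1f}: {kept}")

# 0.5: ['c1', 'c3', 'c4']
# 0.6: ['c1', 'c3']
# 0.7: ['c1']
```

If the chunks dropped at a higher threshold turn out to be relevant, lower the threshold. If the kept chunks still include noise, raise it.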
Source tracing
When file search retrieves chunks during a run, the assistant message includes source tracing fields for each chunk alongside the retrieved text.
| Field | Description |
|---|---|
| chunk_id | Unique identifier for the retrieved chunk. Use it with the chunk endpoint to get the full provenance record, including file_id and exact Markdown location. |
| text | The full text of the chunk as indexed. |
| page | Page number in the source document (null for native Markdown and JSON). |
| lines | Line range in the ingested Markdown. |
| section | Heading hierarchy path from the document root to this chunk. |
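The source tracing fields contain enough information to build a readable citation for each retrieved chunk. This minimal sketch makes several assumptions: the records are available as dictionaries keyed by the field names above, lines is a [start, end] pair, and section is a list of heading titles ordered from the document root down. The exact response shape is not specified here, so verify it before relying on format_citation, which is a hypothetical helper. The sample records are invented.

```python
from typing import Any


def format_citation(source: dict[str, Any]) -> str:
    """Build a short human-readable citation from one source tracing record.

    Assumption: `lines` is a [start, end] pair and `section` is a list of
    heading titles; check the actual response shape before relying on this.
    """
    parts = [" > ".join(source["section"])] if source.get("section") else []
    # page is null (None) for native Markdown and JSON sources, so show it only when set.
    if source.get("page") is not None:
        parts.append(f"p. {source['page']}")
    if source.get("lines"):
        start, end = source["lines"]
        parts.append(f"lines {start}-{end}")
    parts.append(f"chunk {source['chunk_id']}")
    return ", ".join(parts)


# Hypothetical source tracing records attached to one assistant message.
sources = [
    {
        "chunk_id": "chk_001",
        "text": "Sample chunk text about paid leave.",
        "page": 4,
        "lines": [120, 138],
        "section": ["Employee Handbook", "Leave", "Paid time off"],
    },
    {
        "chunk_id": "chk_002",
        "text": "Sample chunk text about expense reports.",
        "page": None,  # native Markdown source, so no page number
        "lines": [12, 25],
        "section": ["Finance", "Expenses"],
    },
]

for source in sources:
    print(format_citation(source))
```

Because page is null for native Markdown and JSON sources, the second citation omits the page number. To get the file_id and exact Markdown location, pass chunk_id to the chunk endpoint.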