FileSearch Tool

The FileSearch tool lets an agent retrieve data from a vector database.

What is File Search

File Search, sometimes referred to as "Agentic RAG", allows you to embed and leverage your business documents in an agentic application. When appropriately invoked, the tool retrieves the most relevant document chunks to help your agent complete its task.

When should File Search be used?

We recommend configuring a File Search tool whenever a task depends on your business documentation, for example summarizing documents, answering questions based on company knowledge, or generating reports from internal data.

How to create a File Search Tool

To create a File Search tool, you must first create a vector database and then embed your business documents in it. The following steps take you through that process.

Create and Populate a Vector Database

To create and populate a vector database with your business documentation, view the associated documentation here.

Create an Agent with File Search

from seekrai import SeekrFlow 
from seekrai.types import CreateAgentRequest, FileSearch, FileSearchEnv

client = SeekrFlow()
database_id = "your_database_id"

# Create an agent with FileSearch capability
agent = client.agents.create(
    CreateAgentRequest(
        name="DocBot",
        instructions="You are DocBot, an expert assistant that can search through company documents to answer questions. Always cite the specific documents you reference. Respond only with data returned from the file_search tool.",
        model_id="meta-llama/Llama-3.1-8B-Instruct",  # or your choice of deployed model
        tools=[FileSearch(
            tool_env=FileSearchEnv(
                file_search_index=database_id,
                document_tool_desc="Search through company documents, policies, procedures, and knowledge base articles.",
                top_k=5,
                score_threshold=0.7,
            )
        )],
    ))

# Retrieve the agent to get its details
agent_info = client.agents.retrieve(agent_id=agent.id)

# Print the agent ID and status
print(f"Agent ID: {agent.id}")
print(f"Agent Status: {agent_info.status}")

FileSearchEnv Parameters

FileSearchEnv, the environment class passed to the File Search tool as tool_env, has four parameters. Each is described below:

  • file_search_index: ID of your vector database containing the documents
  • document_tool_desc: Description that helps the agent understand when to use this tool

Tool Description best practices:

  • Write clear, concise descriptions that specify when the tool should be invoked (e.g., "Use this tool to search internal company policies when a user asks about HR procedures.")
  • Include example queries or scenarios to help the agent understand the tool's intended use
  • Clearly define the scope and limitations of the tool (e.g., "This tool only searches technical documentation, not customer support tickets.")

  • top_k: Number of results to return from the search

Top K best practices:

  • Higher top_k values increase the likelihood of including relevant results but may introduce more noise; use them for exploratory queries or when the context is broad
  • Lower top_k values give more focused results but risk missing relevant information; use them for targeted tasks
  • Common settings are between 3 and 10

  • score_threshold: Minimum hybrid search score a result must reach to be returned

Score threshold best practices:

  • Use score_threshold to filter out weak or irrelevant matches, improving the overall quality of search results
  • Start with a moderate threshold (e.g., 0.5–0.7 for cosine similarity) and adjust based on observed retrieval quality and user satisfaction
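
To illustrate how top_k and score_threshold shape the result set together, the following minimal sketch applies both to a handful of scored results. It is a conceptual illustration only, not SeekrFlow's implementation: the Hit class, the select_hits function, the chunk IDs, and the scores are all hypothetical, and the sketch assumes that a higher score means a better match.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical search result: a chunk identifier and its relevance score."""
    chunk_id: str
    score: float


def select_hits(hits, top_k, score_threshold):
    """Drop hits scoring below score_threshold, then keep the top_k best."""
    kept = [h for h in hits if h.score >= score_threshold]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:top_k]


# Made-up scores, chosen only to show the effect of each setting.
hits = [
    Hit("policy-a", 0.91),
    Hit("policy-b", 0.74),
    Hit("faq-c", 0.62),
    Hit("memo-d", 0.48),
    Hit("memo-e", 0.33),
]

# A high threshold returns fewer than top_k results when few matches are strong.
strict = select_hits(hits, top_k=5, score_threshold=0.7)

# A low threshold lets more candidates through, so top_k becomes the limit.
broad = select_hits(hits, top_k=3, score_threshold=0.3)

print([h.chunk_id for h in strict])
print([h.chunk_id for h in broad])
```

In this sketch the strict setting returns two results even though top_k is 5, while the broad setting is capped at three by top_k. When tuning a real agent, the same reasoning suggests adjusting one parameter at a time so you can tell which one is limiting the results.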