Skip to main content
A metadata snapshot is a compact summary of the user-defined metadata in a vector database. It captures the most common keys, the most frequent values for each key, and each key’s inferred type. SeekrFlow surfaces this snapshot to agents in the file search tool description, so an agent can filter retrieval using the actual vocabulary stored in your index (for example "Mar" rather than "March", or "P1" rather than "high") instead of guessing at formats. A snapshot captures up to the 100 most common keys, with up to the 10 most common values per key, sampled from as many as 10,000 chunks. This cap keeps the snapshot small enough to sit alongside the tool description without crowding the agent’s context.

Prerequisites

A populated vector database whose chunks carry user-defined metadata. To attach metadata during ingestion or edit it afterward, see Manage chunk metadata.

Generate a snapshot

Use generate_metadata_snapshot to build or rebuild the snapshot for a vector database. The call samples the index, ranks the keys and values by frequency, and stores the result.
from seekrai import SeekrFlow

client = SeekrFlow()

result = client.vector_database.generate_metadata_snapshot(
    database_id="<database-id>",
)

print(f"Captured {result.keys_captured} keys at {result.created_at}")
Generating a snapshot replaces the previous one for that vector database rather than adding to it. Returns:
FieldDescription
vector_database_idID of the vector database the snapshot describes.
keys_capturedNumber of metadata keys captured in this snapshot.
created_atWhen the snapshot was generated.

Read a snapshot

Use get_metadata_snapshot to retrieve the current snapshot, including each key’s type and its top values.
snapshot = client.vector_database.get_metadata_snapshot(
    database_id="<database-id>",
)

for key in snapshot.keys:
    print(key.key_name, key.value_type, key.top_values)
Returns:
FieldDescription
vector_database_idID of the vector database.
created_atWhen the current snapshot was generated.
keysList of captured metadata keys. Each entry has key_name, a value_type of string, number, or boolean, and a top_values list.

Keep the snapshot current

SeekrFlow regenerates the snapshot automatically after an ingestion job adds documents to the vector database, so newly introduced keys and values reach the agent without any manual action. Generate a snapshot manually whenever you want to refresh it on demand, such as after editing metadata in place with update_metadata.

Limitations

  • A snapshot includes only the 100 most common keys, with the 10 most common values each. Metadata that appears rarely across the sampled chunks might not be represented.
  • Values are ranked by how often they occur in the sample, so their order reflects frequency, not business priority.
Last modified on June 30, 2026