# Generate a metadata snapshot

> Summarize the most common metadata keys and values in a vector database so agents can filter file search by them.

A metadata snapshot is a compact summary of the user-defined metadata in a vector database. It captures the most common keys, the most frequent values for each key, and each key's inferred type. SeekrFlow surfaces this snapshot to agents in the file search tool description, so an agent can filter retrieval using the actual vocabulary stored in your index (for example `"Mar"` rather than `"March"`, or `"P1"` rather than `"high"`) instead of guessing at formats.

A snapshot captures up to the 100 most common keys, with up to the 10 most common values per key, sampled from as many as 10,000 chunks. These caps keep the snapshot small enough to sit alongside the tool description without crowding the agent's context.
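
To make the caps concrete, the following is a minimal sketch of how a summary with this shape could be computed from plain dictionaries. It is not SeekrFlow's implementation: `summarize`, `infer_type`, and the constants are hypothetical names, the sampling step is approximated by taking the first 10,000 items, and the type-inference rule is an assumption.

```python
from collections import Counter, defaultdict

# Caps described above; the constant names are illustrative only.
MAX_CHUNKS = 10_000
MAX_KEYS = 100
MAX_VALUES = 10


def infer_type(values):
    """Hypothetical inference rule: boolean, number, or string."""
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "number"
    return "string"


def summarize(chunks_metadata):
    """Rank keys and values by frequency across a capped sample of chunks."""
    key_counts = Counter()
    value_counts = defaultdict(Counter)

    # Assumption: the first MAX_CHUNKS items stand in for the real sampling.
    for metadata in chunks_metadata[:MAX_CHUNKS]:
        for key, value in metadata.items():
            key_counts[key] += 1
            value_counts[key][value] += 1

    summary = []
    for key, _ in key_counts.most_common(MAX_KEYS):
        top = [value for value, _ in value_counts[key].most_common(MAX_VALUES)]
        summary.append(
            {
                "key_name": key,
                "value_type": infer_type(list(value_counts[key])),
                "top_values": top,
            }
        )
    return summary


example = summarize(
    [
        {"month": "Mar", "priority": "P1"},
        {"month": "Mar"},
        {"month": "Apr", "archived": True},
    ]
)
```

In this example `month` ranks first because it appears in all three chunks, and its top values list `"Mar"` before `"Apr"` because `"Mar"` occurs more often.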

## Prerequisites

A populated vector database whose chunks carry user-defined metadata. To attach metadata during ingestion or edit it afterward, see [Manage chunk metadata](/flow/sdk/data-engine/manage-chunk-metadata).

## Generate a snapshot

Use `generate_metadata_snapshot` to build or rebuild the snapshot for a vector database. The call samples the index, ranks the keys and values by frequency, and stores the result.

<CodeGroup>
  ```python Python
  from seekrai import SeekrFlow

  client = SeekrFlow()

  result = client.vector_database.generate_metadata_snapshot(
      database_id="<database-id>",
  )

  print(f"Captured {result.keys_captured} keys at {result.created_at}")
  ```
</CodeGroup>

Generating a snapshot replaces the previous one for that vector database rather than adding to it.

**Returns:**

| Field                | Description                                        |
| -------------------- | -------------------------------------------------- |
| `vector_database_id` | ID of the vector database the snapshot describes.  |
| `keys_captured`      | Number of metadata keys captured in this snapshot. |
| `created_at`         | When the snapshot was generated.                   |

## Read a snapshot

Use `get_metadata_snapshot` to retrieve the current snapshot, including each key's type and its top values.

<CodeGroup>
  ```python Python
  snapshot = client.vector_database.get_metadata_snapshot(
      database_id="<database-id>",
  )

  for key in snapshot.keys:
      print(key.key_name, key.value_type, key.top_values)
  ```
</CodeGroup>

**Returns:**

| Field                | Description                                                                                                                             |
| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `vector_database_id` | ID of the vector database.                                                                                                              |
| `created_at`         | When the current snapshot was generated.                                                                                                |
| `keys`               | List of captured metadata keys. Each entry has `key_name`, a `value_type` of `string`, `number`, or `boolean`, and a `top_values` list. |
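
If you want to look up one key at a time, the returned `keys` list can be turned into a dictionary. The sketch below assumes only the fields listed in the table above. `top_values_by_key` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for a real response so the example runs without a SeekrFlow connection.

```python
from types import SimpleNamespace


def top_values_by_key(snapshot):
    """Map each captured key name to its (value_type, top_values) pair."""
    return {
        key.key_name: (key.value_type, list(key.top_values))
        for key in snapshot.keys
    }


# Stand-in for the object returned by get_metadata_snapshot (illustrative data).
fake_snapshot = SimpleNamespace(
    vector_database_id="<database-id>",
    keys=[
        SimpleNamespace(key_name="month", value_type="string", top_values=["Mar", "Apr"]),
        SimpleNamespace(key_name="page", value_type="number", top_values=[1, 2, 3]),
    ],
)

lookup = top_values_by_key(fake_snapshot)
print(lookup["month"])
```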

## Keep the snapshot current

SeekrFlow regenerates the snapshot automatically after an ingestion job adds documents to the vector database, so newly introduced keys and values reach the agent without any manual action. Generate a snapshot manually whenever you want to refresh it on demand, such as after editing metadata in place with `update_metadata`.
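
A manual refresh can be wrapped in a small helper that regenerates the snapshot and then reads it back. This is a sketch: `refresh_snapshot` is a hypothetical name, it uses only the two calls shown earlier on this page, and it assumes `generate_metadata_snapshot` has finished storing the result by the time it returns.

```python
def refresh_snapshot(client, database_id):
    """Regenerate the snapshot for a vector database, then return the stored copy."""
    # Rebuild first; this replaces the previous snapshot for the database.
    client.vector_database.generate_metadata_snapshot(database_id=database_id)
    # Read it back so the caller can inspect the refreshed keys and values.
    return client.vector_database.get_metadata_snapshot(database_id=database_id)
```

For example, you might call `refresh_snapshot(client, "<database-id>")` right after a batch of `update_metadata` edits.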

## Limitations

* A snapshot includes at most the 100 most common keys, with at most the 10 most common values for each, drawn from a sample of up to 10,000 chunks. Metadata that appears rarely across the sampled chunks might not be represented.
* Values are ranked by how often they occur in the sample, so their order reflects frequency, not business priority.
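
Because of these caps, a value that is missing from the snapshot may still exist in the index. The sketch below shows one way to check whether a filter you plan to use is represented. `check_filter` and its return labels are hypothetical, and the `SimpleNamespace` objects stand in for a real `get_metadata_snapshot` response.

```python
from types import SimpleNamespace


def check_filter(snapshot, key_name, value):
    """Report whether a key/value pair is represented in the snapshot."""
    for key in snapshot.keys:
        if key.key_name == key_name:
            # The key was captured; the value may still fall outside its top values.
            return "listed" if value in key.top_values else "value_not_listed"
    # The key itself is outside the captured keys, or absent from the sample.
    return "key_not_listed"


# Stand-in snapshot with illustrative data.
sample_snapshot = SimpleNamespace(
    keys=[SimpleNamespace(key_name="priority", value_type="string", top_values=["P1", "P2"])]
)

status_listed = check_filter(sample_snapshot, "priority", "P1")
status_value = check_filter(sample_snapshot, "priority", "P4")
status_key = check_filter(sample_snapshot, "region", "EMEA")
```

A result other than `"listed"` means only that the snapshot does not show the pair, not that no chunk carries it.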
