# Training data attribution

Training data attribution surfaces the training data that influenced fine-tuned model outputs. By tracing model responses back to specific question-answer pairs from the training dataset, it helps debug model behavior and audit responses.

<Note>
  This method requires a fine-tuned model created with Seekr's [fine-tuning feature](/flow/sdk/fine-tuning). Only models built after September 22nd, 2025 are supported.
</Note>

## Retrieve influential fine-tuning data

<CodeGroup>
  ```python Python theme={null}
  import os
  from seekrai import SeekrFlow

  client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])

  model_id = "deployment-<your-deployment-id>"

  influential_data = client.explainability.get_influential_finetuning_data(
      model_id=model_id,
      question="What is SeekrFlow?"
  )
  print(influential_data)
  ```
</CodeGroup>

If you already have a model response from a prior `chat.completions` call, pass it as `answer` to skip an extra generation:

<CodeGroup>
  ```python Python theme={null}
  # Reuses `client` and `model_id` from the previous example.
  chat_response = client.chat.completions.create(
      model=model_id,
      messages=[{"role": "user", "content": "What is SeekrFlow?"}]
  )

  influential_data = client.explainability.get_influential_finetuning_data(
      model_id=model_id,
      question="What is SeekrFlow?",
      answer=chat_response.choices[0].message.content,
  )
  ```
</CodeGroup>

## Response structure

<CodeGroup>
  ```json JSON theme={null}
  {
    "results": [
      {
        "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
        "influence_level": "high",
        "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
        "messages": "Q: Example finetuning question\nA: Example finetuning answer"
      }
    ],
    "answer": "Answer to original user query",
    "version": "v0"
  }
  ```
</CodeGroup>

### Fields

* **`results`** *(list)*: The influential fine-tuning Q/A pairs.
  * **`id`** *(UUID)*: Unique identifier for the Q/A pair.
  * **`file_id`** *(string)*: Source file ID, a UUID prefixed with `file-`. Use it to trace a pair back to its source document in the training data and edit it there.
  * **`messages`** *(string)*: The Q/A content in the format `Q: <question>\nA: <answer>`.
  * **`influence_level`** *(string)*: One of `high`, `medium`, or `low`. Irrelevant pairs are filtered out and not returned.
* **`answer`** *(string)*: The model's answer. Echoed back if you provided one; otherwise the answer generated internally.
* **`version`** *(string)*: Schema version (currently `"v0"`).
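
As a rough sketch of how these fields might be consumed, the snippet below works on a plain `dict` shaped like the JSON example above. Whether the SDK returns a dict or a typed object is not stated on this page, so the dict access is an assumption. `split_qa` and `group_by_influence` are hypothetical helper names, not part of the SDK.

```python
from collections import defaultdict


def split_qa(messages: str) -> tuple[str, str]:
    """Split a 'Q: <question>\\nA: <answer>' string into (question, answer)."""
    question, _, answer = messages.partition("\nA: ")
    return question.removeprefix("Q: "), answer


def group_by_influence(response: dict) -> dict[str, list[dict]]:
    """Group result entries by influence_level ('high', 'medium', 'low')."""
    grouped = defaultdict(list)
    # ASSUMPTION: the response is a plain dict matching the documented JSON.
    for item in response.get("results", []):
        grouped[item["influence_level"]].append(item)
    return dict(grouped)


# Sample payload copied from the response structure shown above.
sample = {
    "results": [
        {
            "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
            "influence_level": "high",
            "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
            "messages": "Q: Example finetuning question\nA: Example finetuning answer",
        }
    ],
    "answer": "Answer to original user query",
    "version": "v0",
}

for level, items in group_by_influence(sample).items():
    for item in items:
        question, answer = split_qa(item["messages"])
        print(level, item["file_id"], question, "->", answer)
```

Grouping by level first makes it easy to review the `high` pairs before the rest, and the `file_id` printed next to each pair points at the file to open if the pair needs editing.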

## Common errors

* **TypeError** – A required parameter (e.g. `question`) is missing or invalid.
* **404 Not found** – The provided `model_id` does not exist.
* **500 Internal server error** – Unexpected server issue; retry the request.
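
Since the advice for a 500 is to retry, one possible approach is a small backoff wrapper like the sketch below. It is generic Python rather than SDK functionality: `call_with_retry` and `is_server_error` are hypothetical names, and how the SDK exposes the HTTP status on an exception is not documented on this page, so the `status_code` attribute check is an assumption to adapt. A stand-in request function is used so the sketch runs without network access.

```python
import time


def call_with_retry(fn, *, is_retryable, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    Delays grow exponentially: base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)


def is_server_error(exc):
    # ASSUMPTION: the real SDK exception may expose the status differently.
    return getattr(exc, "status_code", None) == 500


class FakeServerError(Exception):
    """Stand-in for whatever the SDK raises on a 500 response."""
    status_code = 500


calls = {"count": 0}


def flaky_request():
    # Fails twice with a simulated 500, then returns a documented-shape payload.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeServerError("500 Internal server error")
    return {"results": [], "answer": "ok", "version": "v0"}


# sleep is replaced with a no-op here so the demo finishes instantly.
result = call_with_retry(flaky_request, is_retryable=is_server_error, sleep=lambda _: None)
print(calls["count"], result["version"])
```

With the real client, `fn` could be a lambda wrapping `client.explainability.get_influential_finetuning_data(model_id=model_id, question="What is SeekrFlow?")`. Errors that are not retryable, such as a missing parameter or an unknown `model_id`, are re-raised immediately.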

## Best practices

* **Interpreting influence levels:** `high` means the Q/A pair strongly shaped the response; `medium` means moderate impact; `low` means minimal contribution. Look for recurring `high` pairs to understand the training patterns driving a response.
* **Unexpected results:** If unrelated pairs are surfacing with high influence, review and refine your fine-tuning dataset.
* **Empty results:** An empty `results` list means no training pairs were found to be relevant to the prompt. This is not an error.
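
To look for recurring `high` pairs across several prompts, something along these lines could work. It is a minimal sketch that assumes each response is a plain `dict` in the documented shape; `recurring_high_pairs` is a hypothetical helper and the sample ids are made up for illustration. Responses with an empty `results` list are simply skipped, in line with the note that they are not errors.

```python
from collections import Counter


def recurring_high_pairs(responses, min_count=2):
    """Return (pair_id, count) for pairs rated 'high' in at least min_count responses."""
    counts = Counter()
    for response in responses:
        # A set avoids double-counting an id within a single response.
        high_ids = {
            item["id"]
            for item in response.get("results", [])
            if item["influence_level"] == "high"
        }
        counts.update(high_ids)
    return [(pair_id, n) for pair_id, n in counts.most_common() if n >= min_count]


# Illustrative data only: three responses, one of them empty.
responses = [
    {"results": [
        {"id": "pair-a", "influence_level": "high"},
        {"id": "pair-b", "influence_level": "low"},
    ]},
    {"results": [
        {"id": "pair-a", "influence_level": "high"},
        {"id": "pair-b", "influence_level": "high"},
    ]},
    {"results": []},
]

recurring = recurring_high_pairs(responses)
print(recurring)
```

Pairs that show up here for prompts they should have nothing to do with are reasonable candidates for review in the fine-tuning dataset.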
