Training data attribution

Training data attribution surfaces the training data that influenced fine-tuned model outputs. By tracing model responses back to specific question-answer pairs from the training dataset, it helps debug model behavior and audit responses.

Important

This method requires a fine-tuned model created with Seekr's fine-tuning feature. Only models built after September 22nd, 2025 are supported.

Retrieve influential fine-tuning data

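import os
from seekrai import SeekrFlow

# Read the API key from the environment rather than hard-coding it.
client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])

# Replace the placeholder with the ID of your fine-tuned model deployment.
model_id = "deployment-<your-deployment-id>"

# No answer is passed here, so one is generated internally.
influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?",
)
print(influential_data)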

If you already have a model response from a prior chat.completions call, pass it as the answer argument to skip an extra generation:

# Reuses client and model_id from the previous example.
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "What is SeekrFlow?"}],
)

# Attribute the response that was already generated instead of producing a new one.
influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?",
    answer=chat_response.choices[0].message.content,
)

Response structure

{
  "results": [
    {
      "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
      "influence_level": "high",
      "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
      "messages": "Q: Example finetuning question\nA: Example finetuning answer"
    }
  ],
  "answer": "Answer to original user query",
  "version": "v0"
}

Fields

  • results (list): The influential fine-tuning Q/A pairs.
    • id (UUID): Unique identifier for the Q/A pair.
    • influence_level (string): One of high, medium, or low. Irrelevant pairs are filtered out and not returned.
    • file_id (string): Source file ID, a UUID prefixed with file-. Use this to trace a pair back to its source file in the training data and edit it.
    • messages (string): The Q/A content in the format Q: <question>\nA: <answer>.
  • answer (string): The model's answer. Echoed back if you provided one; otherwise the answer generated internally.
  • version (string): Schema version (currently "v0").
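
To show how these fields fit together, here is a minimal sketch that splits each messages string into its question and answer and groups the pairs by file_id. It assumes the response is available as a plain dict shaped like the example above; the SDK may return a different object type. The helpers split_qa and group_by_file are hypothetical names, not part of seekrai.

```python
from collections import defaultdict

# Sample payload copied from the response structure shown above.
response = {
    "results": [
        {
            "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
            "influence_level": "high",
            "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
            "messages": "Q: Example finetuning question\nA: Example finetuning answer",
        }
    ],
    "answer": "Answer to original user query",
    "version": "v0",
}


def split_qa(messages: str) -> tuple[str, str]:
    """Split a 'Q: <question>' / 'A: <answer>' string into (question, answer)."""
    # Split on the first newline that starts the answer part.
    question_part, _, answer_part = messages.partition("\nA: ")
    return question_part.removeprefix("Q: "), answer_part


def group_by_file(payload: dict) -> dict[str, list[dict]]:
    """Group influential pairs by the source file they came from."""
    grouped = defaultdict(list)
    # A missing or empty results list yields an empty grouping.
    for item in payload.get("results", []):
        question, answer = split_qa(item["messages"])
        grouped[item["file_id"]].append(
            {
                "id": item["id"],
                "influence_level": item["influence_level"],
                "question": question,
                "answer": answer,
            }
        )
    return dict(grouped)


for file_id, pairs in group_by_file(response).items():
    print(file_id, [(p["influence_level"], p["question"]) for p in pairs])
```

Grouping by file_id makes it easier to see which source files to open when several influential pairs come from the same upload.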

Common errors

  • TypeError – A required parameter (e.g. question) is missing or invalid.
  • 404 Not found – The provided model_id does not exist.
  • 500 Internal server error – Unexpected server issue; retry the request.
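
Since a 500 is worth retrying while a TypeError or 404 is not, one way to handle the difference is a small retry wrapper. The sketch below is generic Python, not a seekrai feature: call_with_retry and is_retryable are hypothetical names, and the exception type the SDK raises for a 500 is not stated on this page, so the caller supplies the predicate.

```python
import time


def call_with_retry(fn, is_retryable, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    Delays double on each retry: base_delay, 2 * base_delay, and so on.
    The final failure, or any non-retryable one, is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or when the error is not transient.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```

A call might then look like call_with_retry(lambda: client.explainability.get_influential_finetuning_data(model_id=model_id, question="What is SeekrFlow?"), is_retryable=my_predicate), where my_predicate returns True only for whatever exception your SDK version raises on a server error.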

Best practices

  • Interpreting influence levels: high means the Q/A pair strongly shaped the response; medium means moderate impact; low means minimal contribution. Look for recurring high pairs to understand the training patterns driving a response.
  • Unexpected results: If unrelated pairs are surfacing with high influence, review and refine your fine-tuning dataset.
  • Empty results: An empty results list means no training pairs were found relevant to the prompt. This is not an error.
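
As a rough illustration of looking for recurring high pairs, the sketch below tallies how often each pair id appears with high influence across several responses. It assumes each response is a plain dict shaped like the response structure above; recurring_high_pairs is a hypothetical helper, and the short ids in the sample data are made up for brevity.

```python
from collections import Counter


def recurring_high_pairs(payloads, min_count=2):
    """Return (pair_id, count) for pairs rated high in at least min_count responses."""
    counts = Counter()
    for payload in payloads:
        # An empty results list is valid and simply contributes nothing.
        for item in payload.get("results", []):
            if item["influence_level"] == "high":
                counts[item["id"]] += 1
    # most_common() sorts by count, highest first.
    return [(pair_id, n) for pair_id, n in counts.most_common() if n >= min_count]


# Made-up responses for three different prompts.
sample_payloads = [
    {"results": [{"id": "pair-a", "influence_level": "high"},
                 {"id": "pair-b", "influence_level": "low"}]},
    {"results": []},
    {"results": [{"id": "pair-a", "influence_level": "high"},
                 {"id": "pair-b", "influence_level": "high"}]},
]

print(recurring_high_pairs(sample_payloads))
```

Pairs that keep appearing with high influence across unrelated prompts are good candidates for review in the fine-tuning dataset.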