Training data attribution
Training data attribution surfaces the training data that influenced fine-tuned model outputs. By tracing model responses back to specific question-answer pairs from the training dataset, it helps debug model behavior and audit responses.
Important: This method requires a fine-tuned model created with Seekr's fine-tuning feature. Only models built after September 22nd, 2025 are supported.
Retrieve influential fine-tuning data
import os

from seekrai import SeekrFlow

# Read the API key from the environment rather than hard-coding it.
client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])

# Replace the placeholder with the ID of your fine-tuned model deployment.
model_id = "deployment-<your-deployment-id>"

influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?",
)
print(influential_data)

If you already have a model response from a prior chat.completions call, pass it as answer to skip an extra generation:
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "What is SeekrFlow?"}],
)
influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?",
    answer=chat_response.choices[0].message.content,
)

Response structure
{
  "results": [
    {
      "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
      "influence_level": "high",
      "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
      "messages": "Q: Example finetuning question\nA: Example finetuning answer"
    }
  ],
  "answer": "Answer to original user query",
  "version": "v0"
}

Fields
- results (list): The influential fine-tuning Q/A pairs.
  - id (UUID): Unique identifier for the Q/A pair.
  - file_id (UUID): Source file ID. Use this to trace back to and edit the source documents in the training data.
  - messages (string): The Q/A content in the format Q: <question>\nA: <answer>.
  - influence_level (string): One of high, medium, or low. Irrelevant pairs are filtered out and not returned.
- answer (string): The model's answer. Echoed back if you provided one; otherwise the answer generated internally.
- version (string): Schema version (currently "v0").
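As a rough illustration of working with these fields, the sketch below groups the returned pairs by influence_level and splits each messages string into its question and answer. It assumes the response is available as a plain dict shaped like the JSON above (the SDK may return an object instead); split_qa and group_by_influence are hypothetical helper names, not part of the seekrai package.

```python
from collections import defaultdict


def split_qa(messages: str) -> tuple[str, str]:
    """Split a messages string of the documented Q:/A: form into (question, answer)."""
    question, _, answer = messages.partition("\nA: ")
    return question.removeprefix("Q: ").strip(), answer.strip()


def group_by_influence(response: dict) -> dict[str, list[dict]]:
    """Group result entries by their influence_level (assumes a dict-shaped response)."""
    grouped = defaultdict(list)
    for item in response.get("results", []):
        question, answer = split_qa(item["messages"])
        grouped[item["influence_level"]].append(
            {
                "id": item["id"],
                "file_id": item["file_id"],
                "question": question,
                "answer": answer,
            }
        )
    return dict(grouped)


# Sample payload copied from the response structure documented above.
sample = {
    "results": [
        {
            "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
            "influence_level": "high",
            "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
            "messages": "Q: Example finetuning question\nA: Example finetuning answer",
        }
    ],
    "answer": "Answer to original user query",
    "version": "v0",
}

grouped = group_by_influence(sample)
for level in ("high", "medium", "low"):
    for pair in grouped.get(level, []):
        print(level, pair["file_id"], pair["question"])
```

Listing the file_id next to each question makes it easier to find which source file to review when a pair looks wrong.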
Common errors
- TypeError – A required parameter (e.g. question) is missing or invalid.
- 404 Not found – The provided model_id does not exist.
- 500 Internal server error – Unexpected server issue; retry the request.
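Since a 500 response is described as retryable, one way to handle it is a small retry wrapper with exponential backoff. This is only a sketch: call_with_retries is a hypothetical helper, and because the exception types raised by the seekrai client are not documented here, the caller supplies an is_retryable predicate instead of the code naming a specific exception class.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn, retrying with exponential backoff while is_retryable(exc) is true."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```

Here fn would wrap the get_influential_finetuning_data call in a zero-argument function, and is_retryable would inspect whatever status information the raised exception exposes. A TypeError or a 404 is not worth retrying, since repeating the same request fails the same way.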
Best practices
- Interpreting influence levels: high means the Q/A pair strongly shaped the response; medium means moderate impact; low means minimal contribution. Look for recurring high pairs to understand the training patterns driving a response.
- Unexpected results: If unrelated pairs are surfacing with high influence, review and refine your fine-tuning dataset.
- Empty results: The model may simply not have found training pairs relevant to the prompt. This is not an error.
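One possible way to look for recurring high pairs is to run several prompts, collect the responses, and count how often each pair ID appears with high influence. The sketch below assumes each response is a dict shaped like the documented response structure; recurring_high_pairs and prompts_without_results are hypothetical helpers, and the IDs in the sample data are made up for illustration.

```python
from collections import Counter


def recurring_high_pairs(responses: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (pair_id, count) for pairs rated high in at least min_count responses."""
    counts = Counter(
        item["id"]
        for response in responses
        for item in response.get("results", [])
        if item["influence_level"] == "high"
    )
    return [(pair_id, n) for pair_id, n in counts.most_common() if n >= min_count]


def prompts_without_results(responses_by_prompt: dict[str, dict]) -> list[str]:
    """List prompts whose response had no influential pairs (not an error)."""
    return [p for p, r in responses_by_prompt.items() if not r.get("results")]


# Illustrative data only: these IDs and prompts are invented for the example.
responses_by_prompt = {
    "prompt one": {"results": [{"id": "pair-a", "influence_level": "high"}]},
    "prompt two": {
        "results": [
            {"id": "pair-a", "influence_level": "high"},
            {"id": "pair-b", "influence_level": "low"},
        ]
    },
    "prompt three": {"results": []},
}

recurring = recurring_high_pairs(list(responses_by_prompt.values()))
empty = prompts_without_results(responses_by_prompt)
print(recurring)
print(empty)
```

Pairs that keep showing up with high influence are natural candidates to review first when refining the dataset, while prompts with empty results may point to topics the training data does not cover.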
