Explainability
Retrieving Influential Fine-Tuning Data
The Explainability API helps you understand why a fine-tuned model generated a particular response. It surfaces the most influential question/answer pairs from the fine-tuning dataset that contributed to the model’s output, along with influence levels (high, medium, low).
This feature allows you to debug, audit, and build trust in model outputs by tying responses back to their training examples.
Prerequisites
- You must use a fine-tuned model created with Seekr’s fine-tuning feature.
- Only fine-tuned models created after September 22, 2025 are supported.
- You’ll need:
  - Your model_id (deployment ID).
  - Your Seekr API key.
Usage
You can call the method in two modes:
1. With a Pre-Generated Answer
If you already have a model output (e.g. from a chat.completions request), you can pass it in to avoid recomputing:
import os
from seekrai import SeekrFlow

client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])

model_id = "deployment-<your-deployment-id>"
prompt = "What is SeekrFlow?"

# First, generate a model response
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": prompt}]
)
answer = chat_response.choices[0].message.content

# Then, retrieve influential fine-tuning data
influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question=prompt,
    answer=answer
)
print(influential_data)
2. Without a Pre-Generated Answer
If you don’t supply an answer, the SDK will generate a chat completion internally before evaluating explainability:
influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?"
)
print(influential_data)
Response Structure
A typical response looks like this:
{
  "results": [
    {
      "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
      "influence_level": "high",
      "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
      "messages": "Q: Example finetuning question\nA: Example finetuning answer"
    }
  ],
  "answer": "Answer to original user query",
  "version": "v0"
}
Components
- results (list): The influential fine-tuning Q/A pairs.
  - id (UUID): Unique identifier for the Q/A pair.
  - file_id (UUID): The ID of the source file the Q/A pair came from. Developers can use this to trace back, debug, or even edit the source documents that were part of the training data.
  - messages (string): The Q/A content, in the format Q: <question>\nA: <answer>.
  - influence_level (enum): One of high, medium, low. Pairs rated irrelevant are filtered out and not returned.
- answer (string): The model’s answer to your query. If you provided an answer, it is echoed back unchanged; otherwise, it is the one generated internally.
- version (string): Schema version (currently "v0").
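For example, you can walk the results list to surface the strongest matches and collect the source files behind them. This is a minimal sketch assuming dict-style access to the response, matching the JSON schema above; if your SDK version returns typed objects instead, adjust the field access accordingly:

import os
from seekrai import SeekrFlow

client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])
model_id = "deployment-<your-deployment-id>"

influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?"
)

# Print each influential pair and collect the source files behind
# the high-influence ones for later review.
high_influence_files = set()
for pair in influential_data["results"]:  # assumes dict-style access
    print(f"[{pair['influence_level']}] {pair['messages']}")
    if pair["influence_level"] == "high":
        # file_id traces the pair back to its training document
        high_influence_files.add(pair["file_id"])

print("Source files to review:", high_influence_files)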
Common Errors
- TypeError – A required parameter (e.g. question) is missing or invalid.
- 404 Not Found – The provided model_id does not exist.
- 500 Internal Server Error – Unexpected server issue; retry the request.
Example error:
{
  "error": "Not Found",
  "message": "No deployed model 'model_id' exists for your user.",
  "status": 404
}
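If you want to handle these failures programmatically, a pattern like the sketch below may help. Note that the exception class raised for HTTP errors (404/500) depends on your seekrai SDK version, so the broad except clause here is a placeholder assumption; check your SDK’s error classes:

import os
from seekrai import SeekrFlow

client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])
model_id = "deployment-<your-deployment-id>"

# Minimal sketch of defensive error handling.
# TypeError is documented above; the exception type for HTTP errors
# depends on the SDK version, so we catch broadly as a placeholder.
try:
    influential_data = client.explainability.get_influential_finetuning_data(
        model_id=model_id,
        question="What is SeekrFlow?"
    )
except TypeError as exc:
    print(f"Missing or invalid parameter: {exc}")
except Exception as exc:  # e.g. 404 for an unknown model_id, 500 for server issues
    print(f"Request failed: {exc}")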
⚡ Tip: For irrelevant queries (e.g. “Why is the sky blue?”), the API will return an empty results list.
Best Practices & Tips
- When to provide your own answer:
  - Use this mode when you already have a model output from a prior request.
  - It saves cost and time by avoiding duplicate completions.
- When to omit answer:
  - Convenient if you don’t have a prior completion, or want a single call that handles both generation and explainability.
- Interpreting influence_level:
  - high: The Q/A pair strongly shaped the model’s response.
  - medium: The Q/A pair had some impact.
  - low: The Q/A pair contributed minimally.
  - Irrelevant pairs are excluded so the list stays focused and actionable.
- Debugging model behavior:
  - Look for recurring high-influence pairs to understand the training patterns behind responses (see the sketch after this list).
  - If unexpected pairs keep surfacing, you may need to adjust your fine-tuning dataset.
- Empty results:
  - This does not mean the model is broken; it may simply indicate that no training pairs were sufficiently relevant to the prompt.
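To find those recurring high-influence pairs, one approach is to aggregate explainability results across a set of representative prompts. This is a minimal sketch assuming dict-style responses as shown earlier; the prompt list and top-5 cutoff are illustrative:

import os
from collections import Counter

from seekrai import SeekrFlow

client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])
model_id = "deployment-<your-deployment-id>"

# Illustrative prompts; replace with queries representative of your workload.
prompts = [
    "What is SeekrFlow?",
    "How do I deploy a fine-tuned model?",
    "What file formats does fine-tuning accept?",
]

pair_counts = Counter()
pair_text = {}

for prompt in prompts:
    data = client.explainability.get_influential_finetuning_data(
        model_id=model_id,
        question=prompt
    )
    for pair in data["results"]:  # assumes dict-style access; adjust for typed objects
        if pair["influence_level"] == "high":
            pair_counts[pair["id"]] += 1
            pair_text[pair["id"]] = pair["messages"]

# Pairs that repeatedly show up with high influence are the training
# examples most strongly shaping this model's behavior.
for pair_id, count in pair_counts.most_common(5):
    print(f"{count}x  {pair_text[pair_id][:80]}")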