Explainability

Retrieving Influential Fine-Tuning Data

The Explainability API helps you understand why a fine-tuned model generated a particular response. It surfaces the most influential question/answer pairs from the fine-tuning dataset that contributed to the model’s output, along with influence levels (high, medium, low).

This feature allows you to debug, audit, and build trust in model outputs by tying responses back to their training examples.


Prerequisites

  • You must use a fine-tuned model created with Seekr’s fine-tuning feature.
    • Only fine-tuned models created after September 22nd, 2025 are supported.
  • You’ll need:
    • Your model_id (deployment ID).
    • Your Seekr API key.

Usage

You can call get_influential_finetuning_data in two modes:

1. With a Pre-Generated Answer

If you already have a model output (e.g. from a chat.completions request), you can pass it in to avoid generating a second completion:

import os
from seekrai import SeekrFlow

client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])

model_id = "deployment-<your-deployment-id>"
prompt = "What is SeekrFlow?"

# First, generate a model response
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": prompt}]
)
answer = chat_response.choices[0].message.content

# Then, retrieve influential finetuning data
influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question=prompt,
    answer=answer
)
print(influential_data)

2. Without a Pre-Generated Answer

If you don’t supply an answer, the SDK will generate a chat completion internally before evaluating explainability:

influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?"
)
print(influential_data)

Response Structure

A typical response looks like this:

{
  "results": [
    {
      "id": "4ac3fcbe-e03a-42f0-b70c-35b6cb8feb4f",
      "influence_level": "high",
      "file_id": "file-eca0f55f-641f-480a-895d-8a1992a47fab",
      "messages": "Q: Example finetuning question\nA: Example finetuning answer"
    }
  ],
  "answer": "Answer to original user query",
  "version": "v0"
}

Components

  • results (list): The influential fine-tuning Q/A pairs.
    • id (UUID): Unique identifier for the Q/A pair.
    • file_id (string): The ID of the source file the Q/A pair came from. You can use it to trace responses back to, debug, or edit the source documents that were part of the training data.
    • messages (string): The Q/A content, in the format:
      Q: <question>
      A: <answer>
    • influence_level (enum): One of high, medium, or low. Irrelevant pairs are filtered out and not returned.
  • answer (string): The model’s answer to your query. If you provided an answer, it is echoed back unchanged; otherwise, it is the one generated internally.
  • version (string): Schema version (currently "v0").
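
For example, you can print the returned answer and walk the results list to see which source files shaped the response. This is a minimal sketch that assumes the response is a plain dict shaped like the JSON above; depending on your SDK version, it may instead be a typed object exposing the same fields as attributes.

influential_data = client.explainability.get_influential_finetuning_data(
    model_id=model_id,
    question="What is SeekrFlow?"
)

# The model's answer (echoed back, or generated internally if you omitted it)
print(influential_data["answer"])

# Each result ties a Q/A pair and its source file to an influence level
if not influential_data["results"]:
    print("No training pairs were sufficiently relevant to this prompt.")
for pair in influential_data["results"]:
    print(f"[{pair['influence_level']}] file: {pair['file_id']}")
    print(pair["messages"])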

Common Errors

  • TypeError – A required parameter (e.g. question) is missing or invalid.
  • 404 Not Found – The provided model_id does not exist.
  • 500 Internal Server Error – Unexpected server issue; retry the request.

Example error:

{
  "error": "Not Found",
  "message": "No deployed model 'model_id' exists for your user.",
  "status": 404
}
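
Because the exact exception classes raised by the seekrai SDK are not documented here, a hedged pattern is to validate inputs yourself and catch broadly around the call, swapping in the SDK’s specific error types if you know them:

question = "What is SeekrFlow?"
if not isinstance(question, str) or not question.strip():
    raise TypeError("question must be a non-empty string")

try:
    influential_data = client.explainability.get_influential_finetuning_data(
        model_id=model_id,
        question=question
    )
except Exception as exc:  # replace with the SDK's specific exception classes if known
    # A 404 usually means the model_id does not exist; a 500 is worth retrying.
    print(f"Explainability request failed: {exc}")
    raise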

Tip: For irrelevant queries (e.g. “Why is the sky blue?”), the API will return an empty results list.


Best Practices & Tips

  • When to provide your own answer:
    • Use this mode when you already have a model output from a prior request.
    • It saves cost and time by avoiding duplicate completions.
  • When to omit answer:
    • Convenient if you don’t have a prior completion, or want a single call that handles both generation and explainability.
  • Interpreting influence_level:
    • high: The Q/A pair strongly shaped the model’s response.
    • medium: The Q/A pair had some impact.
    • low: The Q/A pair contributed minimally.
    • Irrelevant pairs are excluded so the list is focused and actionable.
  • Debugging model behavior:
    • Look for recurring high-influence pairs to understand the training patterns behind responses (see the sketch after this list).
    • If unexpected pairs are surfacing, you may need to adjust your fine-tuning dataset.
  • Empty results:
    • This does not mean the model is broken—it may simply indicate that no training pairs were sufficiently relevant to the prompt.
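
To put the debugging tip into practice, the sketch below tallies which source files repeatedly show up with high influence across a set of prompts. The prompt list is hypothetical, and the response is again assumed to be a dict shaped like the example in Response Structure.

from collections import Counter

# Hypothetical prompts you want to audit
prompts = ["What is SeekrFlow?", "How do I create a fine-tuning job?"]

high_influence_files = Counter()
for prompt in prompts:
    data = client.explainability.get_influential_finetuning_data(
        model_id=model_id,
        question=prompt
    )
    for pair in data["results"]:
        if pair["influence_level"] == "high":
            high_influence_files[pair["file_id"]] += 1

# Source files that repeatedly drive responses are good candidates for review
for file_id, count in high_influence_files.most_common(5):
    print(f"{file_id}: high influence for {count} prompt(s)")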