# Explainability
Understand model outputs through influential training data and transparent reasoning.
Explainability surfaces the training data that influenced fine-tuned model outputs. By tracing model responses back to specific question-and-answer pairs from the training dataset, explainability helps debug model behavior, audit responses, and build trust in model outputs.
## How explainability works
When a fine-tuned model generates a response, explainability identifies the most influential training examples that shaped that output. Each influential example receives an influence level (high, medium, or low) indicating its contribution to the model's response.
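As an illustration, an explainability result pairs the model's output with its ranked training examples. The payload shape and field names below (`influential_examples`, `influence_level`, `file_id`) are assumptions for illustration, not SeekrFlow's documented schema:

```python
# Hypothetical explainability payload; all field names are
# illustrative assumptions, not SeekrFlow's actual response schema.
response = {
    "output": "Returns are accepted within 30 days.",
    "influential_examples": [
        {"question": "What is the return window?",
         "answer": "30 days from delivery.",
         "influence_level": "high",
         "file_id": "file_abc123"},
        {"question": "How do refunds work?",
         "answer": "Refunds post within 5 business days.",
         "influence_level": "low",
         "file_id": "file_def456"},
    ],
}

# Pull out only the strongly influential examples for review.
high_influence = [
    ex for ex in response["influential_examples"]
    if ex["influence_level"] == "high"
]
for ex in high_influence:
    print(ex["question"], "->", ex["file_id"])
```

Filtering on the influence level like this is a common first step when reviewing why a model produced a particular answer.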
## Requirements
Explainability is available for:
- Fine-tuned models created through SeekrFlow
- Models trained after September 22, 2025
- Deployed models with active endpoints
## Influence levels
Training examples are ranked by their influence on model outputs:
| Level | Description |
|---|---|
| High | Training example strongly shaped the model response |
| Medium | Training example had moderate impact on output |
| Low | Training example contributed minimally to response |
Irrelevant training examples are filtered out and not returned.
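When working with attributions programmatically, it helps to order examples so high-influence items surface first. A minimal sketch, assuming each example is a dict carrying an `influence_level` field (an illustrative shape, not SeekrFlow's schema):

```python
# Map the three influence levels from the table above to sort ranks.
INFLUENCE_RANK = {"high": 0, "medium": 1, "low": 2}

def sort_by_influence(examples):
    """Return training examples ordered high -> medium -> low."""
    return sorted(examples, key=lambda ex: INFLUENCE_RANK[ex["influence_level"]])

# Illustrative records; real attributions would carry Q&A text too.
examples = [
    {"question": "Q1", "influence_level": "low"},
    {"question": "Q2", "influence_level": "high"},
    {"question": "Q3", "influence_level": "medium"},
]
ranked = sort_by_influence(examples)
print([ex["influence_level"] for ex in ranked])  # ['high', 'medium', 'low']
```

Because `sorted` is stable, examples at the same influence level keep their original relative order.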
## Use cases
Explainability supports several workflows:

- **Debugging model behavior** – Identify which training examples drive unexpected or incorrect responses
- **Auditing outputs** – Trace model decisions back to source training data for compliance and verification
- **Dataset refinement** – Discover patterns in influential training examples to improve fine-tuning datasets
- **Building trust** – Provide transparency into model decision-making for stakeholders
## Traceability
Explainability responses include file identifiers linking back to source documents. This traceability connects model outputs to original training materials, supporting debugging and dataset updates.
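A sketch of that join, assuming you keep a local registry mapping file identifiers to source filenames. The ID format and registry here are hypothetical, introduced only to show the lookup:

```python
# Hypothetical registry of training-file IDs to source documents;
# the ID format and filenames are illustrative assumptions.
file_registry = {
    "file_abc123": "returns_policy_faq.jsonl",
    "file_def456": "refunds_faq.jsonl",
}

# Influential examples as returned with their file identifiers.
influential = [
    {"question": "What is the return window?", "file_id": "file_abc123"},
    {"question": "How do refunds work?", "file_id": "file_def456"},
]

# Resolve each attribution back to its source training file,
# falling back to a placeholder when the ID is not registered.
traced = [
    (ex["question"], file_registry.get(ex["file_id"], "<unknown>"))
    for ex in influential
]
```

With outputs traced to files this way, a problematic attribution points directly at the dataset file to inspect or update.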
