Fine-tuning UI guide
Create and manage fine-tuning jobs through the SeekrFlow web interface.
Fine-tuning SDK guide
Create and manage fine-tuning jobs programmatically with the Python SDK.
How fine-tuning works
The fine-tuning process adjusts model parameters by training on structured question-and-answer pairs. Models learn from examples that demonstrate desired behaviors, domain knowledge, and specific output patterns. The training produces specialized models with deeper expertise while retaining general capabilities from the base model.
When to use fine-tuning
Fine-tuning provides value when:
- Working with proprietary or sensitive information not in base model training data.
- Requiring specific output formats, styles, or tones.
- Optimizing for tasks more easily demonstrated than described.
- Improving accuracy on domain-specific terminology or concepts.
- Reducing costs by using smaller specialized models.
Fine-tuning methods
SeekrFlow supports multiple fine-tuning approaches:
| Method | Description | Best for |
|---|---|---|
| Instruction fine-tuning | Standard approach that trains models on question-and-answer pairs aligned to task-specific instructions. Embeds domain knowledge directly into model parameters. | Embedding proprietary knowledge, customizing behavior and tone, optimizing for demonstrated tasks |
| Context-grounded fine-tuning | Training approach that teaches models to access and retrieve information from external knowledge bases during inference. Maintains accuracy with frequently changing information. | Dynamic information that requires real-time updates, maintaining current data without retraining |
| Reinforcement tuning (GRPO) | Optimizes the model using a reward function that scores generated responses against reference answers, rather than training it to directly imitate target responses. | Improving response quality, aligning outputs with brand voice, reducing unwanted behaviors |
| Preference tuning (DPO) | Learns directly from comparisons between preferred and dispreferred responses, without requiring reference answers or reward functions. | Aligning outputs with subjective quality criteria, organizational standards, human feedback |
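
To make the reinforcement tuning row more concrete, the following is a minimal sketch of a reward function that scores generated responses against a reference answer, followed by the group-relative normalization that gives GRPO its name. Everything here is illustrative: `token_f1_reward`, `group_relative_advantages`, and the sample strings are hypothetical names and data, not part of the SeekrFlow SDK, and token-overlap F1 is only one possible scoring choice.

```python
import re
import statistics


def token_f1_reward(response: str, reference: str) -> float:
    """Hypothetical reward: token-overlap F1 between response and reference (0.0 to 1.0)."""
    resp_tokens = re.findall(r"\w+", response.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not resp_tokens or not ref_tokens:
        return 0.0
    # Count shared tokens, consuming each reference token at most once.
    remaining = list(ref_tokens)
    overlap = 0
    for tok in resp_tokens:
        if tok in remaining:
            remaining.remove(tok)
            overlap += 1
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of responses sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All responses scored the same, so none is preferred over the others.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Illustrative data only: one reference answer and three sampled responses.
reference = "The warranty lasts 24 months"
candidates = [
    "The warranty lasts 24 months",
    "Warranty coverage is 24 months",
    "Please contact support",
]
rewards = [token_f1_reward(c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)
for text, reward, adv in zip(candidates, rewards, advantages):
    print(f"{reward:.2f} {adv:+.2f} {text}")
```

Responses that score above the group average receive a positive advantage and are reinforced, while below-average responses are discouraged. Preference tuning (DPO) skips this scoring step entirely and learns from preferred/rejected pairs instead.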
Method comparison
| Method | Best for | Knowledge location | Update mechanism | Retrieval required | Training data | Training cost | Iteration speed | Artifact size |
|---|---|---|---|---|---|---|---|---|
| Instruction | Static domain knowledge | Model parameters | Retrain model | No | QA pairs | High (full) | Slow | Large (full model) |
| Context-grounded | Dynamic information | External knowledge base | Update knowledge base | Yes | QA pairs + knowledge base | High (full) | Slow | Large (full model) |
| Reinforcement tuning | Quality alignment (verifiable) | Model parameters | Retrain model | No | Prompts + reference answers | High (full) | Slow | Large (full model) |
| Preference tuning | Quality alignment (subjective) | Model parameters | Retrain model | No | Preferred/rejected response pairs | High (full) | Slow | Large (full model) |
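
The "Training data" column above implies a different record shape for each method. The sketch below shows one way those shapes could be modeled and serialized to JSON Lines. The class and field names (`QAPair`, `ReferenceExample`, `PreferencePair`, `to_jsonl`) are assumptions made for illustration and do not reflect SeekrFlow's actual dataset schema; check the platform's data preparation documentation for the required format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QAPair:
    """Instruction and context-grounded fine-tuning: question-and-answer pairs."""
    question: str
    answer: str


@dataclass
class ReferenceExample:
    """Reinforcement tuning: a prompt plus a reference answer for the reward function."""
    prompt: str
    reference_answer: str


@dataclass
class PreferencePair:
    """Preference tuning: a prompt with a preferred and a rejected response."""
    prompt: str
    preferred: str
    rejected: str


def to_jsonl(records) -> str:
    """Serialize dataclass records to JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(asdict(record)) for record in records)


# Illustrative records only; field names are hypothetical.
examples = [
    QAPair(question="What is the return window?", answer="30 days from delivery."),
    ReferenceExample(prompt="What is 12 * 12?", reference_answer="144"),
    PreferencePair(
        prompt="Summarize the policy.",
        preferred="Returns are accepted within 30 days.",
        rejected="idk, check the website",
    ),
]
print(to_jsonl(examples))
```

Context-grounded fine-tuning would use the same question-and-answer shape, with the knowledge base maintained separately from the training file.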
Recommended approach
Most fine-tuning projects should:
This progression balances training efficiency with model quality while minimizing upfront investment.