Choose a fine-tuning method
Decision guide for selecting the right fine-tuning approach based on use case, requirements, and constraints.
Selecting the appropriate fine-tuning method depends on what knowledge the model needs to learn, how that knowledge changes over time, and operational constraints around cost and deployment.
Decision framework
What type of knowledge are you teaching?
Domain knowledge or behaviors: Use instruction fine-tuning or context-grounded fine-tuning.
- Choose instruction fine-tuning when knowledge is stable and can be embedded directly into model parameters
- Choose context-grounded fine-tuning when knowledge changes frequently and requires external knowledge bases
Quality optimization with verifiable answers: Use reinforcement tuning (GRPO); a reward-function sketch follows this list.
- Reinforcement tuning scores generated responses against reference answers using a reward function
- Effective for domains with definite, verifiable correct answers
- Typically applied after instruction fine-tuning to refine response quality
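To make the reward idea concrete, the sketch below scores a generated response against a reference answer with exact-match and numeric checks. The function and scoring scheme are illustrative assumptions for this guide, not a built-in SeekrFlow reward.

```python
# Illustrative reward function for reinforcement tuning (GRPO-style).
# The name, signature, and scoring scheme are assumptions for this sketch,
# not a SeekrFlow API.

def reward(generated: str, reference: str) -> float:
    """Return 1.0 for an exact or numeric match, partial credit otherwise."""
    gen = generated.strip().lower()
    ref = reference.strip().lower()
    if gen == ref:
        return 1.0
    try:
        # Verifiable numeric answers: compare as floats with a small tolerance.
        if abs(float(gen) - float(ref)) < 1e-6:
            return 1.0
    except ValueError:
        pass
    # Partial credit when the reference appears inside a longer response.
    return 0.5 if ref in gen else 0.0
```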
Quality alignment from human preferences: Use preference tuning (DPO); an example record follows this list.
- Preference tuning learns from comparisons between preferred and dispreferred responses
- No reference answers or reward functions required — only paired judgments of which response is better
- Effective for subjective quality criteria like tone, compliance, or customer experience
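A preference-tuning example is just a prompt paired with a preferred and a rejected response. The record below sketches that shape; the field names ("prompt", "chosen", "rejected") are common conventions and are assumptions here, not a confirmed dataset schema.

```python
# Sketch of a single preference-tuning record (DPO-style). Field names are
# assumptions, not a confirmed SeekrFlow schema; in practice each record is
# typically one line of a JSONL file.
example = {
    "prompt": "A customer asks for a refund outside the return window. How should the agent respond?",
    "chosen": "Acknowledge the frustration, explain the policy clearly, and offer store credit as an alternative.",
    "rejected": "Refunds are not possible. Please read the policy.",
}
```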
How frequently does the information change?
Rarely or never changes: Use instruction fine-tuning; an example training record follows this list.
- Embeds knowledge directly into model parameters
- No retrieval infrastructure required at inference time
- Ideal for proprietary methodologies, stable domain expertise, or fixed compliance rules
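An instruction fine-tuning example pairs a prompt with the exact answer the model should internalize. The record below is a sketch using the common chat-messages layout; the schema and contents are illustrative assumptions, not a confirmed SeekrFlow format.

```python
# Sketch of one instruction fine-tuning record (chat-messages convention).
# The schema and example content are assumptions for illustration only.
example = {
    "messages": [
        {"role": "system", "content": "You are a compliance assistant for Acme Corp."},
        {"role": "user", "content": "What is our retention period for customer contracts?"},
        {"role": "assistant", "content": "Customer contracts are retained for seven years after termination."},
    ]
}
```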
Changes regularly: Use context-grounded fine-tuning; an example training record follows this list.
- Teaches models to use external knowledge bases effectively
- Update knowledge bases without retraining models
- Ideal for current events, policy documents, or dynamic datasets
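A context-grounded training example places retrieved passages in the prompt and targets an answer grounded in them, which is how the model learns to lean on the knowledge base rather than memorized facts. The record below is a sketch under that assumption; the layout is not a confirmed schema.

```python
# Sketch of one context-grounded fine-tuning record: retrieved passages are
# placed in the prompt and the target answer relies only on that context.
# The field layout is illustrative, not a confirmed SeekrFlow schema.
example = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Context:\n"
                "[1] Policy v3.2: Remote work requires manager approval and a signed agreement.\n\n"
                "Question: What is required before an employee can work remotely?"
            ),
        },
        {
            "role": "assistant",
            "content": "Per Policy v3.2, remote work requires manager approval and a signed agreement.",
        },
    ]
}
```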
What are your cost and speed constraints?
Need fast iteration or have cost constraints: Use LoRA with any method; a configuration sketch follows this list.
- LoRA reduces training time and cost by updating only small adapter modules
- Works as a modifier for instruction, context-grounded, reinforcement, or preference tuning
- Start with LoRA; move to full fine-tuning only if LoRA cannot achieve desired behavior
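As a rough picture of what "updating only small adapter modules" means, here is a minimal LoRA configuration using the open-source peft library. The hyperparameters and target modules are illustrative defaults, not recommendations, and a managed platform would typically expose its own equivalents of these settings.

```python
# Minimal LoRA setup with Hugging Face peft, shown only to illustrate the
# adapter idea; hyperparameters and the model ID are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder model ID

lora_config = LoraConfig(
    r=16,                      # adapter rank: smaller = cheaper, less capacity
    lora_alpha=32,             # scaling factor applied to the adapter update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```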
Can invest in full training: Use full fine-tuning.
- Updates all model parameters for maximum flexibility
- Consider when LoRA experiments show the task requires deep parameter changes
- Produces larger artifacts and requires more compute resources
Does your training data include images?
Yes – image-text pairs: Use vision language tuning; a dataset sketch follows this list.
- Select a supported vision-language model and prepare datasets with image-text pairs
- Currently supports instruction fine-tuning; the rest of the method selection guidance in this guide still applies
- SeekrFlow enforces model-dataset compatibility — vision datasets require VLMs and vice versa
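A vision language tuning dataset pairs each image with text. The record below sketches one common shape, an image reference plus a short chat exchange; the field names and image-reference style are assumptions, not a confirmed SeekrFlow schema.

```python
# Sketch of one image-text training record for vision language tuning.
# Field names and the image-reference style are assumptions for illustration.
example = {
    "image": "images/invoice_0042.png",
    "messages": [
        {"role": "user", "content": "What is the total amount due on this invoice?"},
        {"role": "assistant", "content": "The total amount due is $1,284.50."},
    ],
}
```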
No – text only: Use any fine-tuning method with a text-based model.
Do you need multiple task variants?
Yes – multiple related tasks: Use LoRA; an adapter-swapping sketch follows this list.
- Train multiple small adapters from a single base model
- Swap adapters dynamically based on task
- Cost-effective way to maintain task variants
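To illustrate serving several task variants from one base model, the sketch below loads two LoRA adapters with the open-source peft library and switches between them per request. Paths, adapter names, and the model ID are placeholders.

```python
# Serving multiple task variants from one base model by swapping LoRA adapters.
# Shown with the open-source peft library to illustrate the pattern, not as a
# SeekrFlow API; all identifiers are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder model ID

# Load the first adapter and register it under a name.
model = PeftModel.from_pretrained(base, "adapters/support-triage", adapter_name="triage")
# Load a second adapter trained for a related task.
model.load_adapter("adapters/contract-summaries", adapter_name="summaries")

# Route requests by switching the active adapter; the base weights stay shared.
model.set_adapter("triage")
# ... handle triage requests ...
model.set_adapter("summaries")
# ... handle summarization requests ...
```

Because adapters are small, keeping several of them loaded next to one set of base weights is far cheaper than hosting a separate full model per task.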
No – single specialized model: Use full fine-tuning.
- Produces standalone model optimized for specific task
- Simpler deployment without adapter management
Common combinations
Fine-tuning methods can be combined in training pipelines:
Instruction fine-tuning + reinforcement tuning: First embed domain knowledge through instruction fine-tuning, then refine response quality and alignment through reinforcement tuning. This two-stage approach teaches what to know and how to respond.
Instruction fine-tuning + preference tuning: First embed domain knowledge through instruction fine-tuning, then align outputs with human preferences through preference tuning. Use this when quality criteria are subjective and better expressed through comparative judgments than reward functions.
LoRA + any method: Apply LoRA to instruction fine-tuning, context-grounded fine-tuning, reinforcement tuning, or preference tuning for parameter-efficient training. This combination provides the benefits of specialized training with reduced computational costs.
Context-grounded + retrieval system: Pair context-grounded fine-tuning with robust retrieval infrastructure. The fine-tuning optimizes how models use retrieved context; the retrieval system provides current information.
Vision language tuning + instruction fine-tuning: Use instruction fine-tuning with a vision-language model and image-text datasets. The training method is the same; the modality expands to include visual inputs alongside text.
Method comparison
| Consideration | Instruction | Context-grounded | LoRA | Reinforcement tuning | Preference tuning |
|---|---|---|---|---|---|
| Best for | Static domain knowledge | Dynamic information | Efficient training | Quality alignment (verifiable) | Quality alignment (subjective) |
| Knowledge location | Model parameters | External knowledge base | Adapter parameters | Model parameters | Model parameters |
| Update mechanism | Retrain model | Update knowledge base | Swap adapters | Retrain model | Retrain model |
| Retrieval required | No | Yes | Depends on base method | No | No |
| Training data | QA pairs | QA pairs + knowledge base | Depends on base method | Prompts + reference answers | Preferred/rejected response pairs |
| Training cost | High (full) | High (full) | Low | High (full) | High (full) |
| Iteration speed | Slow | Slow | Fast | Slow | Slow |
| Artifact size | Large (full model) | Large (full model) | Small (adapter) | Large (full model) | Large (full model) |
Getting started
Most fine-tuning projects should:
- Start with LoRA + instruction fine-tuning as the default approach.
- Use context-grounded fine-tuning if information requires frequent updates.
- Add reinforcement tuning or preference tuning as a refinement step if quality alignment is needed.
- Consider full fine-tuning only after LoRA experiments show it's necessary.
This progression balances training efficiency with model quality while minimizing upfront investment.
