Choosing a fine-tuning method

Decision guide for selecting the right fine-tuning approach based on use case, requirements, and constraints.

Selecting the appropriate fine-tuning method depends on what knowledge the model needs to learn, how that knowledge changes over time, and operational constraints around cost and deployment.

Decision framework

What type of knowledge are you teaching?

Static domain knowledge or behaviors: Use instruction fine-tuning or context-grounded fine-tuning.

  • Choose instruction fine-tuning when knowledge is stable and can be embedded directly into model parameters
  • Choose context-grounded fine-tuning when knowledge changes frequently and requires external knowledge bases

Quality preferences or alignment: Use GRPO (Group Relative Policy Optimization).

  • GRPO refines response quality when correct answers are hard to specify but quality differences are clear
  • Typically applied after instruction fine-tuning to align behavior with preferences
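
GRPO optimizes the model against a reward signal computed over groups of sampled responses, so the main design decision is the reward function. The sketch below is a minimal, hypothetical reward that encodes "quality differences are clear even when correct answers are hard to specify"; the list-in, scores-out signature follows the convention used by TRL's GRPOTrainer and should be treated as an assumption to adapt to your training framework.

```python
# Minimal, hypothetical GRPO reward function: scores a group of sampled
# completions so training pushes the policy toward preferred responses.
# The signature (list of completions in, list of float scores out) follows
# the convention used by TRL's GRPOTrainer; adapt it to your framework.
def quality_reward(completions, **kwargs):
    scores = []
    for completion in completions:
        score = 0.0
        if len(completion.split()) <= 150:   # prefer concise answers
            score += 0.5
        if "Source:" in completion:          # prefer answers that cite a source
            score += 0.5
        scores.append(score)
    return scores
```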

How frequently does the information change?

Rarely or never changes: Use instruction fine-tuning.

  • Embeds knowledge directly into model parameters
  • No retrieval infrastructure required at inference time
  • Ideal for proprietary methodologies, stable domain expertise, or fixed compliance rules
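
As a rough illustration, instruction fine-tuning data is simply prompt/response pairs whose content you want baked into the weights. The records below are invented examples, not a required schema.

```python
# Invented instruction fine-tuning records: stable knowledge is embedded in the
# model's weights by training directly on prompt/response pairs like these.
instruction_records = [
    {
        "prompt": "Summarize the escalation policy for priority-1 incidents.",
        "response": "Priority-1 incidents are escalated to the on-call lead within 15 minutes...",
    },
    {
        "prompt": "How long are audit logs retained?",
        "response": "Audit logs are retained for seven years under the compliance policy...",
    },
]
```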

Changes regularly: Use context-grounded fine-tuning.

  • Teaches models to use external knowledge bases effectively
  • Lets you update knowledge bases without retraining the model
  • Ideal for current events, policy documents, or dynamic datasets
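
For contrast, a context-grounded training record supplies the relevant document alongside the question, so the model learns to answer from whatever context it is given rather than from memorized facts. The record below is an invented example.

```python
# Invented context-grounded training record: the answer is grounded in the
# supplied context, so updating the knowledge base changes future answers
# without retraining the model.
context_grounded_record = {
    "context": "Expense policy v3: remote employees may claim up to $50 per month for internet.",
    "prompt": "What is the monthly internet reimbursement limit for remote employees?",
    "response": "According to the supplied policy, the limit is $50 per month.",
}
```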

What are your cost and speed constraints?

Need fast iteration or have cost constraints: Use LoRA (Low-Rank Adaptation) with any method.

  • LoRA reduces training time and cost by updating only small adapter modules
  • Works as a modifier for instruction, context-grounded, or GRPO training
  • Start with LoRA; move to full fine-tuning only if LoRA cannot achieve desired behavior
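
A minimal LoRA setup with the PEFT library looks like the sketch below; the model name is a placeholder, and the rank, alpha, and target modules are illustrative starting points rather than recommendations.

```python
# Minimal LoRA sketch using the PEFT library. Only the small adapter matrices
# are trainable; the base model's weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder name

lora_config = LoraConfig(
    r=16,                                 # adapter rank: lower is cheaper, less capacity
    lora_alpha=32,                        # scaling applied to the adapter update
    target_modules=["q_proj", "v_proj"],  # which projections get adapters (model-dependent)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```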

Can invest in full training: Use full fine-tuning.

  • Updates all model parameters for maximum flexibility
  • Consider when LoRA experiments show the task requires deep parameter changes
  • Produces larger artifacts and requires more compute resources
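
To make the cost difference concrete, the snippet below counts trainable parameters for a full fine-tune; the model name is a placeholder, and the size comparison in the comments is a rough rule of thumb rather than a measured figure.

```python
# Rough illustration of why full fine-tuning is heavier: every parameter is
# trainable, and every weight must be re-saved as the training artifact.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder name

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Full fine-tuning updates {trainable:,} of {total:,} parameters")
# By contrast, a LoRA adapter for the same model is usually tens to a few
# hundred MB, while a full model checkpoint runs to many GB.
```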

Do you need multiple task variants?

Yes – multiple related tasks: Use LoRA.

  • Train multiple small adapters from a single base model
  • Swap adapters dynamically based on task
  • Cost-effective way to maintain task variants
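
The sketch below shows the adapter-swapping pattern with the PEFT library; the adapter paths and names are placeholders, and the exact calls should be treated as assumptions to verify against the PEFT version you use.

```python
# Serving several task variants from one base model by swapping LoRA adapters.
# Paths, adapter names, and the base model name are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("your-base-model")

# Attach one adapter, then register additional adapters on the same model.
model = PeftModel.from_pretrained(base_model, "adapters/summarization",
                                  adapter_name="summarization")
model.load_adapter("adapters/classification", adapter_name="classification")

# Activate whichever adapter matches the incoming task, without reloading
# the base model weights.
model.set_adapter("classification")
```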

No – single specialized model: Use full fine-tuning.

  • Produces standalone model optimized for specific task
  • Simpler deployment without adapter management

Common combinations

Fine-tuning methods can be combined in training pipelines:

Instruction fine-tuning + GRPO: First embed domain knowledge through instruction fine-tuning, then refine response quality and alignment through GRPO. This two-stage approach teaches what to know and how to respond.
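
A sketch of that two-stage ordering using the TRL library's SFTTrainer and GRPOTrainer is shown below; the datasets, model name, reward function, and hyperparameters are placeholders, and the trainer arguments are simplified assumptions rather than a complete recipe.

```python
# Two-stage sketch: supervised instruction fine-tuning first, then GRPO starting
# from the resulting checkpoint. Uses the TRL library; arguments are simplified
# and the datasets/model name are placeholders.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

instruction_dataset = Dataset.from_list([               # placeholder data
    {"prompt": "Explain policy X.", "completion": "Policy X states ..."},
])
prompt_dataset = Dataset.from_list([{"prompt": "Explain policy X."}])

def quality_reward(completions, **kwargs):
    # Toy reward in the style of the earlier sketch: prefer concise answers.
    return [1.0 if 0 < len(c.split()) <= 150 else 0.0 for c in completions]

# Stage 1: embed domain knowledge via instruction fine-tuning.
sft = SFTTrainer(
    model="your-base-model",                             # placeholder name
    train_dataset=instruction_dataset,
    args=SFTConfig(output_dir="checkpoints/sft"),
)
sft.train()
sft.save_model("checkpoints/sft")

# Stage 2: refine response quality with GRPO from the stage-1 checkpoint.
grpo = GRPOTrainer(
    model="checkpoints/sft",
    reward_funcs=quality_reward,
    train_dataset=prompt_dataset,
    args=GRPOConfig(output_dir="checkpoints/grpo"),
)
grpo.train()
```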

LoRA + any method: Apply LoRA to instruction fine-tuning, context-grounded fine-tuning, or GRPO for parameter-efficient training. This combination provides the benefits of specialized training with reduced computational costs.

Context-grounded + retrieval system: Pair context-grounded fine-tuning with robust retrieval infrastructure. The fine-tuning optimizes how models use retrieved context; the retrieval system provides current information.
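
At inference time the two pieces meet in a simple loop: retrieve, assemble a prompt, generate. The sketch below is framework-agnostic; `retrieve` and `generate` stand in for whatever retrieval system and model-serving call you use, and the prompt template is illustrative.

```python
# Framework-agnostic sketch of serving a context-grounded model. `retrieve`
# and `generate` are stand-ins for your retrieval system and model call.
from typing import Callable, List

def answer(question: str,
           retrieve: Callable[[str], List[str]],
           generate: Callable[[str], str]) -> str:
    documents = retrieve(question)            # current info from the knowledge base
    context = "\n\n".join(documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)                   # fine-tuned model consumes the retrieved context
```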

Method comparison

Consideration       | Instruction             | Context-grounded        | LoRA                    | GRPO
--------------------|-------------------------|-------------------------|-------------------------|-------------------
Best for            | Static domain knowledge | Dynamic information     | Efficient training      | Quality alignment
Knowledge location  | Model parameters        | External knowledge base | Adapter parameters      | Model parameters
Update mechanism    | Retrain model           | Update knowledge base   | Swap adapters           | Retrain model
Retrieval required  | No                      | Yes                     | Depends on base method  | No
Training cost       | High (full)             | High (full)             | Low                     | High (full)
Iteration speed     | Slow                    | Slow                    | Fast                    | Slow
Artifact size       | Large (full model)      | Large (full model)      | Small (adapter)         | Large (full model)

Getting started

Most fine-tuning projects should:

  1. Start with LoRA + instruction fine-tuning as the default approach.
  2. Use context-grounded fine-tuning if information requires frequent updates.
  3. Add GRPO as a refinement step if quality alignment is needed.
  4. Consider full fine-tuning only after LoRA experiments show it's necessary.

This progression balances training efficiency with model quality while minimizing upfront investment.