Creating and Deploying a Fine-Tuned Model in SeekrFlow™

Choose a model from our library of open-source models, configure and fine-tune it on a task or domain, monitor its progress, and validate its effectiveness.

This guide will walk you through how to choose a model to fine-tune with your data, how to monitor and validate the resulting model, and finally, how to promote it to our inference API, where you can test its effectiveness with chat completions.

Choosing a model to fine-tune

Selecting the right model to fine-tune is a crucial decision that can significantly impact the performance and effectiveness of the resulting model. Below is a full rundown of the pre-trained, open-source language models available for fine-tuning on the SeekrFlow platform:

Available models

Llama 2: Suited to basic tasks and simple applications.

  • meta-llama/Llama-2-7b-chat-hf
  • meta-llama/Llama-2-7b-hf
  • meta-llama/Llama-2-13b-chat-hf
  • meta-llama/Llama-2-13b-hf

Llama 3: Text models optimized for speed and efficiency in standard tasks.

  • meta-llama/Meta-Llama-3-8B
  • meta-llama/Meta-Llama-3-8B-Instruct

Llama 3.1: Text models with enhanced reasoning and math abilities. Ideal for tasks like long-form content generation, document analysis, and extended conversation.

  • meta-llama/Llama-3.1-8B
  • meta-llama/Llama-3.1-8B-Instruct

Which one should I use?

Generally, Chat models are designed for user/system conversations ("What is my deductible?"), while Instruct models are fine-tuned to follow instructions to complete a task ("Write a summary of the latest AI research papers dealing with context attribution in model output."). If you have a more specialized task or need to create a unique application that doesn't use any pre-existing conversation patterns, try the base model.
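To make the distinction concrete, here is a sketch of how the same kinds of requests would be framed for each variant. The `[INST]`/`<<SYS>>` markup shown is Meta's documented Llama 2 chat template; in practice, fine-tuning and serving layers typically apply these templates for you, so treat this as illustration only.

```python
# Illustrative only: how requests are framed for each model variant.

# Chat model: multi-turn user/system conversation, wrapped in the
# Llama 2 chat template (usually applied for you by the serving layer).
chat_prompt = (
    "<s>[INST] <<SYS>>\n"
    "You are a helpful banking assistant.\n"
    "<</SYS>>\n\n"
    "What is my deductible? [/INST]"
)

# Instruct model: a single task-oriented instruction.
instruct_prompt = (
    "Write a summary of the latest AI research papers "
    "dealing with context attribution in model output."
)

# Base model: raw text completion -- you supply whatever pattern you
# want the model to continue, with no built-in conversation structure.
base_prompt = "Context attribution in model output refers to"
```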

Smaller models generally cost less and train faster, while larger models tend to produce better results. Consider your task's complexity and performance requirements when selecting a model for fine-tuning.

If your goal is efficiency and targeted performance, consider using a smaller model.

  • Example: A bank builds a customer service chatbot for handling routine inquiries about account services. This specialized chatbot would be trained on banking-specific terminology and common customer questions, enabling it to provide precise, accurate responses. In this case, a small-model approach minimizes computational costs and ensures faster deployment.

If your goal is to perform complex tasks, like advanced natural language inference or code generation, consider using a larger model for higher accuracy and versatility.

  • Example: A specialized system that assists radiologists by detecting abnormalities such as tumors or fractures calls for a large model, which can catch subtle details that smaller models, human experts, or less sophisticated systems might miss.

Specialized base models

These models are also available for specific use cases:

Llama Guard: An 8B Llama 3 safeguard model for classifying LLM inputs and responses.

  • meta-llama/Llama-Guard-3-8B

TinyLlama: A compact 1.1B text model for lightweight experimentation.

  • TinyLlama/TinyLlama-1.1B-Chat-v1.0

Notes on efficiency

Data requirements

Some models require large amounts of data for effective fine-tuning, while others can achieve good performance with less data. Typically, the larger the model, the more extensive your dataset will need to be to fine-tune it effectively.

Note: SeekrFlow’s data alignment feature is particularly useful where data is limited.
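For a sense of what "data" means here: fine-tuning datasets are typically collections of prompt/response records. The snippet below writes a couple of hypothetical records in JSONL form; the field names are placeholders, not SeekrFlow's required schema, so consult the data preparation docs for the exact format the platform accepts.

```python
import json

# Hypothetical instruction-tuning records; field names are placeholders,
# not SeekrFlow's required schema -- see the data preparation docs.
records = [
    {
        "prompt": "What is my deductible?",
        "response": "Your deductible is the amount you pay out of pocket "
                    "before coverage begins. You can find yours under "
                    "Account > Coverage Details.",
    },
    {
        "prompt": "How do I order a replacement debit card?",
        "response": "You can order a replacement card from the Cards tab "
                    "in the mobile app, or by calling customer service.",
    },
]

with open("training_data.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```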

Hardware selection

The right hardware can significantly impact the speed, performance, and cost of your training process. Here are several key considerations to keep in mind:

Compute power

Specialized training hardware (such as GPUs, HPUs, or TPUs) is typically preferred for deep learning tasks due to its parallel processing capabilities, which can significantly speed up training compared to CPUs.

Memory capacity

Model Size: Larger models require more memory to store the model parameters and intermediate calculations. Ensure that the chosen hardware has sufficient memory to accommodate the model you are fine-tuning.

For example, NVIDIA's A100 is available with 40GB or 80GB of memory, making it a cost-effective option for small-to-medium model training or inference. On the other end of the spectrum, NVIDIA's H100 offers 80GB of faster memory and can handle training and inference for large LLMs like Llama 3.
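As a rough sanity check on memory fit, you can estimate the footprint of the weights alone from the parameter count and numeric precision. This is a minimal sketch, not a full accounting: training also needs memory for gradients, optimizer states, and activations, which can multiply the total several times over.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone (2 bytes/param for bf16/fp16)."""
    return num_params * bytes_per_param / 1e9

# An 8B-parameter model in bf16 needs ~16 GB just for its weights,
# which already fills much of a 40GB A100 once training overhead is added.
print(weight_memory_gb(8e9))   # ~16.0
print(weight_memory_gb(70e9))  # ~140.0 -- multi-GPU territory even before overhead
```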

Batch Size: Larger batch sizes can help speed up training but also require more memory. Balance batch size with available memory to optimize training efficiency.
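Where the training setup supports it, gradient accumulation is a common way to balance this trade-off: a smaller per-step batch (less activation memory) combined with more accumulation steps keeps the effective batch size, and hence the optimization behavior, unchanged. A minimal sketch of the arithmetic:

```python
# Effective batch size = micro-batch size x gradient accumulation steps.
# Halving the micro-batch roughly halves activation memory; doubling the
# accumulation steps keeps the effective batch size the same.
target_effective_batch = 32

for micro_batch in (32, 16, 8):
    accum_steps = target_effective_batch // micro_batch
    print(f"micro-batch {micro_batch:2d} x {accum_steps} accumulation steps "
          f"= effective batch {micro_batch * accum_steps}")
```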


Next

Create a fine-tuning job and monitor its learning progress