
# AI-ready data

> Generate structured training datasets from raw content for model fine-tuning.

export const SupportedOn = ({ui = false, api = true, sdk = true}) => <div className="inline-flex flex-wrap items-center gap-x-5 gap-y-2 px-4 py-2.5 rounded-lg border border-[#00dad3] bg-[#00dad3]/10 text-sm not-prose">
    <span className="font-bold text-black dark:text-white whitespace-nowrap">
      Supported on
    </span>
    <div className="flex items-center gap-5">
      <span className="inline-flex items-center gap-1.5 font-semibold text-black dark:text-white">
        <Icon icon={ui ? "circle-check" : "circle-xmark"} color={ui ? "#00dad3" : "#9ca3af"} size={16} />
        UI
      </span>
      <span className="inline-flex items-center gap-1.5 font-semibold text-black dark:text-white">
        <Icon icon={api ? "circle-check" : "circle-xmark"} color={api ? "#00dad3" : "#9ca3af"} size={16} />
        API
      </span>
      <span className="inline-flex items-center gap-1.5 font-semibold text-black dark:text-white">
        <Icon icon={sdk ? "circle-check" : "circle-xmark"} color={sdk ? "#00dad3" : "#9ca3af"} size={16} />
        SDK
      </span>
    </div>
  </div>;

<SupportedOn ui={true} api={true} sdk={true} />

AI-ready data jobs convert raw content into training-ready datasets. Each job produces structured output, such as question-and-answer pairs, formatted for model fine-tuning and alignment.

## How it works

The AI-ready data pipeline transforms uploaded files into structured training datasets:

<Steps>
  <Step>
    **File selection** – Choose source files from storage to use as training material
  </Step>

  <Step>
    **Dataset type selection** – Specify the type of dataset to generate, based on the fine-tuning method you plan to use
  </Step>

  <Step>
    **Generation** – The system processes files and creates structured question-and-answer pairs or training examples
  </Step>

  <Step>
    **Output** – Generated datasets are saved in formats compatible with fine-tuning workflows
  </Step>
</Steps>
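The four steps above can be sketched locally in Python to show how data flows from files to a dataset. This is a minimal illustration, not the data engine's implementation: `build_dataset`, `generate_examples`, the dataset type names, and the JSONL field names are all hypothetical, and the generator is a stub that turns each paragraph into one example instead of calling a model.

```python
import json

# Hypothetical dataset-type names, used for illustration only.
DATASET_TYPES = ("instruction", "context_grounded")


def generate_examples(name: str, text: str) -> list[dict]:
    """Stub generator: emits one example per non-empty paragraph.

    A real generation step would use a model to write questions and answers;
    here the "answer" is just the paragraph's first sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {
            "question": f"What does section {i} of {name} state?",
            "answer": p.split(". ")[0].rstrip("."),
            "context": p,
        }
        for i, p in enumerate(paragraphs, start=1)
    ]


def build_dataset(files: dict[str, str], dataset_type: str) -> str:
    """Run the four steps and return the dataset as JSONL text."""
    # Step 2: dataset type selection.
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    lines = []
    # Step 1: `files` is the selected source set (file name -> text).
    for name, text in files.items():
        # Step 3: generation.
        for ex in generate_examples(name, text):
            if dataset_type == "instruction":
                record = {"input": ex["question"], "output": ex["answer"]}
            else:
                record = {"query": ex["question"], "context": [ex["context"]],
                          "response": ex["answer"]}
            lines.append(json.dumps(record))
    # Step 4: output, one JSON object per line.
    return "\n".join(lines)
```

Keeping generation behind its own function means the stub could be swapped for a model call without changing how records are shaped or written.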

## Integration with fine-tuning

AI-ready datasets feed directly into fine-tuning workflows:

* **Instruction fine-tuning** – Standard instruction datasets train models on task-specific examples
* **Context-grounded fine-tuning** – Context-grounded datasets train models to use retrieval effectively
* **Dataset quality** – Higher-quality source content and generation produce better fine-tuned models

The data engine automates the transition from raw files to training-ready datasets, reducing manual dataset preparation effort.

## Datasets for fine-tuning

AI-ready data supports multiple dataset formats aligned to fine-tuning methods:

### Instruction fine-tuning

Standard instruction datasets consist of traditional question-and-answer pairs aligned to task-specific instructions. Each example demonstrates how the model should respond to particular queries or prompts.

**Structure:**

* **Input** – The question, prompt, or instruction
* **Output** – The expected response or completion
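An instruction example maps naturally to one JSON object per line (JSONL). The sketch below assumes the keys are literally `input` and `output`, mirroring the structure above; `InstructionExample` and `to_jsonl` are hypothetical names, and the schema the platform accepts may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InstructionExample:
    input: str   # the question, prompt, or instruction
    output: str  # the expected response or completion


def to_jsonl(examples: list[InstructionExample]) -> str:
    """Serialize examples as JSON Lines: one object per line."""
    return "\n".join(json.dumps(asdict(ex)) for ex in examples)


examples = [InstructionExample("Summarize the refund policy.",
                               "Refunds are issued within 5 business days.")]
print(to_jsonl(examples))
# {"input": "Summarize the refund policy.", "output": "Refunds are issued within 5 business days."}
```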

**Use cases:**

* Teaching domain-specific knowledge
* Customizing response style and tone
* Training task-specific behaviors

These datasets are used with instruction fine-tuning to embed knowledge directly into model parameters.

### Context-grounded fine-tuning

Context-grounded datasets consist of question-and-answer pairs that reference source documents. Each example includes the query, relevant context from source files, and the correct response grounded in that context.

**Structure:**

* **Query** – The question or prompt
* **Context** – Relevant excerpts from source documents
* **Response** – Answer derived from the provided context
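A context-grounded example carries the source excerpts alongside the query and response. The sketch below uses hypothetical field names matching the structure above and adds a crude lexical check: the share of response words that also appear in the context. This is only one simple way to flag answers that may not be grounded, not the platform's quality filter.

```python
import json
import re
from dataclasses import asdict, dataclass, field


@dataclass
class GroundedExample:
    query: str                                        # the question or prompt
    context: list[str] = field(default_factory=list)  # excerpts from source documents
    response: str = ""                                # answer derived from the context


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_ratio(example: GroundedExample) -> float:
    """Share of response words that also appear in the context (rough heuristic)."""
    response_words = _words(example.response)
    if not response_words:
        return 0.0
    context_words = _words(" ".join(example.context))
    return len(response_words & context_words) / len(response_words)


example = GroundedExample(
    query="How long do refunds take?",
    context=["Refunds are issued within 5 business days."],
    response="Refunds are issued within 5 business days.",
)
print(json.dumps(asdict(example)))
print(overlap_ratio(example))  # 1.0
```

A low ratio does not prove an answer is wrong (paraphrases score low), so it is better suited to sorting examples for review than to rejecting them automatically.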

**Use cases:**

* Training models to use external knowledge bases effectively
* Teaching retrieval-aware response generation
* Building models that cite sources and stay grounded in provided information

These datasets are used with context-grounded fine-tuning to train models for retrieval-augmented generation workflows.
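One way to use a context-grounded example in training is to flatten it into a prompt that presents the excerpts before the question, so the model learns to answer from supplied context. The instruction wording, the numbered `[n]` excerpt markers, and the `prompt`/`completion` keys below are assumptions for illustration, not the format the platform uses internally.

```python
def render_prompt(query: str, context: list[str]) -> str:
    """Number each excerpt, then append the question."""
    excerpts = "\n".join(f"[{i}] {text}" for i, text in enumerate(context, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{excerpts}\n\n"
        f"Question: {query}\nAnswer:"
    )


def to_training_pair(query: str, context: list[str], response: str) -> dict:
    """Hypothetical prompt/completion record for one grounded example."""
    return {"prompt": render_prompt(query, context), "completion": response}


pair = to_training_pair(
    "How long do refunds take?",
    ["Refunds are issued within 5 business days.", "Store credit is instant."],
    "Refunds are issued within 5 business days [1].",
)
print(pair["prompt"])
```

Numbering the excerpts gives responses a stable way to point back at their sources, which lines up with the goal of models that cite sources.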

## Generation parameters

Dataset generation can be configured with parameters that control output characteristics:

* **Number of examples** – How many training pairs to generate from source content
* **Diversity settings** – Controls for question variety and coverage across source material
* **Quality filters** – Criteria for ensuring generated examples meet minimum standards
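The sketch below shows how these three kinds of controls might interact when selecting from a pool of candidate pairs. The parameter names (`num_examples`, `min_answer_chars`, `dedupe_questions`) are hypothetical. Here the quality filter is a simple minimum answer length, diversity is approximated by dropping repeated questions, and coverage comes from taking examples round-robin across source files.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class GenerationParams:
    # Hypothetical parameter names; the real job settings may be named differently.
    num_examples: int = 100        # number of examples to keep
    min_answer_chars: int = 20     # quality filter: drop very short answers
    dedupe_questions: bool = True  # diversity: drop repeated questions


def select_examples(candidates: dict[str, list[dict]], params: GenerationParams) -> list[dict]:
    """Filter candidates per source file, then interleave sources for coverage."""
    per_source = [
        [ex for ex in examples if len(ex["answer"]) >= params.min_answer_chars]
        for examples in candidates.values()
    ]
    selected, seen = [], set()
    # Round-robin across sources so no single file dominates the dataset.
    for row in zip_longest(*per_source):
        for ex in row:
            if ex is None:  # this source has run out of examples
                continue
            key = " ".join(ex["question"].lower().split())  # normalize case and spacing
            if params.dedupe_questions and key in seen:
                continue
            seen.add(key)
            selected.append(ex)
            if len(selected) == params.num_examples:
                return selected
    return selected
```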

## Dataset quality

Generated datasets are optimized for training effectiveness:

* **Relevance** – Questions and answers are derived from actual source content
* **Consistency** – Output format matches fine-tuning requirements
* **Coverage** – Examples span the breadth of source material
* **Validation** – Generated datasets can be reviewed before use in training
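Before using a dataset in training, a quick local check can confirm that every line is a JSON object with the expected non-empty fields. The required-field names below are assumptions matching the structures described earlier, not a documented schema.

```python
import json

# Required fields per dataset type; these names are assumptions for illustration.
REQUIRED_FIELDS = {
    "instruction": ("input", "output"),
    "context_grounded": ("query", "context", "response"),
}


def validate_jsonl(text: str, dataset_type: str) -> list[str]:
    """Return human-readable problems; an empty list means every line passed."""
    required = REQUIRED_FIELDS[dataset_type]
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {lineno}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        for name in required:
            if not record.get(name):  # missing, empty string, or empty list
                problems.append(f"line {lineno}: missing or empty '{name}'")
    return problems
```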

## Dataset management

AI-ready datasets are managed alongside other data engine outputs:

* **Status tracking** – Monitor generation job progress
* **Review** – Inspect generated examples before fine-tuning
* **Versioning** – Maintain multiple dataset versions from the same sources
* **Export** – Download datasets in standard formats
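Status tracking can be done by polling until a job finishes. The helper below is a generic sketch: it accepts any zero-argument `get_status` callable, so it can wrap whichever API or SDK call reports job status, and the terminal status names are assumptions rather than documented values.

```python
import time
from typing import Callable

# Assumed terminal states; the platform's actual status values may differ.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(get_status: Callable[[], str], interval: float = 5.0,
                 timeout: float = 600.0) -> str:
    """Poll a status callable until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        time.sleep(interval)  # wait before asking again
```

Passing the status lookup in as a function keeps the polling logic independent of any particular client library.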
