# Data engine

> Transform raw content into structured, AI-ready datasets for training and retrieval.

export const SupportedOn = ({ui = false, api = true, sdk = true}) => <div className="inline-flex flex-wrap items-center gap-x-5 gap-y-2 px-4 py-2.5 rounded-lg border border-[#00dad3] bg-[#00dad3]/10 text-sm not-prose">
    <span className="font-bold text-black dark:text-white whitespace-nowrap">
      Supported on
    </span>
    <div className="flex items-center gap-5">
      <span className="inline-flex items-center gap-1.5 font-semibold text-black dark:text-white">
        <Icon icon={ui ? "circle-check" : "circle-xmark"} color={ui ? "#00dad3" : "#9ca3af"} size={16} />
        UI
      </span>
      <span className="inline-flex items-center gap-1.5 font-semibold text-black dark:text-white">
        <Icon icon={api ? "circle-check" : "circle-xmark"} color={api ? "#00dad3" : "#9ca3af"} size={16} />
        API
      </span>
      <span className="inline-flex items-center gap-1.5 font-semibold text-black dark:text-white">
        <Icon icon={sdk ? "circle-check" : "circle-xmark"} color={sdk ? "#00dad3" : "#9ca3af"} size={16} />
        SDK
      </span>
    </div>
  </div>;

<SupportedOn ui={true} api={true} sdk={true} />

The data engine transforms raw content into structured, AI-ready data. It manages the complete data lifecycle from file ingestion through preparation for training and retrieval workflows.

## Core capabilities

The data engine provides two primary functions:

<CardGroup>
  <Card title="Storage" icon="database" href="/flow/components/data-engine/storage">
    Storage manages raw content through file ingestion and vector database creation. Files are uploaded, processed, and organized for downstream use, with ingestion insights that provide real-time visibility into processing status and diagnostics (available through the API and SDK). Vector databases transform these files into searchable knowledge bases through document chunking and embedding generation.
  </Card>

  <Card title="AI-ready data" icon="sparkles" href="/flow/components/data-engine/ai-ready-data">
    AI-ready data generates and transforms datasets, converting raw content into training-ready formats. These jobs produce structured outputs optimized for model fine-tuning and alignment, including standard instruction datasets and context-grounded datasets.
  </Card>
</CardGroup>
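
To make the chunking and embedding step described under Storage more concrete, here is a minimal conceptual sketch in plain Python. It does not use the SeekrFlow API or SDK. Every name in it (`chunk_text`, `embed`, `build_index`, `search`) is hypothetical, the chunk sizes are arbitrary, and the bag-of-words "embedding" is a stand-in assumption for a real embedding model.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts standing in for a real embedding model."""
    return Counter(word.strip(".,").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Chunk every document and store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def search(index: list[tuple[str, Counter]], query: str, top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


documents = [
    "Invoices are due within thirty days of receipt.",
    "Refunds are processed within five business days.",
]
index = build_index(documents)
results = search(index, "refunds")
```

In this sketch, `results` holds the chunk that mentions refunds, because it is the only chunk sharing a term with the query. A production vector database would swap the toy vectors for learned embeddings and an approximate nearest-neighbor index, but the shape of the pipeline (chunk, embed, store, query) is the same.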

## Data workflow

A typical data engine workflow has four steps:

<Steps>
  <Step>
    Upload raw content files to storage.
  </Step>

  <Step>
    Create vector stores for retrieval applications.
  </Step>

  <Step>
    Generate AI-ready datasets from selected files.
  </Step>

  <Step>
    Use outputs for fine-tuning, agent knowledge bases, or evaluations.
  </Step>
</Steps>

## Integration points

Data engine outputs feed other parts of SeekrFlow:

* **Fine-tuning** – AI-ready datasets feed model training pipelines
* **Agents** – Vector stores power the FileSearch tool for knowledge retrieval
* **Evaluations** – Structured datasets support model testing and validation
