File Storage

Upload, organize, and manage the source documents and system-generated files used across SeekrFlow.

File Storage is where you upload and manage all source documents used throughout the platform, including files for training data, vector stores, and agents.

Supported file types

Accepted formats depend on the file purpose selected at upload.

Raw Data

| File type | Notes |
| --- | --- |
| .pdf | Converted to Markdown when used in AI-Ready Data or vector store workflows |
| .docx | Converted to Markdown when used in AI-Ready Data or vector store workflows |
| .ppt | Converted to Markdown when used in AI-Ready Data or vector store workflows |
| .md | Used as-is |
| .json | Must follow SeekrFlow's tree-like JSON structure – see JSON formatting requirements below |

Fine-tuning Data (Instruction Tuning Data and Reinforcement Tuning Data)

| File type | Notes |
| --- | --- |
| .jsonl | Used as-is |
| .parquet | Used as-is |

Upload limits: up to 20 files per upload, with a maximum of 150 MB per file. Files in an unsupported format or over the size limit are rejected with an error.
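The limits and accepted formats above can be checked locally before starting an upload. The following is a minimal sketch, not part of SeekrFlow: the function name, the purpose keys, and the assumption that 1 MB means 1024 × 1024 bytes are all illustrative.

```python
from pathlib import Path

# Limits and formats as stated in this section; all names here are illustrative.
MAX_FILES_PER_UPLOAD = 20
MAX_FILE_BYTES = 150 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes

ALLOWED_EXTENSIONS = {
    "raw_data": {".pdf", ".docx", ".ppt", ".md", ".json"},
    "instruction_tuning_data": {".jsonl", ".parquet"},
    "reinforcement_tuning_data": {".jsonl", ".parquet"},
}


def check_upload(files, purpose):
    """Return a list of problems for a batch of (name, size_in_bytes) pairs."""
    problems = []
    allowed = ALLOWED_EXTENSIONS[purpose]
    if len(files) > MAX_FILES_PER_UPLOAD:
        problems.append(f"too many files: {len(files)} > {MAX_FILES_PER_UPLOAD}")
    for name, size in files:
        suffix = Path(name).suffix.lower()
        if suffix not in allowed:
            problems.append(f"{name}: unsupported format for {purpose}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: exceeds 150 MB")
    return problems
```

An empty list means the batch fits the documented limits; SeekrFlow still runs its own validation during upload.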

Upload files

  1. Navigate to Data Engine > Storage, then select the Files tab.
  2. Click Upload Files.
  3. Select a file purpose:
    • Raw Data – unstructured data used to power vector stores or generate AI-ready fine-tuning data
    • Instruction Tuning Data – data with instructions and responses for domain-specific fine-tuning
    • Reinforcement Tuning Data – data for reinforcement-based fine-tuning to optimize outputs against reference answers
  4. Click Next, then drag and drop your files or browse your device to select them.
  5. SeekrFlow validates the files and shows any issues.
  6. Click Upload Files to confirm.

Your uploads begin immediately and appear in the file list.

File list

Each file in storage displays:

  • File name
  • Source type – how the file entered the system:
    • Uploaded – manually uploaded by the user
    • Converted – created via ingestion from a PDF, DOCX, or PPT
    • Generated – created by a SeekrFlow process, such as an alignment parquet file
  • File type
  • Date added

File details

Click any file to open its details panel, which shows:

  • ID
  • File size
  • File purpose – the purpose selected at upload (Raw Data, Instruction Tuning Data, or Reinforcement Tuning Data)
  • Source type
  • Date added
  • Used by – any deployments, fine-tuning jobs, AI-Ready Data jobs, or vector stores that reference this file
  • Delete – remove the file if it is not actively used elsewhere
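For scripts that track files outside the UI, the details above can be modeled as a small record. This is a hypothetical sketch: the class and field names are assumptions that mirror the panel, not a SeekrFlow API, and "not actively used" is approximated here as "nothing listed under Used by".

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    """The three source types shown in the file list."""
    UPLOADED = "Uploaded"
    CONVERTED = "Converted"
    GENERATED = "Generated"


@dataclass
class FileRecord:
    # Hypothetical field names that mirror the details panel.
    id: str
    name: str
    size_bytes: int
    purpose: str
    source_type: SourceType
    date_added: str
    used_by: list = field(default_factory=list)

    def can_delete(self):
        """Assumption: a file is deletable only when nothing references it."""
        return not self.used_by
```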

System-generated files

SeekrFlow automatically adds files to your storage as part of platform workflows:

  • Converted Markdown files – When a PDF, DOCX, or PPT is used in a workflow requiring ingestion (AI-Ready Data or vector stores), SeekrFlow saves the converted .md output with source type Converted.
  • Parquet datasets – When an AI-Ready Data job completes, the generated parquet file is saved with source type Generated.

Generated and converted files behave like any other file and can be reused in future jobs, vector stores, or fine-tuning workflows.

File formatting requirements

Before uploading, ensure your files are properly formatted.

PDF and DOCX – Use clear headings and ensure text content is structured logically for conversion. Avoid images without surrounding context.

Markdown – Use correct header hierarchy (# H1, ## H2, ### H3, etc.), limit headers to six levels, and ensure all sections have meaningful content. Avoid empty or skipped header levels.
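The Markdown rules above lend themselves to an automated check before upload. The sketch below is one possible interpretation, not SeekrFlow's own validator: it flags skipped levels, headers deeper than six levels, and headers followed by a same-level or higher-level header with no text in between. The function name and messages are illustrative.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
TOO_DEEP_RE = re.compile(r"^#{7,}\s")


def check_markdown_headers(text):
    """Return a list of header problems found in a Markdown string."""
    problems = []
    previous_level = 0
    pending = None  # (level, line_number) of a header with no content yet
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence
            pending = None  # a code block counts as content
            continue
        if in_fence or not stripped:
            continue
        if TOO_DEEP_RE.match(stripped):
            problems.append(f"line {number}: more than six '#' characters")
            continue
        match = HEADER_RE.match(stripped)
        if match is None:
            pending = None  # ordinary text counts as content
            continue
        level = len(match.group(1))
        if level > previous_level + 1:
            problems.append(f"line {number}: H{level} appears without a parent H{level - 1}")
        if pending is not None and level <= pending[0]:
            problems.append(f"line {pending[1]}: section has no content")
        pending = (level, number)
        previous_level = level
    if pending is not None:
        problems.append(f"line {pending[1]}: section has no content")
    return problems
```

A header followed directly by a deeper subsection is treated as non-empty here; tighten that rule if every section should carry its own text.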

Note: Google Docs users can export files as Markdown, PDF, or DOCX.

JSON formatting requirements

JSON files must follow a tree-like structure where each node represents a section of content and may contain nested child sections.

Required fields for each node:

  • label – A short title or question for the section.
  • content – The main body text for the section.
  • children – An array of child objects with the same structure. Use an empty array ([]) if there are no children.
  • level – The depth of the section in the hierarchy: "0" for root, "1" for first-level child, and so on.

Example:

```json
{
  "label": "Seekr Ingestion Rules",
  "content": "This document explains seekr ingestion...",
  "children": [
    {
      "label": "JSON Example",
      "content": "You can upload json files...",
      "children": [],
      "level": "1"
    }
  ],
  "level": "0"
}
```
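
The required fields can be verified with a short recursive walk over the tree before upload. This is a minimal sketch under stated assumptions: the function name is illustrative, and it assumes that level is the node's depth written as a string, as in the example above. It is not SeekrFlow's own validation logic.

```python
import json

REQUIRED_FIELDS = ("label", "content", "children", "level")


def validate_node(node, depth=0, path="root"):
    """Return a list of problems for a node and all of its descendants."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object"]
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in node:
            problems.append(f"{path}: missing '{name}'")
    # Assumption: level is the depth as a string ("0" for the root).
    if "level" in node and node["level"] != str(depth):
        problems.append(f"{path}: level is {node['level']!r}, expected {str(depth)!r}")
    children = node.get("children", [])
    if not isinstance(children, list):
        problems.append(f"{path}: 'children' must be an array")
        return problems
    for index, child in enumerate(children):
        problems.extend(validate_node(child, depth + 1, f"{path}.children[{index}]"))
    return problems


def validate_json_text(text):
    """Parse a JSON string and validate it as a single root node."""
    return validate_node(json.loads(text))
```

Calling validate_json_text on the contents of a file returns an empty list when every node carries the four required fields and a level that matches its depth.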

Next steps