File Storage

File Storage is where you upload and manage all source documents used throughout the platform, including files for training data, vector stores, and agents.

Supported file types

Accepted formats depend on the file purpose selected at upload.

Raw Data

File type	Notes
`.pdf`	Converted to Markdown when used in AI-Ready Data or vector store workflows
`.docx`	Converted to Markdown when used in AI-Ready Data or vector store workflows
`.ppt`	Converted to Markdown when used in AI-Ready Data or vector store workflows
`.md`	Used as-is
`.json`	Must follow SeekrFlow’s tree-like JSON structure – see JSON formatting requirements below

Fine-tuning Data (Instruction Tuning Data and Reinforcement Tuning Data)

File type	Notes
`.jsonl`	Used as-is
`.parquet`	Used as-is

Upload limits: 20 files at a time, maximum 150 MB per file. Files in unsupported formats or exceeding size limits are rejected with an error.
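The limits and format tables above can be checked locally before you start an upload. The following is a minimal sketch, not part of SeekrFlow: the function name `check_batch` is hypothetical, the extension sets are copied from the tables above, and two details are assumptions (1 MB is treated as 1024 × 1024 bytes, and extensions are compared case-insensitively).

```python
from pathlib import Path

# Accepted extensions per file purpose, copied from the tables above.
ALLOWED_EXTENSIONS = {
    "Raw Data": {".pdf", ".docx", ".ppt", ".md", ".json"},
    "Instruction Tuning Data": {".jsonl", ".parquet"},
    "Reinforcement Tuning Data": {".jsonl", ".parquet"},
}
MAX_FILES_PER_UPLOAD = 20
MAX_FILE_BYTES = 150 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_batch(files, purpose):
    """Check (file_name, size_in_bytes) pairs; return a list of problems.

    An empty list means the batch fits the documented limits.
    """
    allowed = ALLOWED_EXTENSIONS[purpose]
    problems = []
    if len(files) > MAX_FILES_PER_UPLOAD:
        problems.append(
            f"{len(files)} files selected; limit is {MAX_FILES_PER_UPLOAD}"
        )
    for name, size in files:
        # Assumption: extension matching ignores case.
        ext = Path(name).suffix.lower()
        if ext not in allowed:
            problems.append(
                f"{name}: {ext or 'no extension'} is not accepted for {purpose}"
            )
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 150 MB")
    return problems
```

To check real files, pass `(path.name, path.stat().st_size)` for each `Path` you plan to upload. SeekrFlow still runs its own validation at upload time, so treat this only as an early warning.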

Upload files

Navigate to Data Engine > Storage, then select the Files tab.

Click Upload Files.

Select a file purpose:

Raw Data – unstructured data used to power vector stores or generate AI-ready fine-tuning data
Instruction Tuning Data – data with instructions and responses for domain-specific fine-tuning
Reinforcement Tuning Data – data for reinforcement-based fine-tuning to optimize outputs against reference answers

Click Next, then drag and drop your files or browse your device to select them.

SeekrFlow validates the files and shows any issues.

Click Upload Files to confirm.

Your uploads begin immediately and appear in the file list.

File list

Each file in storage displays:

File name
Source type – how the file entered the system:
- Uploaded – manually uploaded by the user
- Converted – created via ingestion from a PDF, DOCX, or PPT
- Generated – created by a SeekrFlow process, such as an alignment parquet file
File type
Date added

File details

Click any file to open its details panel, which shows:

ID
File size
File purpose – the purpose selected at upload (Raw Data, Instruction Tuning Data, or Reinforcement Tuning Data)
Source type
Date added
Used by – any deployments, fine-tuning jobs, AI-Ready Data jobs, or vector stores that reference this file
Delete – remove the file if it is not actively used elsewhere

System-generated files

SeekrFlow automatically adds files to your storage as part of platform workflows:

Converted Markdown files – When a PDF, DOCX, or PPT is used in a workflow requiring ingestion (AI-Ready Data or vector stores), SeekrFlow saves the converted .md output with source type Converted.
Parquet datasets – When an AI-Ready Data job completes, the generated Parquet file is saved with source type Generated.

Generated and converted files behave like any other file and can be reused in future jobs, vector stores, or fine-tuning workflows.

File formatting requirements

Before uploading, ensure your files are properly formatted.

PDF and DOCX – Use clear headings and ensure text content is structured logically for conversion. Avoid images without surrounding context.
Markdown – Use correct header hierarchy (# H1, ## H2, ### H3, etc.), limit headers to six levels, and ensure all sections have meaningful content. Avoid empty or skipped header levels.

Google Docs users can export files as Markdown, PDF, or DOCX.
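The Markdown header rules in this section can be checked with a short script before upload. This is a rough sketch under stated assumptions, not SeekrFlow's own validator: `check_headers` is a hypothetical name, only `#`-style headers are recognized, a document that starts below H1 is treated as a skipped level, and lines beginning with `#` inside code blocks are not excluded. It does not check whether sections have meaningful content.

```python
import re

# A '#'-style header: one or more '#' characters, whitespace, then a title.
HEADER = re.compile(r"^(#+)\s+(\S.*)$")


def check_headers(markdown_text):
    """Return a list of header problems: skipped levels and depth beyond six."""
    problems = []
    previous_level = 0  # 0 means no header has been seen yet
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = HEADER.match(line)
        if match is None:
            continue
        level = len(match.group(1))
        if level > 6:
            problems.append(f"line {number}: header deeper than six levels")
        elif level > previous_level + 1:
            problems.append(
                f"line {number}: jumps from level {previous_level} to level {level}"
            )
        previous_level = min(level, 6)
    return problems
```

For example, a file whose first header is `# Title` and whose next header is `### Details` produces one problem, because level 2 is skipped.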

JSON formatting requirements

JSON files must follow a tree-like structure where each node represents a section of content and may contain nested child sections. Required fields for each node:

label – A short title or question for the section.
content – The main body text for the section.
children – An array of child objects with the same structure. Use an empty array ([]) if there are no children.
level – The depth of the section in the hierarchy: "0" for root, "1" for first-level child, and so on.

{
  "label": "Seekr Ingestion Rules",
  "content": "This document explains seekr ingestion...",
  "children": [
    {
      "label": "JSON Example",
      "content": "You can upload json files...",
      "children": [],
      "level": "1"
    }
  ],
  "level": "0"
}
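The required fields above can be verified before upload with a small recursive check. The sketch below is illustrative only: `validate_node` is a hypothetical helper, and it assumes that `level` is the node's depth written as a string, as in the example above.

```python
import json

REQUIRED_FIELDS = ("label", "content", "children", "level")


def validate_node(node, depth=0, path="root"):
    """Return a list of problems found in this node and its descendants."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object"]
    problems = [f"{path}: missing '{f}'" for f in REQUIRED_FIELDS if f not in node]
    # Assumption: level is the depth written as a string ("0", "1", ...).
    if "level" in node and node["level"] != str(depth):
        problems.append(
            f"{path}: level is {node['level']!r}, expected {str(depth)!r}"
        )
    children = node.get("children", [])
    if not isinstance(children, list):
        return problems + [f"{path}: 'children' must be an array"]
    for i, child in enumerate(children):
        problems += validate_node(child, depth + 1, f"{path}.children[{i}]")
    return problems


# The example document from this section.
example = json.loads("""
{
  "label": "Seekr Ingestion Rules",
  "content": "This document explains seekr ingestion...",
  "children": [
    {
      "label": "JSON Example",
      "content": "You can upload json files...",
      "children": [],
      "level": "1"
    }
  ],
  "level": "0"
}
""")

print(validate_node(example))  # prints []
```

Each problem string includes the path to the offending node (for example `root.children[0]`), which makes deeply nested files easier to fix.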

Next steps

Vector Stores – Create semantic indexes from your uploaded files
AI-Ready Data – Generate training datasets from your files
