# File Storage

> Upload, organize, and manage the source documents and system-generated files used across SeekrFlow.

File Storage is where you upload and manage all source documents used throughout the platform, including files for training data, vector stores, and agents.

## Supported file types

Accepted formats depend on the file purpose selected at upload.

**Raw Data**

| File type | Notes                                                                                                                      |
| --------- | -------------------------------------------------------------------------------------------------------------------------- |
| `.pdf`    | Converted to Markdown when used in AI-Ready Data or vector store workflows                                                 |
| `.docx`   | Converted to Markdown when used in AI-Ready Data or vector store workflows                                                 |
| `.ppt`    | Converted to Markdown when used in AI-Ready Data or vector store workflows                                                 |
| `.md`     | Used as-is                                                                                                                 |
| `.json`   | Must follow SeekrFlow's tree-like JSON structure – see [JSON formatting requirements](#json-formatting-requirements) below |

**Fine-tuning Data (Instruction Tuning Data and Reinforcement Tuning Data)**

| File type  | Notes      |
| ---------- | ---------- |
| `.jsonl`   | Used as-is |
| `.parquet` | Used as-is |

You can upload up to 20 files at a time, and each file can be at most 150 MB. SeekrFlow rejects files that are in an unsupported format or exceed the size limit, and returns an error.
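To catch rejections before you open the upload dialog, you can check a batch locally. The following is a minimal sketch, not SeekrFlow tooling: the `check_batch` helper and the purpose keys `raw_data` and `fine_tuning` are hypothetical names, and it assumes that "150 MB" means 150 × 1024 × 1024 bytes and that extensions are matched case-insensitively.

```python
from pathlib import Path

# Limits and formats as listed on this page.
MAX_FILES_PER_UPLOAD = 20
MAX_FILE_BYTES = 150 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes

# Hypothetical purpose keys, used only for this sketch.
ALLOWED_EXTENSIONS = {
    "raw_data": {".pdf", ".docx", ".ppt", ".md", ".json"},
    "fine_tuning": {".jsonl", ".parquet"},
}


def check_batch(files, purpose):
    """Return a list of problems for a batch of (name, size_in_bytes) pairs."""
    problems = []
    allowed = ALLOWED_EXTENSIONS[purpose]
    if len(files) > MAX_FILES_PER_UPLOAD:
        problems.append(
            f"{len(files)} files selected; the limit is {MAX_FILES_PER_UPLOAD} per upload"
        )
    for name, size in files:
        # Assumption: extension matching ignores case.
        suffix = Path(name).suffix.lower()
        if suffix not in allowed:
            problems.append(f"{name}: {suffix or 'no extension'} is not accepted for {purpose}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 150 MB limit")
    return problems


batch = [
    ("handbook.pdf", 2_000_000),
    ("notes.txt", 1_000),
    ("scan.pdf", 200 * 1024 * 1024),
]
for problem in check_batch(batch, "raw_data"):
    print(problem)
```

In this batch, `notes.txt` is flagged for its format and `scan.pdf` for its size. SeekrFlow's own validation at upload remains the authoritative check.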

## Upload files

<Steps>
  <Step>
    Navigate to **Data Engine > Storage**, then select the **Files** tab.
  </Step>

  <Step>
    Click **Upload Files**.
  </Step>

  <Step>
    Select a file purpose:

    * **Raw Data** – unstructured data used to power vector stores or generate AI-ready fine-tuning data
    * **Instruction Tuning Data** – data with instructions and responses for domain-specific fine-tuning
    * **Reinforcement Tuning Data** – data for reinforcement-based fine-tuning to optimize outputs against reference answers
  </Step>

  <Step>
    Click **Next**, then drag and drop your files or browse your device to select them.
  </Step>

  <Step>
    Review the validation results. SeekrFlow checks the files and shows any issues.
  </Step>

  <Step>
    Click **Upload Files** to confirm.
  </Step>
</Steps>

Your uploads begin immediately and appear in the file list.

## File list

Each file in storage displays:

* **File name**

* **Source type** – how the file entered the system:

  * `Uploaded` – manually uploaded by the user
  * `Converted` – created via ingestion from a PDF, DOCX, or PPT
  * `Generated` – created by a SeekrFlow process, such as an alignment Parquet file

* **File type**

* **Date added**

## File details

Click any file to open its details panel, which shows:

* **ID**
* **File size**
* **File purpose** – the purpose selected at upload (Raw Data, Instruction Tuning Data, or Reinforcement Tuning Data)
* **Source type**
* **Date added**
* **Used by** – any deployments, fine-tuning jobs, AI-Ready Data jobs, or vector stores that reference this file
* **Delete** – remove the file if it is not actively used elsewhere

## System-generated files

SeekrFlow automatically adds files to your storage as part of platform workflows:

* **Converted Markdown files** – When a PDF, DOCX, or PPT is used in a workflow requiring ingestion (AI-Ready Data or vector stores), SeekrFlow saves the converted `.md` output with source type `Converted`.
* **Parquet datasets** – When an AI-Ready Data job completes, the generated Parquet file is saved with source type `Generated`.

Generated and converted files behave like any other file and can be reused in future jobs, vector stores, or fine-tuning workflows.

## File formatting requirements

Before uploading, ensure your files are properly formatted.

**PDF and DOCX** – Use clear headings and ensure text content is structured logically for conversion. Avoid images without surrounding context.

**Markdown** – Use a correct header hierarchy (`# H1`, `## H2`, `### H3`, etc.), limit headers to six levels, and ensure every section has meaningful content. Avoid empty sections and skipped header levels.
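The Markdown guidance above can be checked locally before upload. The sketch below is illustrative only and does not reproduce SeekrFlow's validation: `check_markdown_headers` is a hypothetical helper that flags headers deeper than six levels, jumps of more than one level, and headers followed by no content before the next header of the same or a shallower level. It assumes a header whose only content is deeper subsections is acceptable.

```python
import re

HEADER = re.compile(r"^(#+)\s+\S")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def check_markdown_headers(text):
    """Return a list of header problems found in a Markdown string."""
    problems = []
    previous_level = None
    pending = None  # (line_number, level) of a header with no content yet
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            pending = None  # a code block counts as section content
            continue
        match = None if in_fence else HEADER.match(line)
        if match is None:
            if stripped:
                pending = None  # any non-blank line counts as content
            continue
        level = len(match.group(1))
        if level > 6:
            problems.append(f"line {number}: header deeper than six levels")
        if previous_level is not None and level > previous_level + 1:
            problems.append(f"line {number}: skips from H{previous_level} to H{level}")
        if pending is not None and level <= pending[1]:
            problems.append(f"line {pending[0]}: section has no content")
        pending = (number, level)
        previous_level = level
    if pending is not None:
        problems.append(f"line {pending[0]}: section has no content")
    return problems


sample = (
    "# Guide\n\n## Setup\n\nInstall the tool.\n\n"
    "#### Details\n\nMore text.\n\n## Empty\n\n## Usage\n\nRun it.\n"
)
for problem in check_markdown_headers(sample):
    print(problem)
```

For the sample text, the check reports the jump from H2 to H4 on line 7 and the empty section on line 11.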

<Info>
  Google Docs users can export files as Markdown, PDF, or DOCX.
</Info>

### JSON formatting requirements

JSON files must follow a tree-like structure where each node represents a section of content and may contain nested child sections.

Required fields for each node:

* **`label`** – A short title or question for the section.
* **`content`** – The main body text for the section.
* **`children`** – An array of child objects with the same structure. Use an empty array (`[]`) if there are no children.
* **`level`** – The depth of the section in the hierarchy: `"0"` for root, `"1"` for first-level child, and so on.

<CodeGroup>
  ```json JSON theme={null}
  {
    "label": "Seekr Ingestion Rules",
    "content": "This document explains Seekr ingestion...",
    "children": [
      {
        "label": "JSON Example",
        "content": "You can upload JSON files...",
        "children": [],
        "level": "1"
      }
    ],
    "level": "0"
  }
  ```
</CodeGroup>
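A short script can confirm that a file follows this structure before you upload it. This is a minimal sketch based only on the field list above, not SeekrFlow's validator: `validate_node` is a hypothetical helper, and it assumes that `level` is always a string matching the node's depth and that the file's root is a single object.

```python
import json

REQUIRED_FIELDS = ("label", "content", "children", "level")


def validate_node(node, depth=0, path="root"):
    """Return a list of problems found in a node and all of its descendants."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object"]
    problems = [
        f"{path}: missing '{field}'" for field in REQUIRED_FIELDS if field not in node
    ]
    # Assumption: levels are strings ("0", "1", ...) as in the example on this page.
    if "level" in node and node["level"] != str(depth):
        problems.append(f"{path}: level is {node['level']!r}, expected {str(depth)!r}")
    children = node.get("children", [])
    if not isinstance(children, list):
        problems.append(f"{path}: 'children' must be an array")
        return problems
    for index, child in enumerate(children):
        problems.extend(validate_node(child, depth + 1, f"{path}.children[{index}]"))
    return problems


document = json.loads("""
{
  "label": "Seekr Ingestion Rules",
  "content": "This document explains Seekr ingestion...",
  "children": [
    {
      "label": "JSON Example",
      "content": "You can upload JSON files...",
      "children": [],
      "level": "1"
    }
  ],
  "level": "0"
}
""")
print(validate_node(document))
```

An empty list means that every node has the four required fields and that each `level` matches the node's position in the tree.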

## Next steps

* [Vector Stores](/flow/app/vector-stores) – Create semantic indexes from your uploaded files
* [AI-Ready Data](/flow/app/ai-ready-data) – Generate training datasets from your files
