Storage
Upload files and create vector databases for training and retrieval workflows.
Storage manages raw content through file ingestion and vector database creation. Files are uploaded, processed, and organized for downstream use. Vector databases transform these files into searchable knowledge bases through document chunking and embedding generation.
File storage
File storage manages the upload, processing, and organization of raw content. The system ingests documents in multiple formats, extracts text and structure, and maintains files throughout their lifecycle as source material for downstream workflows.
How it works
The file storage system handles the complete ingestion pipeline:
- Upload – Files are uploaded through the UI, API, or SDK
- Processing – The system extracts text content and document structure
- Storage – Processed files are stored with metadata and status tracking
- Access – Files remain available for vector database creation and dataset generation
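As a concrete illustration, a minimal upload over HTTP might look like the sketch below. The base URL, endpoint path, and response fields are assumptions for illustration only, not the documented SeekrFlow API; consult the API reference for the actual paths and parameters.

```python
import requests

# Assumed base URL and auth scheme -- illustrative, not the real API.
BASE_URL = "https://api.example.com/v1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Upload a local markdown file as multipart/form-data.
with open("handbook.md", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/files",
        headers=headers,
        files={"file": ("handbook.md", f, "text/markdown")},
    )
resp.raise_for_status()

file_record = resp.json()
print(file_record["id"], file_record["status"])  # e.g. "file_abc123", "processing"
```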
Supported formats
File storage accepts multiple document formats:
- DOCX
- JSON
- Markdown
- TXT
The system automatically detects file types and applies appropriate processing for content extraction.
File metadata
Each uploaded file maintains metadata throughout its lifecycle:
- File ID – Unique identifier for the file
- Filename – Original file name
- Status – Processing state (uploading, processing, ready, failed)
- Size – File size in bytes
- Upload timestamp – When the file was added to storage
- Content type – Detected or specified file type
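Put together, a file record might look like the following illustrative example. The field names are assumptions modeled on the list above, not the exact API schema.

```python
# Illustrative file record; field names are assumptions, not the exact schema.
file_record = {
    "id": "file_abc123",                    # File ID
    "filename": "handbook.md",              # Original file name
    "status": "ready",                      # uploading | processing | ready | failed
    "size_bytes": 48213,                    # File size in bytes
    "created_at": "2025-01-15T10:32:00Z",   # Upload timestamp
    "content_type": "text/markdown",        # Detected or specified type
}
```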
File organization
Files can be organized and managed through:
- Listing – Retrieve all files or filter by status and metadata
- Retrieval – Access individual files by ID
- Deletion – Remove files no longer needed
- Status monitoring – Track processing progress for uploaded files
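A minimal sketch of these operations over HTTP follows, again with assumed endpoint paths and response shapes rather than the documented API.

```python
import requests

BASE_URL = "https://api.example.com/v1"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Listing: retrieve all files, then filter by status client-side.
# The "data" envelope is an assumed response shape.
files = requests.get(f"{BASE_URL}/files", headers=headers).json()["data"]
ready = [f for f in files if f["status"] == "ready"]

# Retrieval: access an individual file by ID.
record = requests.get(f"{BASE_URL}/files/{ready[0]['id']}", headers=headers).json()

# Deletion: remove a file that is no longer needed.
requests.delete(f"{BASE_URL}/files/{record['id']}", headers=headers)
```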
Vector databases
Vector databases (also called vector stores) store document embeddings for semantic search and retrieval. They transform raw files into searchable knowledge bases by chunking documents and generating embeddings for each segment, allowing agents to find relevant information based on meaning rather than keyword matching.
How it works
The vector database creation process:
- File selection – Choose which files to include in the vector database.
- Chunking – Documents are split into segments based on configurable parameters.
- Embedding generation – Each chunk is converted into a vector representation using specialized embedding models.
- Storage – Embeddings and their associated text are stored for retrieval.
- Search – Queries are embedded and compared against stored vectors to find semantically similar content.
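A minimal sketch of the creation step is shown below; the endpoint, payload shape, and chunking parameter names are illustrative assumptions, not the documented request format.

```python
import requests

BASE_URL = "https://api.example.com/v1"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Create a vector store from previously uploaded files; the payload
# shape and chunking parameter names are illustrative assumptions.
resp = requests.post(
    f"{BASE_URL}/vector_stores",
    headers=headers,
    json={
        "name": "product-docs",
        "file_ids": ["file_abc123", "file_def456"],
        "chunking": {"method": "sliding_window", "chunk_size": 512, "overlap": 50},
    },
)
store = resp.json()
print(store["id"], store["status"])   # chunking and embedding run asynchronously
```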
Chunking strategy
Document chunking determines how files are segmented before embedding. Proper chunking ensures that retrieved segments contain complete, coherent information relevant to queries. SeekrFlow supports multiple chunking methods to optimize retrieval for different document types and use cases:
Sliding window chunking
Creates fixed-size segments with configurable token counts and overlap between adjacent chunks. This method provides consistent chunk sizes and ensures context continuity across boundaries.
- Chunk size – Number of tokens per segment (default: 512)
- Overlap – Tokens shared between adjacent chunks (default: 50)
Best for: General-purpose retrieval when documents lack clear structural boundaries.
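The windowing mechanics can be sketched in a few lines of Python. This is a generic illustration using whitespace-split "tokens" rather than SeekrFlow's actual tokenizer, with the default parameters listed above.

```python
def sliding_window_chunks(tokens, chunk_size=512, overlap=50):
    """Split a token sequence into fixed-size chunks with shared overlap."""
    assert 0 <= overlap < chunk_size
    step = chunk_size - overlap          # how far the window advances each time
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

# Whitespace splitting stands in for a real tokenizer in this sketch.
tokens = ("word " * 1200).split()
chunks = sliding_window_chunks(tokens)
print([len(c) for c in chunks])          # [512, 512, 276]; adjacent chunks share 50 tokens
```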
Markdown chunking
Splits documents at markdown structural elements (headers, sections, lists). This method preserves logical document organization and keeps related content together.
Best for: Technical documentation, structured reports, and content with clear hierarchical organization.
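A simplified header-based splitter illustrates the idea; the actual implementation also accounts for sections and lists, which this sketch omits.

```python
import re

def markdown_chunks(text):
    """Split a markdown document before each header so sections stay intact."""
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    return [s.strip() for s in sections if s.strip()]

doc = "# Intro\nWelcome.\n\n## Setup\nInstall the SDK.\n\n## Usage\nCall the API."
for chunk in markdown_chunks(doc):
    print(chunk.splitlines()[0])   # '# Intro', '## Setup', '## Usage'
```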
Semantic chunking
Analyzes content meaning to identify natural topic boundaries and groups semantically related information into chunks. This method creates variable-size segments based on conceptual coherence rather than fixed token counts.
Best for: Long-form content, narrative documents, and materials where maintaining topical coherence is critical for retrieval quality.
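One common way to implement this, sketched below with stand-in embedding vectors (not necessarily SeekrFlow's exact algorithm), is to start a new chunk wherever the similarity between adjacent sentence embeddings drops below a threshold, signaling a topic shift.

```python
import numpy as np

def semantic_chunks(sentences, vectors, threshold=0.6):
    """Group sentences into variable-size chunks, breaking where the
    embedding similarity between adjacent sentences drops (a topic shift)."""
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        a, b = vectors[i - 1], vectors[i]
        sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim < threshold:              # natural topic boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

# Toy demo: two sentences about setup, then a topic shift to billing.
# In practice the vectors come from a sentence-embedding model.
sents = ["Install the SDK.", "Configure your API key.", "Invoices are sent monthly."]
vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]])   # stand-in embeddings
print(semantic_chunks(sents, vecs))
# ['Install the SDK. Configure your API key.', 'Invoices are sent monthly.']
```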
Embedding models
Vector databases use specialized embedding models to convert text into vector representations. These models are trained to encode semantic meaning, allowing similarity comparisons between queries and document chunks.
The embedding dimension and model choice affect:
- Retrieval accuracy – How well the system identifies relevant content
- Storage requirements – Vector database size based on embedding dimensions
- Search speed – Query performance relative to database size and embedding complexity
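The storage effect is straightforward to estimate: at float32 precision, each vector occupies 4 bytes per dimension. Using assumed figures of one million chunks and 768-dimensional embeddings:

```python
num_chunks = 1_000_000   # assumed corpus size
dimensions = 768         # assumed embedding dimension
bytes_per_value = 4      # float32

total_gb = num_chunks * dimensions * bytes_per_value / 1e9
print(f"{total_gb:.1f} GB of raw vector storage")   # ~3.1 GB, before indexes and metadata
```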
Semantic search
Vector databases power semantic search capabilities:
- Meaning-based retrieval – Find content based on conceptual similarity rather than exact keyword matches
- Ranked results – Return chunks ordered by relevance to the query
- Context preservation – Retrieve coherent segments that maintain document structure
- Multi-document search – Query across all files attached to the vector store
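At its core, ranking is a similarity comparison between the query embedding and every stored vector. The brute-force sketch below shows the idea; production vector stores typically use approximate nearest-neighbor indexes rather than a full scan.

```python
import numpy as np

def top_k(query_vec, stored_vecs, chunks, k=3):
    """Rank stored chunks by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    m = stored_vecs / np.linalg.norm(stored_vecs, axis=1, keepdims=True)
    scores = m @ q                          # one cosine score per chunk
    best = np.argsort(scores)[::-1][:k]     # highest similarity first
    return [(chunks[i], float(scores[i])) for i in best]
```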
Vector store management
Vector databases can be managed through:
- Creation – Build new vector stores from selected files
- File attachment – Add additional files to existing vector stores
- Status monitoring – Track processing progress for vector database creation
- Integration – Connect vector stores to agent FileSearch tools
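For example, attaching a file to an existing store and polling for completion might look like the following sketch, using the same assumed endpoints as the earlier examples.

```python
import requests, time

BASE_URL = "https://api.example.com/v1"   # assumed base URL
headers = {"Authorization": "Bearer YOUR_API_KEY"}
store_id = "vs_xyz789"                    # an existing vector store (illustrative ID)

# File attachment: add another uploaded file to the store.
requests.post(
    f"{BASE_URL}/vector_stores/{store_id}/files",
    headers=headers,
    json={"file_id": "file_ghi789"},
)

# Status monitoring: poll until embedding generation completes.
status = "processing"
while status == "processing":
    time.sleep(5)
    status = requests.get(
        f"{BASE_URL}/vector_stores/{store_id}", headers=headers
    ).json()["status"]
print("vector store is", status)          # e.g. "ready"
```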
Use cases
Vector databases support several key workflows:
Retrieval-augmented generation (RAG)
Agents use vector databases to find relevant context before generating responses. The FileSearch tool queries vector stores to retrieve document segments that inform the agent's output.
Knowledge base search
Internal documentation, policies, and reference materials become searchable by meaning. Users can ask questions in natural language and receive relevant information from document collections.
Semantic discovery
Find related content across large document sets without knowing exact keywords. Vector databases surface conceptually similar information that traditional search might miss.
Integration with other components
Storage integrates across SeekrFlow:
- Agents – FileSearch tool queries vector databases for knowledge retrieval
- Fine-tuning – Files serve as source content for training dataset generation
- Context-grounded fine-tuning – Vector databases provide retrieval infrastructure for training models to use external knowledge
- Evaluations – Test agent performance on knowledge retrieval tasks using vector stores