Supported file types
Accepted formats depend on the file purpose selected at upload.

Raw Data

| File type | Notes |
|---|---|
| .pdf | Converted to Markdown when used in AI-Ready Data or vector store workflows |
| .docx | Converted to Markdown when used in AI-Ready Data or vector store workflows |
| .ppt | Converted to Markdown when used in AI-Ready Data or vector store workflows |
| .md | Used as-is |
| .json | Must follow SeekrFlow’s tree-like JSON structure – see JSON formatting requirements below |

| File type | Notes |
|---|---|
| .jsonl | Used as-is |
| .parquet | Used as-is |
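
The file types listed above can be checked locally before upload. The following is a minimal sketch, not part of any SeekrFlow SDK: the `FORMAT_NOTES` dictionary and the `describe_upload` helper are hypothetical names, and the notes are copied from the tables in this section.

```python
from pathlib import Path

# Handling notes copied from the tables in this section.
# FORMAT_NOTES is a local, hypothetical helper, not a SeekrFlow API.
FORMAT_NOTES = {
    ".pdf": "converted to Markdown in AI-Ready Data or vector store workflows",
    ".docx": "converted to Markdown in AI-Ready Data or vector store workflows",
    ".ppt": "converted to Markdown in AI-Ready Data or vector store workflows",
    ".md": "used as-is",
    ".json": "must follow the tree-like JSON structure",
    ".jsonl": "used as-is",
    ".parquet": "used as-is",
}


def describe_upload(filename: str) -> str:
    """Return the handling note for a file, or raise ValueError if unsupported."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on the extension
    if ext not in FORMAT_NOTES:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    return FORMAT_NOTES[ext]


for name in ["report.PDF", "notes.md", "train.jsonl"]:
    print(name, "->", describe_upload(name))
```

A file whose extension is not in the tables raises `ValueError`, so a batch of files can be filtered before any upload is attempted.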
Upload files
Select a file purpose:
- Raw Data – unstructured data used to power vector stores or generate AI-ready fine-tuning data
- Instruction Tuning Data – data with instructions and responses for domain-specific fine-tuning
- Reinforcement Tuning Data – data for reinforcement-based fine-tuning to optimize outputs against reference answers
File list
Each file in storage displays:
- File name
- Source type – how the file entered the system:
  - Uploaded – manually uploaded by the user
  - Converted – created via ingestion from a PDF, DOCX, or PPT
  - Generated – created by a SeekrFlow process, such as an alignment parquet file
- File type
- Date added
File details
Click any file to open its details panel, which shows:
- ID
- File size
- File purpose – the purpose selected at upload (Raw Data, Instruction Tuning Data, or Reinforcement Tuning Data)
- Source type
- Date added
- Used by – any deployments, fine-tuning jobs, AI-Ready Data jobs, or vector stores that reference this file
- Delete – remove the file if it is not actively used elsewhere
System-generated files
SeekrFlow automatically adds files to your storage as part of platform workflows:
- Converted markdown files – When a PDF, DOCX, or PPT is used in a workflow requiring ingestion (AI-Ready Data or vector stores), SeekrFlow saves the converted .md output with source type Converted.
- Parquet datasets – When an AI-Ready Data job completes, the generated parquet file is saved with source type Generated.
File formatting requirements
Before uploading, ensure your files are properly formatted.
- PDF and DOCX – Use clear headings and ensure text content is structured logically for conversion. Avoid images without surrounding context.
- Markdown – Use correct header hierarchy (# H1, ## H2, ### H3, etc.), limit headers to six levels, and ensure all sections have meaningful content. Avoid empty or skipped header levels.
Google Docs users can export files as Markdown, PDF, or DOCX.
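
The Markdown rules above (no skipped levels, at most six levels, no empty sections) can be checked with a short script before upload. This is a rough sketch under stated assumptions, not a SeekrFlow tool: `check_markdown_headers` is a hypothetical helper, only `#`-style headers are recognized, and a header followed directly by another header is treated as an empty section, which may be stricter than the platform requires.

```python
import re

# Matches "#"-style headers such as "## Title"; other header styles are ignored.
HEADER_RE = re.compile(r"^(#+)\s+(.*\S)\s*$")


def check_markdown_headers(text: str) -> list[str]:
    """Return a list of header problems found in a Markdown document."""
    problems = []
    prev_level = 0
    open_header = None  # (line number, title) of a header with no body text yet
    in_fence = False

    def close_section():
        if open_header is not None:
            problems.append(
                f"line {open_header[0]}: section '{open_header[1]}' has no content"
            )

    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence
            open_header = None  # a code block counts as section content
            continue
        if in_fence:
            continue  # "#" lines inside code fences are not headers
        match = HEADER_RE.match(line)
        if match is None:
            if stripped:
                open_header = None  # body text fills the current section
            continue
        close_section()  # assumption: header directly after header means empty
        level = len(match.group(1))
        if level > 6:
            problems.append(f"line {line_no}: header deeper than six levels")
        elif level > prev_level + 1:
            problems.append(f"line {line_no}: skipped from H{prev_level} to H{level}")
        prev_level = min(level, 6)
        open_header = (line_no, match.group(2))

    close_section()
    return problems


sample = "# Guide\nIntro text.\n### Deep dive\nDetails.\n## Empty\n## Next\nBody.\n"
for problem in check_markdown_headers(sample):
    print(problem)
```

If your documents use title-only parent headers (an H1 followed immediately by an H2), relax the empty-section check rather than padding the file with filler text.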
JSON formatting requirements
JSON files must follow a tree-like structure where each node represents a section of content and may contain nested child sections.
Required fields for each node:
- label – A short title or question for the section.
- content – The main body text for the section.
- children – An array of child objects with the same structure. Use an empty array ([]) if there are no children.
- level – The depth of the section in the hierarchy: "0" for root, "1" for first-level child, and so on.
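
As an illustration of the required fields, the sketch below builds a small two-level tree and checks it before writing it out as JSON. It makes several assumptions that the text above does not confirm: the file holds a single root object, `level` is stored as a string (because the values are quoted above), and each child's level is its parent's level plus one. `validate_node` is a hypothetical helper and the sample content is invented for the example.

```python
import json

REQUIRED_FIELDS = ("label", "content", "children", "level")


def validate_node(node, expected_level=0, path="root"):
    """Recursively collect problems in a node and all of its children."""
    if not isinstance(node, dict):
        return [f"{path}: node must be an object"]
    errors = [f"{path}: missing '{f}'" for f in REQUIRED_FIELDS if f not in node]
    # Assumption: level equals the node's depth, compared as a string.
    if "level" in node and str(node["level"]) != str(expected_level):
        errors.append(
            f"{path}: level is {node['level']!r}, expected '{expected_level}'"
        )
    children = node.get("children", [])
    if not isinstance(children, list):
        errors.append(f"{path}: 'children' must be an array")
        return errors
    for i, child in enumerate(children):
        errors.extend(
            validate_node(child, expected_level + 1, f"{path}.children[{i}]")
        )
    return errors


# Invented sample content; only the field names come from the requirements.
doc = {
    "label": "Employee handbook",
    "content": "Policies that apply to all staff.",
    "level": "0",
    "children": [
        {
            "label": "How is time off accrued?",
            "content": "Staff accrue leave monthly.",
            "level": "1",
            "children": [],  # leaf nodes still carry an empty array
        }
    ],
}

problems = validate_node(doc)
print(problems or json.dumps(doc, indent=2))
```

Running the validator on every file before upload catches missing `children` arrays on leaf nodes, which are easy to omit when a tree is written by hand.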
Next steps
- Vector Stores – Create semantic indexes from your uploaded files
- AI-Ready Data – Generate training datasets from your files