Core capabilities
The data engine provides two primary functions:
Storage
Storage manages raw content through file ingestion and vector database creation. Files are uploaded, processed, and organized for downstream use. Ingestion insights, available through the API and SDK, provide real-time visibility into processing status and diagnostics. Vector databases turn these files into searchable knowledge bases by chunking documents and generating embeddings.
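Document chunking is the first step in building a searchable knowledge base. The sketch below splits text into fixed-size character windows that overlap, so that a sentence falling on a boundary still appears whole in at least one chunk. It is a generic illustration, not the SeekrFlow SDK: the function name `chunk_text` and the `chunk_size` and `overlap` defaults are hypothetical, and the chunking strategy the data engine actually uses may differ (for example, token-based or structure-aware splitting).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character windows.

    Hypothetical helper: chunk_size and overlap are illustrative defaults,
    not SeekrFlow settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid redundant tail chunks.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk would then be passed to an embedding model and stored alongside its vector; larger overlaps improve boundary coverage at the cost of storing more chunks.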
AI-ready data
AI-ready data jobs generate and transform datasets, converting raw content into training-ready formats. These jobs produce structured outputs optimized for model fine-tuning and alignment, including standard instruction datasets and context-grounded datasets.
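To make the two dataset types concrete, the sketch below builds a standard instruction record (a prompt paired with a response) and a context-grounded record (the same pair plus the source passage the answer should be supported by), then serializes them as JSON Lines, a common layout for fine-tuning data. The field names `instruction`, `response`, and `context` and the helper functions are assumptions for illustration only; they are not the schema SeekrFlow's AI-ready data jobs emit.

```python
import json


def instruction_record(instruction: str, response: str) -> dict:
    # Standard instruction-tuning pair (hypothetical field names).
    return {"instruction": instruction, "response": response}


def grounded_record(context: str, question: str, answer: str) -> dict:
    # Context-grounded example: the answer is meant to be supported by `context`.
    return {"context": context, "instruction": question, "response": answer}


def to_jsonl(records: list[dict]) -> str:
    # One JSON object per line.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


records = [
    instruction_record("Summarize the refund policy.", "Refunds are issued within 30 days."),
    grounded_record(
        context="Refunds are issued within 30 days of purchase.",
        question="How long do customers have to request a refund?",
        answer="30 days from purchase.",
    ),
]
print(to_jsonl(records))
```

Keeping the grounding passage inside each record lets a training or evaluation step check whether the model's answer stays faithful to the supplied context.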
Data workflow
The typical data engine workflow:
Integration points
Data engine outputs integrate across SeekrFlow:
- Fine-tuning – AI-ready datasets feed model training pipelines
- Agents – Vector stores power the FileSearch tool for knowledge retrieval
- Evaluations – Structured datasets support model testing and validation
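The retrieval step behind a vector-store-backed tool such as FileSearch can be sketched as: embed the query, score it against every stored chunk, and return the top-k matches. The toy store below uses lowercase term counts and cosine similarity as a stand-in for a real embedding model and vector index; `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, and this is not how SeekrFlow implements FileSearch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory store of (chunk, vector) pairs with brute-force top-k search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(chunk, cosine(q, vec)) for chunk, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
for chunk in [
    "Refunds are issued within 30 days.",
    "Shipping takes 5 business days.",
    "Contact support by email.",
]:
    store.add(chunk)

top = store.search("how many days for refunds", k=1)
print(top[0][0])  # Refunds are issued within 30 days.
```

A production vector store replaces the brute-force scan with an approximate nearest-neighbor index and the term counts with dense model embeddings, but the query, score, and rank flow is the same.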