Data engine workflows
Prepare and ingest files
Upload source documents (PDF, DOCX, PPT, Markdown) and convert them to Markdown via the ingestion API.
Monitor ingestion
Track ingestion progress through data job status, per-file records, and timeline events.
Create and populate a vector database
Set up a vector database and run document ingestion to generate embeddings for semantic search and retrieval.
Create instruction fine-tuning data
Use a principle_files data job to generate a dataset of question-answer (QA) pairs for instruction fine-tuning.
Create context-grounded fine-tuning data
Use context_grounded_files or context_grounded_vector_db data jobs to generate training data grounded in an existing knowledge source.
Manage data jobs
List and filter data jobs, update their metadata, and cancel them.
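
As a rough illustration of filtering data jobs, the sketch below works on a local list of job records. It does not call the real API. The three job type names come from the workflows above; the DataJob record, its field names, the status values, and the filter_jobs helper are all hypothetical and only show the idea.

```python
from dataclasses import dataclass
from typing import List, Optional

# Job type names are taken from the workflows above.
# Everything else in this sketch (class, fields, statuses) is hypothetical.
JOB_TYPES = {
    "principle_files",
    "context_grounded_files",
    "context_grounded_vector_db",
}


@dataclass
class DataJob:
    job_id: str    # hypothetical identifier field
    job_type: str  # one of JOB_TYPES
    status: str    # assumed values: "running", "completed", "cancelled"


def filter_jobs(
    jobs: List[DataJob],
    job_type: Optional[str] = None,
    status: Optional[str] = None,
) -> List[DataJob]:
    """Return the jobs matching the given type and/or status.

    A filter left as None is not applied.
    """
    if job_type is not None and job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type}")
    return [
        job
        for job in jobs
        if (job_type is None or job.job_type == job_type)
        and (status is None or job.status == status)
    ]


# Illustrative records only; these are not real jobs.
jobs = [
    DataJob("job-1", "principle_files", "completed"),
    DataJob("job-2", "context_grounded_files", "running"),
    DataJob("job-3", "context_grounded_vector_db", "running"),
]

running = filter_jobs(jobs, status="running")
qa_jobs = filter_jobs(jobs, job_type="principle_files")
```

In practice the service would most likely do this filtering server-side; the local version is only meant to make the filter criteria concrete.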