Release Notes | February 2026
by Brandi Hopkins
Self-hosted platform update
Updates for self-hosted SeekrFlow deployments on AWS, AWS GovCloud, and Appliance environments.
Agents and inference:
- AWS Bedrock inference support for agents with cloud service provider-hosted inference
- Full agent execution including tools, streaming, and promotion/demotion workflows
- Multi-LoRA inference with hot loading
- vLLM upgrade for improved performance and stability
- CSP inference routing to Bedrock endpoints based on region and model availability
Fine-tuning:
- LoRA support for reinforcement (GRPO) and instruction fine-tuning (NVIDIA-based)
Explainability:
- Inference-time data attribution
UI and documentation:
- Dark mode support
- Documentation updates covering new inference, agent, and fine-tuning capabilities
AWS Bedrock inference for agents
This release adds AWS Bedrock inference support for SeekrFlow Agents, providing an alternative to Seekr-hosted GPU inference. Agents can now use Claude 4.5 via AWS Bedrock for all intra-agent inference (planner, executor, evaluator, summarizer).
Capabilities:
- Full end-to-end agent execution including Agent Chat in the UI, streaming and non-streaming runs, and multi-tool execution
- Bedrock models appear in existing agent model-selection UI
- Configure at deploy time via Helm (EKS-only)
- Deployments select either Bedrock inference or Seekr-hosted GPU inference
Current limitations:
- Agents only (Model Chat is disabled when Bedrock is enabled)
- File search and vector-backed retrieval not yet supported for Bedrock agents
- Reasoning and speed optimization parameters are ignored to ensure consistent execution
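Since deployments must choose exactly one inference backend at deploy time, the selection logic can be sketched as follows. This is a minimal illustration only: the config fields, backend names, and validation rules are assumptions, not the actual SeekrFlow Helm/deployment surface.

```python
# Hypothetical sketch of deploy-time inference backend selection.
# Field and backend names are illustrative, not SeekrFlow's real config.
from dataclasses import dataclass


@dataclass
class InferenceConfig:
    backend: str            # "bedrock" or "seekr-gpu" (assumed names)
    region: str = "us-east-1"
    model_id: str = ""      # required for Bedrock, e.g. a Claude 4.5 model id


def resolve_backend(cfg: InferenceConfig) -> str:
    """Deployments select exactly one backend; mixing is not supported."""
    if cfg.backend not in ("bedrock", "seekr-gpu"):
        raise ValueError(f"unknown backend: {cfg.backend}")
    if cfg.backend == "bedrock" and not cfg.model_id:
        raise ValueError("Bedrock backend requires a model id")
    return cfg.backend
```

In an actual deployment this choice would be expressed in Helm values (EKS-only, per the notes above) rather than application code.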
Preference tuning (DPO)
This release adds model alignment using direct preference optimization (DPO), which trains models from comparison data rather than ground truth or reward functions. Preference tuning trains models to increase the likelihood of generating preferred responses over rejected ones.
Capabilities:
- Upload preference datasets containing prompt, chosen response, and rejected response
- Automatic schema validation on upload
- Optional `beta` hyperparameter to control KL-divergence strength
- Compatible with all base models supported by SeekrFlow
- Training loss tracking identical to supervised fine-tuning
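To make the role of the `beta` hyperparameter concrete, here is a minimal numeric sketch of the standard DPO objective (the textbook formulation, not SeekrFlow's internal implementation): the loss is the negative log-sigmoid of the beta-scaled margin between the chosen and rejected log-probability ratios against a frozen reference model.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    beta controls KL-divergence strength: higher beta penalizes
    drifting further from the reference model's behavior.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the tuned model matches the reference exactly, the margin is zero and the loss is log 2; widening the gap in favor of the chosen response drives the loss toward zero.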
Preference tuning supports alignment for subjective criteria like brand tone, compliance standards, and customer experience preferences where correctness is context-dependent.
Preference datasets must be user-provided. The AI-ready data engine does not yet generate preference datasets.
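Since preference datasets are user-provided and schema-validated on upload, a record-level check can be sketched as below. The field names mirror the documented prompt/chosen/rejected structure, but the exact upload format and validation rules are assumptions, not SeekrFlow's actual validator.

```python
# Hypothetical schema check for one preference-tuning record.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def validate_record(record: dict) -> list:
    """Return a list of schema errors; an empty list means the record is valid."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in record]
    errors += [f"field must be a non-empty string: {f}"
               for f in REQUIRED_FIELDS
               if isinstance(record.get(f), str) and not record[f].strip()]
    if record.get("chosen") is not None and record.get("chosen") == record.get("rejected"):
        errors.append("chosen and rejected responses must differ")
    return errors
```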
Reinforcement tuning reward functions
This release introduces configurable reward functions for reinforcement tuning jobs, enabling users to control which behaviors get reinforced during training. Reward functions are built from graders: individual scoring operations that can be used alone or combined into weighted linear rewards.
Capabilities:
- Grader types: math accuracy, string check, and text similarity
- Reward composition: single grader or weighted linear combination of multiple graders
- Optional format reward weight to control `<think>`/`<answer>` formatting enforcement
- Built-in weight validation and normalization
This release also rebrands "GRPO" to "Reinforcement Tuning" in the API and SDK. Reward functions enable customers to explicitly reinforce task-specific objectives like correctness, policy adherence, or stylistic consistency, establishing the foundation for future evaluation platform capabilities.
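The weighted linear composition described above can be sketched as follows. The grader implementations here are deliberately toy stand-ins (real string-check and text-similarity graders would be more sophisticated), and the normalization behavior is an assumption based on the notes, not the actual SeekrFlow API.

```python
# Illustrative sketch of composing graders into a weighted linear reward.
def string_check(pred: str, target: str) -> float:
    """Exact-match grader: 1.0 on match, 0.0 otherwise."""
    return 1.0 if pred.strip() == target.strip() else 0.0


def text_similarity(pred: str, target: str) -> float:
    """Toy token-overlap (Jaccard) similarity standing in for a real grader."""
    a, b = set(pred.split()), set(target.split())
    return len(a & b) / len(a | b) if a | b else 1.0


def linear_reward(graders, weights, pred: str, target: str) -> float:
    """Weighted linear combination; weights are normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    return sum(w * g(pred, target) for g, w in zip(graders, norm))
```

A single-grader reward is just the degenerate case with one weight; the normalization step keeps the combined reward on the same 0-to-1 scale regardless of how many graders are stacked.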




