Deployments

Deployments host a model on dedicated compute infrastructure and make it available for inference requests and agent usage. For conceptual background, see Deployments.

Create a deployment

Base model

from seekrai import SeekrFlow
from seekrai.types.deployments import DeploymentType

client = SeekrFlow()

deployment = client.deployments.create(
    name="my-base-model-deployment",
    description="Base model deployment for inference.",
    model_type=DeploymentType.BASE_MODEL,
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    n_instances=1
)
print(f"Deployment ID: {deployment.id}")
print(f"Status: {deployment.status}")

Fine-tuned model

# Reuses the client and the DeploymentType import from the base model example.
deployment = client.deployments.create(
    name="my-fine-tuned-deployment",
    description="Fine-tuned model deployment for inference.",
    model_type=DeploymentType.FINE_TUNED_RUN,
    model_id="ft-1234567890",
    n_instances=1
)
print(f"Deployment ID: {deployment.id}")
print(f"Status: {deployment.status}")

Parameters

Parameter | Required | Description
name | Yes | A name for the deployment. Must be 5–100 characters.
description | Yes | A description of the deployment. Must be 5–1000 characters.
model_type | Yes | DeploymentType.BASE_MODEL for a base model or DeploymentType.FINE_TUNED_RUN for a fine-tuned model.
model_id | Yes | The model ID (base model name or fine-tuning job ID) to deploy.
n_instances | Yes | Number of dedicated instances to provision. Must be between 1 and 50.
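
The documented limits can be checked locally before calling create, so that a bad value is reported with a clear message instead of a rejected request. The following is a minimal sketch: validate_deployment_params is a hypothetical helper, not part of the seekrai SDK, and it assumes the documented ranges are inclusive at both ends.

```python
def validate_deployment_params(name: str, description: str, n_instances: int) -> list[str]:
    """Return a list of problems; an empty list means the values fit the documented limits."""
    problems = []
    # Assumption: the 5-100 and 5-1000 character limits are inclusive.
    if not 5 <= len(name) <= 100:
        problems.append(f"name must be 5-100 characters, got {len(name)}")
    if not 5 <= len(description) <= 1000:
        problems.append(f"description must be 5-1000 characters, got {len(description)}")
    # Assumption: "between 1 and 50" includes both 1 and 50.
    if not 1 <= n_instances <= 50:
        problems.append(f"n_instances must be between 1 and 50, got {n_instances}")
    return problems


issues = validate_deployment_params(
    name="my-base-model-deployment",
    description="Base model deployment for inference.",
    n_instances=1,
)
if issues:
    raise ValueError("; ".join(issues))
```

Returning every problem at once, instead of raising on the first one, lets a caller fix all invalid fields in a single pass.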

Deployment status

Status | Description
Pending | Deployment requested, infrastructure provisioning in progress.
Active | Serving inference traffic.
Inactive | Paused, not serving requests.
Failed | Error during startup or runtime.

Promote a deployment

Promote a deployment to make it active and ready to serve inference requests.

deployment = client.deployments.promote(deployment.id)
print(f"Status: {deployment.status}")

Demote a deployment

Demote a deployment to pause it without deleting the endpoint.

deployment = client.deployments.demote(deployment.id)
print(f"Status: {deployment.status}")
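
For short-lived workloads, promote and demote can be paired so that a deployment is never left active by accident. The sketch below wraps the two calls shown above in a context manager. The name promoted is hypothetical and not part of the SDK, and the sketch assumes promote and demote accept a deployment ID as in the examples. The deployment may still be Pending when the body starts, so the status may need checking before sending traffic.

```python
from contextlib import contextmanager


@contextmanager
def promoted(client, deployment_id):
    """Promote a deployment for the duration of a with-block, then demote it."""
    deployment = client.deployments.promote(deployment_id)
    try:
        yield deployment
    finally:
        # Runs even if the body raises, so the deployment is not left promoted.
        client.deployments.demote(deployment_id)


# Usage (assumes `client` is the SeekrFlow() instance created earlier):
#
# with promoted(client, "<deployment-id>") as deployment:
#     print(f"Status: {deployment.status}")
#     ...  # run inference here
```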

List deployments

deployments = client.deployments.list()
for d in deployments.data:
    print(f"{d.name} ({d.status}): {d.id}")
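
When many deployments exist, it can help to group the listing by status, for example to spot deployments that are still active. This is a small sketch that relies only on the name and status attributes used in the loop above. group_by_status is a hypothetical helper, and it assumes str(d.status) yields a readable label.

```python
from collections import defaultdict


def group_by_status(deployments):
    """Map each status label to the names of the deployments in that status."""
    groups = defaultdict(list)
    for d in deployments:
        # Assumption: str() of the status gives a usable label such as "Active".
        groups[str(d.status)].append(d.name)
    return dict(groups)


# Usage (assumes `client` is the SeekrFlow() instance created earlier):
#
# for status, names in group_by_status(client.deployments.list().data).items():
#     print(f"{status}: {', '.join(names)}")
```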

Retrieve a deployment

deployment = client.deployments.retrieve("<deployment-id>")
print(f"{deployment.name}: {deployment.status}")
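
Because a new deployment starts as Pending while infrastructure is provisioned, retrieve can be called repeatedly until the deployment settles. The following is a sketch under stated assumptions: wait_for_status is a hypothetical helper, not an SDK function, and it assumes the status is either a plain string or an enum-like object whose value matches the labels in the status table (compared case-insensitively). The default interval and timeout are illustrative values only.

```python
import time


def wait_for_status(retrieve, deployment_id, targets=("active", "failed"),
                    interval=10.0, timeout=1800.0, sleep=time.sleep):
    """Poll retrieve(deployment_id) until its status is in targets, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        deployment = retrieve(deployment_id)
        status = deployment.status
        # Accept a plain string or an enum-like object exposing .value.
        label = str(getattr(status, "value", status)).lower()
        if label in targets:
            return deployment
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"deployment {deployment_id} still {label!r} after {timeout} seconds"
            )
        sleep(interval)


# Usage (assumes `client` is the SeekrFlow() instance created earlier):
#
# deployment = wait_for_status(client.deployments.retrieve, "<deployment-id>")
# print(f"{deployment.name}: {deployment.status}")
```

Passing retrieve and sleep in as arguments keeps the helper independent of the client object and makes it straightforward to exercise with a stub.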