Content moderation

Evaluate text safety and brand risk with specialized moderation models.

Supported on: UI, API, SDK

Content moderation provides access to specialized models for evaluating text safety and brand risk. These models classify content across various safety categories, supporting use cases from general-purpose text moderation to domain-specific brand safety assessment.

Available models

SeekrFlow provides multiple content moderation models for different content types and contexts:

Seekr ContentGuard

ContentGuard is a purpose-built model for podcast transcript analysis. It evaluates diarized transcript chunks, assigning each a brand safety risk classification and a civility score.

Classification types:

  • GARM (Global Alliance for Responsible Media) brand safety (13 categories)
  • Civility and hostility scoring

Best for: Podcast content moderation, brand safety assessment for spoken audio content
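
To make the model's input and output concrete, the sketch below shows one plausible shape for a diarized chunk and a ContentGuard-style result. The field names are illustrative assumptions, not a documented schema; see the API reference for the actual contract.

```python
# One diarized chunk from a podcast transcript: who spoke, when, and what.
# All field names here are illustrative assumptions, not a documented schema.
chunk = {
    "speaker": "SPEAKER_01",
    "start": 124.8,   # seconds into the episode
    "end": 131.2,
    "text": "Today we're talking through the new sponsorship deal.",
}

# A plausible result: a GARM brand-safety label with a confidence score,
# plus a civility score, matching this page's description of category
# labels and confidence scores.
result = {
    "garm_category": "Safe",
    "garm_confidence": 0.97,
    "civility_score": 0.91,
}

print(chunk["speaker"], result["garm_category"], result["civility_score"])
```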

Meta Llama Guard 3

Llama Guard 3 is a general-purpose text moderation model supporting all content types. It classifies text using the MLCommons-aligned hazard taxonomy (14 categories).

Classification types:

  • Safe/unsafe binary classification
  • MLCommons-aligned taxonomy (14 categories)

Best for: LLM output filtering, agent guardrails, general text moderation
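
For context, the Llama Guard 3 model as released by Meta emits the token `safe`, or `unsafe` followed by the violated category codes (S1 through S14). The sketch below parses that raw completion, assuming SeekrFlow returns it verbatim; the endpoint may instead wrap it in structured JSON, so check the API reference.

```python
# Llama Guard 3's raw completion is either "safe" or "unsafe" followed by
# the violated category codes, e.g. "unsafe\nS1,S10". Whether SeekrFlow
# returns this string verbatim is an assumption here.

def parse_llama_guard(raw: str) -> tuple[bool, list[str]]:
    """Return (is_safe, violated_category_codes)."""
    lines = raw.strip().splitlines()
    if lines and lines[0].strip() == "safe":
        return True, []
    codes = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in codes]

print(parse_llama_guard("safe"))            # (True, [])
print(parse_llama_guard("unsafe\nS1,S10"))  # (False, ['S1', 'S10'])
```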

Use cases

Content moderation supports several workflows:

LLM guardrails – Filter agent outputs and user inputs for safety violations (see the sketch after this list)

Brand safety – Assess content suitability for brand association and advertising

Compliance – Screen content for regulatory requirements and platform policies

User protection – Detect harmful content in user-generated text
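
A common guardrail pattern screens both the user's input and the model's draft output, falling back to a refusal when either is flagged. The `moderate` helper below is a stand-in, not a SeekrFlow SDK call; in practice it would wrap a request to a deployed moderation endpoint, such as the one sketched under Integration.

```python
REFUSAL = "Sorry, I can't help with that."

def moderate(text: str) -> bool:
    """Stand-in for a real moderation call (see the Integration sketch).
    Always returns True here so the example runs end to end; replace
    with a request to a deployed moderation endpoint."""
    return True

def guarded_reply(user_input: str, generate) -> str:
    # Screen the user's input before it reaches the agent.
    if not moderate(user_input):
        return REFUSAL
    draft = generate(user_input)
    # Screen the agent's draft before it reaches the user.
    if not moderate(draft):
        return REFUSAL
    return draft

print(guarded_reply("Hello there!", lambda prompt: f"Echo: {prompt}"))
```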

Model selection

Choose models based on content type and classification needs:

Content type            Recommended model
Podcast transcripts     Seekr ContentGuard
LLM outputs             Meta Llama Guard 3
User-generated text     Meta Llama Guard 3
Brand safety scoring    Seekr ContentGuard

Integration

Content moderation models are deployed as endpoints and accessed through the SeekrFlow API. Models accept text input and return classification results with category labels and confidence scores.
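
A generic call might look like the following. This is a minimal sketch: the base URL, route, model identifier, and response keys are illustrative assumptions, so consult the SeekrFlow API reference for the actual paths, authentication scheme, and payload fields.

```python
import requests

# Assumed base URL and route, for illustration only.
BASE_URL = "https://flow.seekr.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

resp = requests.post(
    f"{BASE_URL}/moderation",                      # assumed route
    headers=HEADERS,
    json={
        "model": "meta-llama/Llama-Guard-3-8B",    # assumed model id
        "input": "Text to screen before showing it to users.",
    },
    timeout=30,
)
resp.raise_for_status()

# Per this page, responses carry category labels with confidence scores;
# the key names below are assumptions about that shape.
for category in resp.json().get("categories", []):
    print(category.get("label"), category.get("score"))
```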