Content moderation
Content moderation provides access to specialized models for evaluating text safety and brand risk. These models classify content across various safety categories, supporting use cases from general-purpose text moderation to domain-specific brand safety assessment.
Available models
SeekrFlow provides two content moderation models, each suited to different content types and contexts:
Seekr ContentGuard
ContentGuard is a purpose-built model for podcast transcript analysis. It evaluates diarized transcript chunks for brand safety risk and scores them for civility.
Classification types:
- GARM brand safety (13 categories)
- Civility and hostility scoring
Best for: Podcast content moderation, brand safety assessment for spoken audio content
Meta Llama Guard 3
Llama Guard 3 is a general-purpose text moderation model supporting all content types. It classifies text using the MLCommons 22-category taxonomy.
Classification types:
- Safe/unsafe binary classification
- MLCommons taxonomy (22 categories)
Best for: LLM output filtering, agent guardrails, general text moderation
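To illustrate the safe/unsafe classification, the sketch below parses a verdict in the plain-text format that Llama Guard models emit natively: `safe`, or `unsafe` followed by a line of comma-separated category codes. Whether SeekrFlow exposes this raw text or a structured object is an assumption, and `GuardVerdict` and `parse_guard_verdict` are hypothetical helpers, not part of any SeekrFlow SDK.

```python
from dataclasses import dataclass, field


@dataclass
class GuardVerdict:
    """Parsed result of a safe/unsafe classification."""
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_guard_verdict(raw: str) -> GuardVerdict:
    """Parse raw verdict text (assumed format: 'safe' or 'unsafe\\nS1,S10')."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty verdict")
    label = lines[0].lower()
    if label == "safe":
        return GuardVerdict(safe=True)
    if label == "unsafe":
        # Category codes, when present, follow on the next line.
        codes = lines[1].split(",") if len(lines) > 1 else []
        return GuardVerdict(
            safe=False,
            categories=[c.strip() for c in codes if c.strip()],
        )
    raise ValueError(f"unrecognized verdict label: {lines[0]!r}")
```

The parser does not depend on the number of categories, so it works regardless of how the taxonomy codes are named.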
Use cases
Content moderation supports several workflows:
- LLM guardrails – Filter agent outputs and user inputs for safety violations
- Brand safety – Assess content suitability for brand association and advertising
- Compliance – Screen content for regulatory requirements and platform policies
- User protection – Detect harmful content in user-generated text
Model selection
Choose models based on content type and classification needs:
| Content type | Recommended model |
|---|---|
| Podcast transcripts | Seekr ContentGuard |
| LLM outputs | Meta Llama Guard 3 |
| User-generated text | Meta Llama Guard 3 |
| Brand safety scoring | Seekr ContentGuard |
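The table above can be encoded as a simple lookup so that application code routes each piece of content to the recommended model. This is a minimal sketch: the content-type keys and the `recommend_model` helper are hypothetical, and the model names are display labels from the table rather than confirmed SeekrFlow model identifiers.

```python
# Mapping taken from the model selection table; keys and values are
# illustrative labels, not confirmed SeekrFlow model identifiers.
RECOMMENDED_MODEL = {
    "podcast_transcript": "Seekr ContentGuard",
    "llm_output": "Meta Llama Guard 3",
    "user_generated_text": "Meta Llama Guard 3",
    "brand_safety_scoring": "Seekr ContentGuard",
}


def recommend_model(content_type: str) -> str:
    """Return the recommended moderation model for a content type."""
    try:
        return RECOMMENDED_MODEL[content_type]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(
            f"unknown content type {content_type!r}; expected one of: {known}"
        ) from None
```

Failing loudly on an unknown content type is a deliberate choice here: silently falling back to a default model could leave content unscreened by the appropriate classifier.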
Integration
Content moderation models are deployed as endpoints and accessed through the SeekrFlow API. Models accept text input and return classification results with category labels and confidence scores.
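The request and response flow might look roughly like the sketch below. It makes no network call and does not use the SeekrFlow SDK; the payload shape, the response field names (`results`, `label`, `score`), the sample labels, and the helper functions are all assumptions for illustration. Consult the SeekrFlow API reference for the actual schema.

```python
import json

# Hypothetical response body: the field names and labels are assumptions
# for illustration, not the documented SeekrFlow schema.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {"label": "hate_speech", "score": 0.91},
        {"label": "profanity", "score": 0.42},
        {"label": "violence", "score": 0.07},
    ]
})


def build_request(model: str, text: str) -> dict:
    """Assemble a request payload (hypothetical shape) for a moderation endpoint."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"model": model, "input": text}


def flagged_labels(response_body: str, threshold: float = 0.5) -> list[str]:
    """Return labels whose confidence score meets the threshold, highest first."""
    results = json.loads(response_body).get("results", [])
    hits = [r for r in results if r["score"] >= threshold]
    hits.sort(key=lambda r: r["score"], reverse=True)
    return [r["label"] for r in hits]
```

With the sample data, `flagged_labels(SAMPLE_RESPONSE)` returns `["hate_speech"]`, and lowering the threshold to `0.4` also includes `"profanity"`. The right threshold depends on how the application weighs false positives against missed violations.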
