Content moderation
Evaluate text safety and brand risk with specialized moderation models.
Content moderation provides access to specialized models for evaluating text safety and brand risk. These models classify text across defined safety categories, supporting use cases that range from general-purpose text moderation to domain-specific brand safety assessment.
Available models
SeekrFlow provides multiple content moderation models for different content types and contexts:
Seekr ContentGuard
ContentGuard is a purpose-built model for podcast transcript analysis. It evaluates diarized transcript chunks for brand safety risk and assigns civility scores.
Classification types:
- GARM brand safety (13 categories)
- Civility and hostility scoring
Best for: Podcast content moderation, brand safety assessment for spoken audio content
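The sketch below shows what scoring a single transcript chunk might look like. It is illustrative only: the base URL, route, payload fields, and model identifier are assumptions, so consult the SeekrFlow API reference for the actual request shape.

```python
import os

import requests

# All names below are placeholders, not the documented API: substitute the
# base URL, route, and model identifier from the SeekrFlow API reference.
SEEKR_API_BASE = os.environ.get("SEEKR_API_BASE", "https://<your-seekrflow-host>/v1")
API_KEY = os.environ["SEEKR_API_KEY"]

# One diarized chunk from a podcast transcript.
chunk = {
    "speaker": "Host",
    "text": "Welcome back to the show. Today we're talking consumer tech.",
}

response = requests.post(
    f"{SEEKR_API_BASE}/content-moderation/score",  # placeholder route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "seekr-contentguard",  # placeholder model id
        "input": chunk["text"],
    },
    timeout=30,
)
response.raise_for_status()

# Illustrative expectation: a GARM category label, a confidence score,
# and a civility/hostility score for the chunk.
print(response.json())
```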
Meta Llama Guard 3
Llama Guard 3 is a general-purpose text moderation model supporting all content types. It classifies text using the MLCommons 22-category taxonomy.
Classification types:
- Safe/unsafe binary classification
- MLCommons taxonomy (22 categories)
Best for: LLM output filtering, agent guardrails, general text moderation
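As a rough illustration, the snippet below treats a Llama Guard 3 deployment as an OpenAI-compatible chat endpoint and reads its conventional "safe" / "unsafe" verdict. The base URL and model identifier are assumptions; substitute the values for your own deployment.

```python
import os

from openai import OpenAI

# Assumption: the deployment exposes an OpenAI-compatible chat completions
# API. The base URL and model id are placeholders, not documented values.
client = OpenAI(
    base_url=os.environ.get("SEEKR_BASE_URL", "https://<your-seekrflow-endpoint>/v1"),
    api_key=os.environ["SEEKR_API_KEY"],
)

def is_safe(text: str) -> bool:
    """Return True if Llama Guard 3 classifies the text as safe."""
    completion = client.chat.completions.create(
        model="meta-llama/Llama-Guard-3-8B",  # placeholder model id
        messages=[{"role": "user", "content": text}],
    )
    # Llama Guard models conventionally reply with "safe", or "unsafe"
    # followed by the violated category codes (e.g. "unsafe\nS10").
    verdict = completion.choices[0].message.content.strip().lower()
    return verdict.startswith("safe")

print(is_safe("How do I reset my router password?"))
```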
Use cases
Content moderation supports several workflows:
LLM guardrails – Filter agent outputs and user inputs for safety violations (see the sketch after this list)
Brand safety – Assess content suitability for brand association and advertising
Compliance – Screen content for regulatory requirements and platform policies
User protection – Detect harmful content in user-generated text
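For the LLM guardrails workflow, a common pattern is to screen both the user's input and the agent's draft reply before anything is returned. The sketch below stubs the moderation call with a hypothetical classify() helper so it runs standalone; in practice that helper would call a moderation model such as Llama Guard 3 deployed on SeekrFlow.

```python
from typing import Callable

def classify(text: str) -> str:
    """Return "safe" or "unsafe" for a piece of text.

    Stubbed with a keyword check so the example runs standalone; a real
    implementation would call a moderation endpoint instead.
    """
    blocked_terms = ("how to build a bomb",)  # toy stand-in for a real model
    return "unsafe" if any(term in text.lower() for term in blocked_terms) else "safe"

def guarded_reply(agent: Callable[[str], str], user_input: str) -> str:
    """Screen the user input and the agent's reply before returning anything."""
    if classify(user_input) != "safe":
        return "Sorry, I can't help with that request."
    reply = agent(user_input)
    if classify(reply) != "safe":
        return "Sorry, I can't share that response."
    return reply

# Toy agent for demonstration; a real deployment would call an LLM here.
echo_agent = lambda prompt: f"You asked about: {prompt}"
print(guarded_reply(echo_agent, "What's a good podcast about space?"))
```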
Model selection
Choose models based on content type and classification needs:
| Content type | Recommended model |
|---|---|
| Podcast transcripts | Seekr ContentGuard |
| LLM outputs | Meta Llama Guard 3 |
| User-generated text | Meta Llama Guard 3 |
| Brand safety scoring | Seekr ContentGuard |
Integration
Content moderation models are deployed as endpoints and accessed through the SeekrFlow API. Models accept text input and return classification results with category labels and confidence scores.
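As an illustration of consuming such results, the sketch below parses a hypothetical JSON response; the results, category, label, and confidence field names are assumptions rather than the documented schema.

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    category: str      # e.g. a GARM or MLCommons category label
    label: str         # e.g. "safe" / "unsafe" or a risk tier
    confidence: float  # model confidence, assumed to be in [0, 1]

def parse_results(payload: dict) -> list[ModerationResult]:
    """Turn a raw API response into typed results.

    The "results", "category", "label", and "confidence" keys are
    illustrative assumptions; use the field names from the endpoint's
    documented response schema.
    """
    return [
        ModerationResult(
            category=item["category"],
            label=item["label"],
            confidence=float(item["confidence"]),
        )
        for item in payload.get("results", [])
    ]

def flagged(results: list[ModerationResult], threshold: float = 0.8) -> list[ModerationResult]:
    """Keep only non-safe classifications above a confidence threshold."""
    return [r for r in results if r.label != "safe" and r.confidence >= threshold]
```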
