Content Moderation Models

This page explains how to use SeekrFlow's content moderation models.

Content Moderation Overview

This page provides a high-level overview of all content moderation models available in the SeekrFlow Model Library, including what each model can and cannot do, their ideal use cases, and how to get started.


Summary

SeekrFlow provides access to multiple content moderation models — each built for different content types, contexts, and levels of flexibility.

These models support classification of:

  • Podcast conversations (via diarized transcripts)
  • LLM outputs (chat, text, completions)
  • General user-generated content (comments, prompts, responses)

Not all models are multi-purpose. Some, like Seekr ContentGuard, are highly domain-specific and require strictly formatted input; others, like Meta Llama Guard, are flexible and general-purpose.


Available Moderation Models

1. Seekr ContentGuard

Purpose-built for: Podcast Transcript Scoring
This model is not general-purpose. It is specifically designed to analyze short, diarized, audio-derived chunks of podcast transcripts to assess:

  • GARM Brand Safety Risk (13 categories)
  • Civility / Hostility Score (attack type per chunk)

Use if you need to:

  • Score entire podcast episodes for brand suitability
  • Detect offensive tone in conversational spoken-word content

Do not use for:

  • Blog posts, LLM responses, web comments, or general text
  • Non-podcast content or full-episode input
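
For orientation, here is a minimal sketch of sending one diarized transcript chunk to a deployed ContentGuard model. It assumes the deployment exposes an OpenAI-compatible chat completions endpoint; the base URL, model ID, and response fields shown are illustrative assumptions, and the full guide documents the actual request and response format.

```python
# Minimal sketch: score one short, diarized transcript chunk with a deployed
# Seekr ContentGuard model. The base URL, model ID, and response fields are
# illustrative assumptions -- see the full ContentGuard guide for the
# authoritative request and response format.
import os

import requests

API_BASE = "https://flow.seekr.com/v1"        # assumed OpenAI-compatible endpoint
API_KEY = os.environ["SEEKRFLOW_API_KEY"]     # assumed environment variable

# A short, diarized chunk of a podcast transcript (speaker-labeled turns).
chunk = (
    "SPEAKER_00: Welcome back to the show. Today we're talking about online scams.\n"
    "SPEAKER_01: Right, and some of the tactics these groups use are genuinely hostile."
)

response = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "seekr-contentguard",        # hypothetical deployed model ID
        "messages": [{"role": "user", "content": chunk}],
    },
    timeout=30,
)
response.raise_for_status()

# Assumed shape: the completion text carries the GARM risk category and
# civility assessment for this chunk.
print(response.json()["choices"][0]["message"]["content"])
```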

Read Full Guide →


2. Meta Llama Guard 3 (8B)

General-purpose text moderation model
Designed by Meta, this model supports all types of text, including:

  • LLM prompts and completions
  • Chatbot messages
  • User input and responses
  • Web content, app text, comment sections

It classifies content against the MLCommons 22-category taxonomy, labeling each message as safe or unsafe and listing any violated categories.
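
As a rough illustration, the sketch below sends a short prompt/response exchange to a deployed Llama Guard 3 model and reads back the verdict. It assumes the deployment exposes an OpenAI-compatible chat completions endpoint and that the model replies in the usual Llama Guard style ("safe", or "unsafe" followed by category codes); the base URL and model ID are placeholders, and the full guide documents the actual interface.

```python
# Minimal sketch: moderate an LLM exchange with a deployed Meta Llama Guard 3
# model via an assumed OpenAI-compatible endpoint. The base URL and model ID
# are placeholders; the verdict parsing assumes the usual Llama Guard output
# ("safe", or "unsafe" followed by a line of category codes).
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://flow.seekr.com/v1",       # assumed endpoint
    api_key=os.environ["SEEKRFLOW_API_KEY"],    # assumed environment variable
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-Guard-3-8B",        # hypothetical deployed model ID
    messages=[
        {"role": "user", "content": "How do I pick a strong password?"},
        {"role": "assistant", "content": "Use a long passphrase and a password manager."},
    ],
)

verdict_lines = completion.choices[0].message.content.strip().splitlines()
is_safe = verdict_lines[0].lower() == "safe"
violated = verdict_lines[1].split(",") if not is_safe and len(verdict_lines) > 1 else []
print(is_safe, violated)
```

In an agent or chatbot pipeline, a check like this would typically run on the user prompt before generation and again on the model response before it is returned.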

Use if you need to:

  • Add guardrails to LLMs or agents
  • Moderate general-purpose text input/output
  • Screen for safety risks across any user-generated content

Not suitable for:

  • Podcast-specific brand safety scoring
  • Civility / tone detection (does not include Seekr’s civility score)

Read Full Guide →


Choosing the Right Model

Use Case                                    | Recommended Model
Podcast episode moderation (spoken audio)   | Seekr ContentGuard
Brand safety scoring for podcast content    | Seekr ContentGuard
Civility / tone detection in podcasts       | Seekr ContentGuard
LLM output filtering                        | Meta Llama Guard 3
Agent guardrails or safety classifiers      | Meta Llama Guard 3
Moderating general text, prompts, or chats  | Meta Llama Guard 3

Comparison Table

Feature / Capability            | Seekr ContentGuard                | Meta Llama Guard 3
Hosted Model                    | ✅ Yes                            | ✅ Yes
Best for                        | Podcast transcripts               | General-purpose text
Input Required                  | Short, diarized transcript chunks | Any plain text
Supports GARM                   | ✅ Yes (13 categories)            | ❌ No
Supports Civility scoring       | ✅ Yes                            | ❌ No
Supports MLCommons Taxonomy     | ❌ No                             | ✅ Yes (22 categories)
Safe/Unsafe Binary Labels       | ❌ No                             | ✅ Yes
Works on blog/chat/LLM output   | ❌ No                             | ✅ Yes
Multi-purpose                   | ❌ No                             | ✅ Yes

How to Get Started

  1. Visit the Model Library
  2. Deploy the moderation model that fits your use case
  3. Use the SDK or API to send content for classification
  4. Interpret the results (e.g., flag, block, score, analyze)
  5. Optionally build your own pipeline for ingestion, storage, and aggregation
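
As an illustration of step 4, the sketch below maps a Llama Guard-style verdict onto a simple allow / flag / block decision. The category codes chosen for hard blocking are assumptions you would tune for your own pipeline, not SeekrFlow defaults.

```python
# Illustrative sketch of step 4: turn a Llama Guard-style verdict into an
# allow / flag / block decision. The category codes chosen for hard blocking
# are assumptions for your own pipeline, not SeekrFlow defaults.

BLOCK_CATEGORIES = {"S1", "S3", "S4"}  # example: categories your policy hard-blocks


def decide(verdict: str) -> str:
    """Map a raw moderation reply to an action: allow, flag, or block."""
    lines = verdict.strip().splitlines()
    if not lines or lines[0].lower() == "safe":
        return "allow"
    categories = set(lines[1].replace(" ", "").split(",")) if len(lines) > 1 else set()
    return "block" if categories & BLOCK_CATEGORIES else "flag"


print(decide("safe"))        # allow
print(decide("unsafe\nS1"))  # block
print(decide("unsafe\nS9"))  # flag
```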

Learn More