# Meta Llama Guard model

> How to deploy and use Meta's Llama Guard 3 model in SeekrFlow.

## Full Model Guide: `meta-llama/Llama-Guard-3-8B`

### Summary

**Llama Guard 3 (8B)** is an open-source large language model from Meta designed for **content moderation and safety classification**. It evaluates user-generated text across categories such as hate speech, harassment, violence, and misinformation. Now available in the SeekrFlow model library, this model can be deployed to flag unsafe content in chat, social platforms, RAG pipelines, and agent interactions.

* **Moderation Type**: Text-based classification
* **Taxonomy**: MLCommons Responsible AI (22-category hazard set)
* **Usage Context**: Real-time inference or post-hoc content safety auditing

**Helpful Links**:

* 🔗 [Meta Model Card & Prompt Format](https://www.llama.com/model-cards-and-prompt-formats/llama-guard-3/)
* 🔗 [Hugging Face Page](https://huggingface.co/meta-llama/Llama-Guard-3-8B)
* 🔗 [SeekrFlow Model Library](https://apps.seekr.com/flow/model-library)

<Frame>
  <img src="https://mintcdn.com/seekr/xCjyiASvtfX59CGv/images/docs/c8f429244f5d2c2e234facfd258568b7483ba8632d5ddb5aa03bf27f10c77001-Screenshot_2025-07-29_at_3.53.16_PM.png?fit=max&auto=format&n=xCjyiASvtfX59CGv&q=85&s=6eec2df65b8564722192bd4ca1d23f6a" alt="" width="2768" height="1358" data-path="images/docs/c8f429244f5d2c2e234facfd258568b7483ba8632d5ddb5aa03bf27f10c77001-Screenshot_2025-07-29_at_3.53.16_PM.png" />
</Frame>

### Key Features

* **22 Risk Categories**\
  Based on the MLCommons Responsible AI taxonomy (for example, hate, violence, self-harm, and sexual content)
* **Supports 1st or 2nd Person Framing**\
  Detects unsafe content in both user and assistant messages
* **Multi-language Capable**\
  Optimized for English, with partial generalization to languages such as Spanish, French, and German
* **Simple Prompt Format**\
  Takes a JSON `messages` array of user and assistant messages
* **Open-source, Lightweight Model**\
  Built on the 8B Llama 3 base model for relatively fast, deployable inference

### Target Input Type

* Text-based user or assistant messages (chat-based systems, generative agents, Q\&A apps)
* Supports both **single message** and **multi-turn conversation** review
* Not built for audio or video moderation — **text only**

### Taxonomy Categories (MLCommons Hazard Set)

The model classifies text using the **MLCommons Responsible AI hazard taxonomy**, including:

* Hate
* Harassment
* Sexual content
* Self-harm
* Violence
* Criminal planning
* Weapons
* Drugs
* Alcohol
* Misinformation
* Health misinformation
* Legal misinformation
* Spam
* Profanity
* Insults
* Graphic content
* NSFW
* Solicitation
* Extremism
* Privacy violation
* Malicious code
* Financial harm

Each category returns a **"safe"** or **"unsafe"** label.

### Languages Supported

* **Primary**: English
* **Partial Generalization**: Spanish, French, German, Portuguese, Italian, Dutch, etc.
* The model has not been fine-tuned for multilingual safety, so results may vary across languages.

### How to Use in SeekrFlow (Step-by-Step)

<Steps>
  <Step>
    **Deploy the Model**

    * Go to [SeekrFlow Model Library](https://apps.seekr.com/flow/model-library)
    * Deploy `meta-llama/Llama-Guard-3-8B`
    * Copy the model ID
  </Step>

  <Step>
    **Format Your Input**

    * Use the structured message format:

    ```json JSON theme={null}
    {
      "messages": [
        {"role": "user", "content": "Hey, you suck and I hate you."}
      ]
    }
    ```
  </Step>

  <Step>
    **Run Inference via SDK or API**

    * Pass the `messages` array with your deployed model ID
    * The result indicates whether the content is `unsafe` and, if so, which categories were flagged
  </Step>

  <Step>
    **Handle Unsafe Output**

    * Optional: block, redact, route to human review, or rephrase, depending on which categories were flagged
  </Step>
</Steps>
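
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the SeekrFlow SDK: `build_request`, `moderate`, and the injected `send` callable are hypothetical names, and the request is assumed to carry the model ID under a `model` key next to `messages`. Replace `send` with whatever SDK or HTTP call your deployment actually uses.

```python
from typing import Callable

Message = dict[str, str]


def build_request(model_id: str, messages: list[Message]) -> dict:
    """Assemble a request body from the deployed model ID and the messages array."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    for message in messages:
        if message.get("role") not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    # Assumption: the model ID travels under a "model" key next to "messages".
    return {"model": model_id, "messages": messages}


def moderate(send: Callable[[dict], dict], model_id: str, messages: list[Message]) -> dict:
    """Send the request through a caller-supplied transport and return its reply."""
    return send(build_request(model_id, messages))


request = build_request(
    "meta-llama/Llama-Guard-3-8B",
    [{"role": "user", "content": "Hey, you suck and I hate you."}],
)
```

Keeping the transport as a parameter means the same code can be exercised with a stub in tests and pointed at the real endpoint in production.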

### Model Prompt Format & Best Practices

**Input Format**

```json theme={null}
{
  "messages": [
    { "role": "user", "content": "..." },
    { "role": "assistant", "content": "..." }
  ]
}
```

* Can include user messages only, or both user and assistant messages
* Order matters; the model assumes a back-and-forth conversation
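
Because order matters, it can help to check turn order before sending a conversation. The sketch below assumes that "back-and-forth" means roles strictly alternate and that the conversation opens with a `user` message; both are assumptions, and `alternates_user_assistant` is a hypothetical helper name.

```python
def alternates_user_assistant(messages: list[dict]) -> bool:
    """Return True when roles strictly alternate, beginning with "user"."""
    expected = "user"
    for message in messages:
        if message.get("role") != expected:
            return False
        # Flip the expected role for the next turn.
        expected = "assistant" if expected == "user" else "user"
    # An empty conversation is treated as invalid.
    return bool(messages)
```

A conversation that fails this check could be rejected or repaired before it is submitted for moderation.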

**Output Format**

```json theme={null}
{
  "unsafe": true,
  "categories": {
    "hate": true,
    "violence": false,
    "harassment": true,
    ...
  }
}
```
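
A result shaped like the example above can be reduced to a list of flagged category names. This is a minimal sketch that assumes the response is already parsed into a Python dict with exactly the `unsafe` and `categories` keys shown; `flagged_categories` is a hypothetical helper, and the sample data includes only the three categories that appear in the example.

```python
def flagged_categories(result: dict) -> list[str]:
    """List the category names marked true in a result shaped like the example."""
    if not result.get("unsafe", False):
        return []
    categories = result.get("categories", {})
    return sorted(name for name, flagged in categories.items() if flagged)


example = {
    "unsafe": True,
    "categories": {"hate": True, "violence": False, "harassment": True},
}
print(flagged_categories(example))  # ['harassment', 'hate']
```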

**Best Practices**

* Keep messages short (a single utterance, or fewer than 300 tokens)
* Structure your input for clarity (don't mix system prompts with user messages)
* Use consistent formatting if auditing multi-turn chats
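
The length guideline can be checked before submission. The sketch below is only an approximation: it counts whitespace-separated words as a stand-in for tokens (an assumption; a real tokenizer will usually report more tokens), and `approx_token_count` and `oversized_messages` are hypothetical helper names.

```python
def approx_token_count(text: str) -> int:
    """Rough token estimate: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def oversized_messages(messages: list[dict], limit: int = 300) -> list[int]:
    """Return the indexes of messages whose estimated length reaches the limit."""
    return [
        index
        for index, message in enumerate(messages)
        if approx_token_count(message.get("content", "")) >= limit
    ]
```

Messages reported by `oversized_messages` could be split into shorter utterances before they are classified.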

### What SeekrFlow Provides

* Fully hosted version of `Llama-Guard-3-8B`
* Fast, scalable API and SDK integration
* Outputs content moderation flags per message
* Access control, observability, and logging

### What You Can Build Separately

* Pipeline to generate or retrieve user messages
* Model inference logic (loop through messages, classify each)
* Optional: dashboard, moderation UI, policy logic, score aggregation, redaction tools
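
The inference loop might look like the following minimal sketch. The `classify` callable is a hypothetical stand-in for your call to the deployed model; it is assumed to accept a `messages` list and return a dict with an `unsafe` boolean that describes the final message in that list. Each message is sent together with the turns before it so that it is judged in context.

```python
from typing import Callable


def review_conversation(
    messages: list[dict],
    classify: Callable[[list[dict]], dict],
) -> list[dict]:
    """Classify each message together with the turns that precede it."""
    reviewed = []
    for index, message in enumerate(messages):
        # Send the conversation prefix so the newest message is judged in context.
        result = classify(messages[: index + 1])
        reviewed.append(
            {"index": index, "role": message["role"], "unsafe": bool(result.get("unsafe"))}
        )
    return reviewed


def unsafe_count(reviewed: list[dict]) -> int:
    """Aggregate the per-message results into a count of unsafe messages."""
    return sum(1 for entry in reviewed if entry["unsafe"])
```

The per-message records returned by `review_conversation` are a convenient input for a dashboard or an audit log.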

### Summary Table

| Feature                         | Provided by SeekrFlow | Built by You          |
| ------------------------------- | --------------------- | --------------------- |
| Hosted model deployment         | ✅                     |                       |
| SDK & API access                | ✅                     |                       |
| Moderation flags (per message)  | ✅                     |                       |
| Input structuring               |                       | ✅                     |
| UI or dashboard                 |                       | ✅                     |
| Custom logging and analytics    |                       | ✅ (optional)          |
| Moderation actions              |                       | ✅ (block, flag, etc.) |
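
The "Moderation actions" row is where your own policy logic lives. One possible shape, as a sketch only: the `POLICY` mapping, the `SEVERITY` ordering, and `choose_action` are all hypothetical, and the category names are the three that appear in the output example on this page.

```python
# Hypothetical policy: the action each flagged category triggers.
POLICY = {"hate": "block", "violence": "block", "harassment": "review"}

# Actions ordered from least to most severe.
SEVERITY = ["allow", "redact", "review", "block"]


def choose_action(flagged: list[str], default: str = "review") -> str:
    """Pick the most severe action among the flagged categories."""
    if not flagged:
        return "allow"
    # Categories missing from the policy fall back to the default action.
    actions = [POLICY.get(name, default) for name in flagged]
    return max(actions, key=SEVERITY.index)
```

Falling back to human review for unmapped categories is a conservative default; a stricter deployment might fall back to `block` instead.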

### Limitations

* **No confidence scores**: only binary (safe/unsafe) output
* **English-centric**: lower accuracy on non-English text
* **Context window** is limited to \~8k tokens (short conversations recommended)
* Not fine-tuned for extremely short, isolated phrases
* No audio, image, or video input support
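
To stay within the context window, a long conversation can be trimmed to its most recent turns before it is submitted. The sketch below is illustrative: `trim_to_budget` is a hypothetical helper, and it estimates tokens by counting whitespace-separated words (an assumption that undercounts real tokens, so leave headroom below the limit).

```python
def trim_to_budget(messages: list[dict], max_tokens: int = 8000) -> list[dict]:
    """Keep the most recent messages whose estimated size fits the budget."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = len(message["content"].split())
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept
```

Trimming can leave a conversation that starts with an assistant turn, so re-check turn order afterwards if your deployment depends on it.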

### Use Cases

* Chat moderation (real-time or post-hoc)
* Filtering unsafe input/output from LLMs
* RAG system safety layer before retrieval/inference
* Agent communication review or agent guardrails
* Comment section moderation for news or community platforms

### FAQs

<AccordionGroup>
  <Accordion title="Q: Can I use Llama Guard 3 for non-English content?">
    A: It may work for major Western European languages, but it is optimized for English, and accuracy may drop for other languages.
  </Accordion>

  <Accordion title="Q: Can I use this on a full conversation or only individual messages?">
    A: You can submit a full message history in `messages[]`, but it's best to keep it short for clarity and performance.
  </Accordion>

  <Accordion title="Q: Does it return a confidence score?">
    A: No, only binary `true/false` flags per category.
  </Accordion>

  <Accordion title="Q: Can I fine-tune or add my own taxonomy?">
    A: Not currently via SeekrFlow. You could host a modified version on your own infrastructure if needed.
  </Accordion>

  <Accordion title="Q: How does this compare to Seekr ContentGuard?">
    A: Llama Guard is general-purpose and covers a wide range of text content. Seekr ContentGuard is specifically tuned for podcast episodes and uses different scoring systems (GARM and Civility).
  </Accordion>
</AccordionGroup>
