OpenAI Releases Prompt-Based Teen Safety Policies for gpt-oss-safeguard

OpenAI

Mar 25, 2026 · Updated Jun 5, 2026

OpenAI released open-source, prompt-based teen safety policies for gpt-oss-safeguard, its 20B open-weight safety classifier. Developers building on open-weight models often start from scratch on safety rules — these policies provide a tested, extensible foundation covering six teen-specific risk categories.

OpenAI released open-source, prompt-based teen safety policies for gpt-oss-safeguard, its 20B open-weight safety classifier. The initial release covers six categories: graphic violent content, graphic sexual content, harmful body ideals and behaviors, dangerous activities and challenges, romantic or violent roleplay, and age-restricted goods and services. Structured as prompts, they feed directly into the classifier for real-time filtering or offline content analysis.

The core problem is the gap between high-level safety goals and the precise operational rules classifiers require. Teams building on open-weight models frequently start from scratch, leading to inconsistent enforcement. These definitions were developed with input from Common Sense Media and everyone.ai to reflect teens' distinct developmental needs.

Developers can pull these policies from the ROOST Model Community on GitHub and apply them to gpt-oss-safeguard. They're designed to be extended to new risk areas, translated into other languages, and layered with additional safeguards — not used as a final solution.

View the full update on openai.com

OpenAI Developers

@OpenAIDevsMar 24

We’re releasing prompt-based teen safety policies for gpt-oss-safeguard. They’re designed to help you identify and moderate teen-specific content, and turn safety requirements into classifiers for real-time filtering or offline analysis. https://t.co/t5i1CZNLnF

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from OpenAI →

Keep reading

OpenAI Launches Safety Fellowship to Fund Independent Research on Advanced AI

OpenAI opened applications for its new Safety Fellowship, a five-month pilot program providing stipends and compute to external researchers. The initiative aims to build a pipeline of technical talent focused on critical areas like agentic oversight and safety evaluation for frontier models.

The OpenAI FoundationJun 2

OpenAI Foundation Deploys 130 Million Dollars to Build Global AI Resilience

The OpenAI Foundation has launched its AI Resilience vision with an initial $130 million in grants for critical safety infrastructure. The program funds defensive tools in cybersecurity, biological security, and youth safety to manage the risks of advancing frontier models.

Anthropic Publishes How Claude Handles Crisis Conversations and Reduces Sycophancy

AnthropicDec 18, 2025

Anthropic Publishes How Claude Handles Crisis Conversations and Reduces Sycophancy

Anthropic published evaluations of how Claude handles crisis conversations, sycophancy, and age restrictions. On crisis conversations, Claude 4.5 models respond appropriately 98.6% of the time and course-correct from problematic conversations 91% of the time, up from 36% with Opus 4.1.

NVIDIA Launches NemoClaw to Add Security and Privacy Controls to OpenClaw

NVIDIAMar 19

NVIDIA Launches NemoClaw to Add Security and Privacy Controls to OpenClaw

NVIDIA NemoClaw, an open-source stack announced at GTC, adds policy-based privacy and security guardrails to OpenClaw. It bundles Nemotron models and the OpenShell runtime in a single install for running safer, always-on AI agents locally.