We’re releasing prompt-based teen safety policies for gpt-oss-safeguard. They’re designed to help you identify and moderate teen-specific content, and turn safety requirements into classifiers for real-time filtering or offline analysis. https://t.co/t5i1CZNLnF
OpenAI Releases Prompt-Based Teen Safety Policies for gpt-oss-safeguard
· Updated
OpenAI released open-source, prompt-based teen safety policies for gpt-oss-safeguard, its 20B open-weight safety classifier. The initial release covers six categories: graphic violent content, graphic sexual content, harmful body ideals and behaviors, dangerous activities and challenges, romantic or violent roleplay, and age-restricted goods and services. Structured as prompts, they feed directly into the classifier for real-time filtering or offline content analysis.
The core problem is the gap between high-level safety goals and the precise operational rules classifiers require. Teams building on open-weight models frequently start from scratch, leading to inconsistent enforcement. These definitions were developed with input from Common Sense Media and everyone.ai to reflect teens' distinct developmental needs.
Developers can pull these policies from the ROOST Model Community on GitHub and apply them to gpt-oss-safeguard. They're designed to be extended to new risk areas, translated into other languages, and layered with additional safeguards — not used as a final solution.
OpenAI Developers
@OpenAIDevs
21retweets
View on X





