OpenAI Integrates Moderation Scores Directly into Generation APIs

OpenAI

OpenAI now provides moderation scores directly within its Responses API and Completions API. This allows developers to get safety signals for both input and generated content in a single request, simplifying the integration of content policies into AI applications.

The Responses API (a newer API primitive designed for agents) and the Completions API (the standard interface for text generation) now both support inline moderation. Developers add a moderation object to a generation request and receive moderation signals for both the input and the generated output in the response. Scoring uses the `omni-moderation-latest` model, which classifies potentially harmful content in both text and images.
Moderation model: `omni-moderation-latest`
Supported inputs: text, images
Image file size limit: 20 MB
Moderation endpoint cost: free
Streaming behavior: scores arrive after the full output is generated
Tool-calling coverage: tool-call arguments and tool outputs in conversation content

Because moderation signals arrive alongside the generated content, developers no longer need a separate call to the moderation endpoint before acting on an output. That shortens the path to deciding whether to log an output, route it for human review, or block it, and makes application policies simpler to enforce.

Developers can use these inline scores to enforce their application's content policy, for example by filtering outputs or flagging content for review. The moderation endpoint itself is free to use, and `omni-moderation-latest` covers a range of harm categories across text and images, accepting image files up to 20 MB. For streamed responses, moderation scores are delivered only after the full output is available.
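Because scores for a streamed response only arrive once the whole output exists, an app that wants to gate on them has to hold the streamed text back until then. The sketch below shows one way to do that under loudly stated assumptions. The event dictionaries (`text_delta`, `moderation`), the function name `gate_stream`, and the single `max_score` threshold are all hypothetical; they are not the real Responses or Completions streaming format.

```python
from typing import Iterable, Optional


def gate_stream(events: Iterable[dict], max_score: float = 0.8) -> Optional[str]:
    """Buffer streamed text and release it only once moderation scores arrive.

    HYPOTHETICAL event shapes, not the real OpenAI stream format:
      {"type": "text_delta", "text": "..."}             a chunk of generated output
      {"type": "moderation", "category_scores": {...}}  scores for the full output
    Returns the full text if every score is below max_score, otherwise None.
    """
    chunks: list[str] = []
    for event in events:
        if event["type"] == "text_delta":
            # Scores are not available yet, so hold the chunk instead of showing it.
            chunks.append(event["text"])
        elif event["type"] == "moderation":
            scores = event.get("category_scores", {})
            if max(scores.values(), default=0.0) >= max_score:
                return None  # withhold the whole output
            return "".join(chunks)
    # Stream ended without a moderation event: fail closed and release nothing.
    return None


stream = [
    {"type": "text_delta", "text": "Hello, "},
    {"type": "text_delta", "text": "world"},
    {"type": "moderation", "category_scores": {"violence": 0.01}},
]
print(gate_stream(stream))
```

The trade-off is latency: the user sees nothing until generation finishes. An app that prefers responsiveness could instead render chunks as they arrive and retract the output if the final scores cross its threshold.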

OpenAI Developers (@OpenAIDevs) on X:

Moderation scores are now available in the Responses API and Completions API. Return moderation signals in the same request flow as generation, then decide how your app uses them for logging, routing, review, or blocking. https://t.co/0FMSLek2je

Still wondering? A few quick answers below.

Moderation scores are signals that indicate the presence of potentially harmful content in text or images. They include a `flagged` status, specific `categories` of harm detected, and `category_scores` representing the model's confidence for each category.
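As a rough illustration of those three fields, the sketch below models one moderation result and pulls out the triggered and highest-scoring categories. It assumes the result arrives as a plain dict shaped like the standalone moderation endpoint's output (`flagged`, `categories`, `category_scores`). The inline payload returned by the Responses and Completions APIs may be nested differently. `ModerationResult` and its methods are hypothetical names, and the sample scores are made up.

```python
from dataclasses import dataclass


@dataclass
class ModerationResult:
    """One moderation result: overall flag, per-category booleans, per-category scores."""
    flagged: bool
    categories: dict[str, bool]
    category_scores: dict[str, float]

    @classmethod
    def from_dict(cls, raw: dict) -> "ModerationResult":
        # Assumes the field names used by the standalone moderation endpoint.
        return cls(
            flagged=bool(raw["flagged"]),
            categories={k: bool(v) for k, v in raw["categories"].items()},
            category_scores={k: float(v) for k, v in raw["category_scores"].items()},
        )

    def triggered(self) -> list[str]:
        # Categories the model marked as detected, in alphabetical order.
        return sorted(name for name, hit in self.categories.items() if hit)

    def top_scores(self, n: int = 3) -> list[tuple[str, float]]:
        # Highest-confidence categories first, whether or not they were flagged.
        ranked = sorted(self.category_scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


# Example payload with made-up scores, for illustration only.
sample = {
    "flagged": True,
    "categories": {"violence": True, "harassment": False, "hate": False},
    "category_scores": {"violence": 0.91, "harassment": 0.12, "hate": 0.02},
}
result = ModerationResult.from_dict(sample)
print(result.triggered())    # ['violence']
print(result.top_scores(2))  # [('violence', 0.91), ('harassment', 0.12)]
```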

Moderation scores are now available directly within OpenAI's Responses API and Completions API. This allows developers to receive these safety signals as part of their content generation requests, streamlining the moderation workflow.

The `omni-moderation-latest` model accepts both text and image inputs for moderation. It can detect various harm categories, with some categories supporting both text and images (e.g., `violence`, `self-harm`) and others being text-only (e.g., `harassment`, `hate`).

Developers can use moderation scores to enforce their application's content policies. This includes logging flagged content, routing it for human review, or blocking it entirely. Receiving scores inline with generation simplifies the integration of safety checks into AI-powered applications.
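One way to turn scores into the log, review, or block decisions described above is a small threshold table per category. The sketch below is a hypothetical policy, not an OpenAI recommendation. The threshold values, the `LOG_FLOOR` constant, and the `decide` function are illustrative assumptions to be tuned against your own content policy. It only assumes a result dict with `flagged` and `category_scores` fields.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    LOG = "log"
    REVIEW = "review"
    BLOCK = "block"


# HYPOTHETICAL thresholds; tune these to your own content policy.
BLOCK_AT = {"violence": 0.9, "self-harm": 0.7, "harassment": 0.9, "hate": 0.8}
REVIEW_AT = {"violence": 0.5, "self-harm": 0.3, "harassment": 0.5, "hate": 0.4}
LOG_FLOOR = 0.2  # any score at or above this is at least logged


def decide(result: dict) -> Action:
    """Map one moderation result (flagged, category_scores) to an action."""
    scores = result.get("category_scores", {})
    # Block if any category crosses its block threshold.
    if any(scores.get(cat, 0.0) >= t for cat, t in BLOCK_AT.items()):
        return Action.BLOCK
    # Send model-flagged or borderline content to a human reviewer.
    if result.get("flagged", False) or any(
        scores.get(cat, 0.0) >= t for cat, t in REVIEW_AT.items()
    ):
        return Action.REVIEW
    # Keep a record of low but non-trivial scores.
    if max(scores.values(), default=0.0) >= LOG_FLOOR:
        return Action.LOG
    return Action.ALLOW


print(decide({"flagged": False, "category_scores": {"self-harm": 0.35}}))
```

Treating `flagged` as an automatic review trigger is a deliberate choice here: it means your own thresholds can only make the policy stricter than the model's flag, never looser.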

The moderation endpoint itself is free to use, so developers can implement content safety measures without paying extra for standalone moderation requests.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.
