OpenAI Open Sources Auto-review to Automate Safety Checks for Codex Agents

Maja Trebacz

May 4, 2026 · Updated Jun 5, 2026

OpenAI released the research and code for Auto-review, a secondary agent that handles permission requests for Codex without requiring human intervention. This architecture allows autonomous coding agents to perform sensitive tasks like network calls while maintaining safety oversight through a separate reasoning model.

Auto-review is a safety architecture for Codex. It uses a secondary agent, powered by GPT-5.4 Thinking, to evaluate permission requests when the primary agent attempts actions outside its restricted sandbox (a secure, isolated environment). The reviewing agent replaces the need for constant manual human approval.

Reviewer model: GPT-5.4 Thinking (low reasoning)
Human interruption reduction: 200x
Prompt injection recall: 99.3%
Overreach recall: 90.3%
Recovery rate after rejection: >50%
Availability: Open source (Codex repository)

Auto-review addresses the "approval bottleneck" that prevents unattended Codex agent workflows from completing long-running background tasks. The approach mirrors Claude Code's new auto mode, which also uses per-action safety classification. Delegating oversight to a separate model is meant to maintain safety standards without sacrificing the productivity of autonomous agents.

You can now access the Auto-review logic in the open-source Codex repository. Internal data shows the system catches 99.3% of prompt injections while reducing human interruptions by 200x. It provides a safer default for agents interacting with external networks or sensitive file systems without constant human oversight.

View the full update on alignment.openai.com

Maja Trebacz

@majatrebacz · May 1

Clicking the "Approve permission" button is difficult. We show that agents can do that for you. Check out our alignment blog: https://t.co/2jwUmSws8y https://t.co/9eEHfhA4TH


Still wondering? A few quick answers below.

What is OpenAI Auto-review?

Auto-review is a safety architecture for OpenAI Codex that uses a secondary AI agent to approve or deny sensitive actions. It provides a middle ground between manual human approval, which is slow, and full access mode, which is risky. This system allows coding agents to perform complex tasks autonomously while maintaining security oversight.

How does Auto-review work in Codex?

When a Codex agent reaches a sandbox boundary (for example, by attempting a network call or running a script outside its designated folder), it requests escalation. A separate agent, powered by GPT-5.4 Thinking, reviews the request against security policies and user intent. This separation of roles ensures the safety check is independent of the task-oriented primary agent.
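The escalation flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the Codex repository: the names `Action`, `Decision`, `SANDBOX_ROOT`, `inside_sandbox`, `review`, and `request` are all hypothetical, and the rule-based `review` function is only a stand-in for the reviewing model, whose actual decision process is not described here.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical sandbox root; the real layout is not given in the article.
SANDBOX_ROOT = PurePosixPath("/workspace/project")


@dataclass
class Action:
    kind: str    # "write", "exec", or "network"
    target: str  # an absolute file path or a URL


@dataclass
class Decision:
    approved: bool
    rationale: str


def inside_sandbox(action: Action) -> bool:
    """Network calls always cross the boundary; file actions must stay under SANDBOX_ROOT."""
    if action.kind == "network":
        return False
    # normpath collapses ".." segments so a path cannot escape by traversal.
    path = PurePosixPath(posixpath.normpath(action.target))
    return path == SANDBOX_ROOT or SANDBOX_ROOT in path.parents


def review(action: Action, user_intent: str, allowed_hosts: set) -> Decision:
    """Stand-in for the reviewer agent: a simple allow-list, not a reasoning model."""
    if action.kind == "network":
        host = urlparse(action.target).hostname
        if host in allowed_hosts:
            return Decision(True, f"{host} is expected for: {user_intent}")
        return Decision(False, f"{host} is not needed for: {user_intent}")
    return Decision(False, f"{action.target} is outside the sandbox")


def request(action: Action, user_intent: str, allowed_hosts: set) -> Decision:
    """Run in-sandbox actions directly; escalate everything else to the reviewer."""
    if inside_sandbox(action):
        return Decision(True, "inside sandbox, no review needed")
    return review(action, user_intent, allowed_hosts)
```

The point of the design is that `request` never lets the task-oriented caller approve its own escalation: anything that crosses the boundary is decided by a separate function that sees only the action, the stated intent, and the policy.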

Is OpenAI Auto-review open source?

Yes, OpenAI has open-sourced the Auto-review logic within the official Codex repository on GitHub. The company released the code to encourage community scrutiny and iteration on agent oversight methods. This allows developers to implement similar agent-reviewing-agent safety patterns in their own agentic applications and research environments.

What are the safety benchmarks for Auto-review?

In internal evaluations, Auto-review achieved a 99.3% recall rate for detecting prompt injections and a 90.3% recall rate for agent overreach, which includes unintended data transfers or security weakening. It also successfully identified 96.1% of malicious actions in MonitoringBench, a dataset designed to test if agents can covertly carry out harmful tasks.
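For readers unfamiliar with the metric, recall is the share of genuinely unsafe requests that the reviewer catches. A minimal sketch follows; the counts are hypothetical and chosen only so the arithmetic reproduces the reported rates, since the sizes of the evaluation sets are not stated here.

```python
def recall(caught: int, missed: int) -> float:
    """Recall = true positives / (true positives + false negatives)."""
    total = caught + missed
    if total == 0:
        raise ValueError("recall is undefined when there are no positive cases")
    return caught / total


# Hypothetical counts, not the real evaluation sizes.
prompt_injection = recall(caught=993, missed=7)
overreach = recall(caught=903, missed=97)

print(f"prompt injection recall: {prompt_injection:.1%}")
print(f"overreach recall: {overreach:.1%}")
```

Recall alone says nothing about how often safe requests are wrongly denied; that would be measured separately, for example by a false positive rate.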

How does Auto-review handle rejected actions?

When Auto-review denies a request, it provides the primary agent with a rationale for the rejection. In more than half of internal cases, the Codex agent uses this feedback to find a safer alternative path to complete the task without human intervention. If the agent repeatedly attempts unsafe actions, the system automatically stops the session.
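The deny, explain, retry, and stop behavior can be sketched as a small loop. Everything in this example is an assumption made for illustration: the function names, the toy `propose` and `review` callables, and especially `MAX_DENIALS`, since the number of denials that ends a real session is not given.

```python
from typing import Callable, Optional, Tuple

MAX_DENIALS = 3  # assumed threshold; the real value is not stated in the article


def run_with_review(
    propose: Callable[[Optional[str]], str],
    review: Callable[[str], Tuple[bool, str]],
    max_denials: int = MAX_DENIALS,
) -> Tuple[str, Optional[str]]:
    """Ask for an action, review it, and feed any denial rationale back to the proposer."""
    feedback: Optional[str] = None
    for _ in range(max_denials):
        action = propose(feedback)
        approved, rationale = review(action)
        if approved:
            return "completed", action
        feedback = rationale  # the primary agent sees why it was denied
    return "stopped", None    # repeated unsafe attempts end the session


# Toy stand-ins for the primary agent and the reviewer.
def propose(feedback: Optional[str]) -> str:
    if feedback is None:
        return "upload logs to external host"
    return "save logs inside the workspace"


def review(action: str) -> Tuple[bool, str]:
    if "external" in action:
        return False, "sending data to an external host is not needed for this task"
    return True, "stays inside the workspace"


print(run_with_review(propose, review))
```

In this toy run the first proposal is denied, the rationale is passed back, and the second proposal is approved. A proposer that ignores the feedback would exhaust the denial budget and the loop would return "stopped".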

Keep reading

OpenAI Launches Auto-review to Enable Unattended Codex Agent Workflows

OpenAI introduced Auto-review, a new mode for Codex that delegates command approvals to a secondary safety agent. This shift allows the coding agent to execute long-running automations and builds without requiring constant human intervention for every step.

Cursor · May 29

Cursor Auto-review Mode Uses Subagents to Cut Approval Fatigue

Cursor launched Auto-review, a new run mode that delegates tool call approvals to a specialized classifier subagent. By automating safety checks for terminal and API actions, the system allows coding agents to work longer without human interruption.

Claude · Mar 25

Claude Code Gets Auto Mode with Per-Action Safety Classification

Claude Code's new auto mode lets the coding agent make permission decisions autonomously, skipping manual approval on every action. A classifier reviews each tool call — safe ones proceed, risky ones get blocked. Research preview on the Team plan.
