Clicking the "Approve permission" button is difficult. We show that agents can do that for you. Check out our alignment blog: https://t.co/2jwUmSws8y https://t.co/9eEHfhA4TH
OpenAI Open Sources Auto-review to Automate Safety Checks for Codex Agents
OpenAI released the research and code for Auto-review, a safety architecture for Codex. It uses a secondary agent, powered by GPT-5.4 Thinking, to evaluate permission requests when the primary agent attempts actions outside its restricted sandbox (a secure, isolated environment). This arrangement removes the need for constant manual human approval.
- Reviewer model: GPT-5.4 Thinking (low reasoning)
- Human interruption reduction: 200x
- Prompt injection recall: 99.3%
- Overreach recall: 90.3%
- Recovery rate after rejection: >50%
- Availability: Open source (Codex repository)
This shift addresses the "approval bottleneck" that prevents unattended Codex agent workflows from completing long-running background tasks. It mirrors Claude Code's new auto mode, which also uses per-action safety classification. Delegating oversight to a separate model maintains safety standards without sacrificing the productivity of autonomous agents.
The Auto-review logic is now available in the open-source Codex repository. According to OpenAI's internal evaluations, the system catches 99.3% of prompt injections while reducing human interruptions 200-fold. It provides a safer default for agents that interact with external networks or sensitive file systems without constant human oversight.
Maja Trebacz
@majatrebacz
Still wondering? A few quick answers below.
Auto-review is a safety architecture for OpenAI Codex that uses a secondary AI agent to approve or deny sensitive actions. It provides a middle ground between manual human approval, which is slow, and full access mode, which is risky. This system allows coding agents to perform complex tasks autonomously while maintaining security oversight.
When a Codex agent reaches a sandbox boundary—such as attempting a network call or running a script outside its designated folder—it requests escalation. A separate agent, powered by GPT-5.4 Thinking, reviews the request against security policies and user intent. This separation of roles ensures the safety check is independent of the task-oriented primary agent.
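The escalation flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in the Codex repository: every name here (`EscalationRequest`, `Verdict`, `inside_sandbox`, `toy_reviewer`, `handle`, the `/workspace` root, and the host allowlist) is a hypothetical stand-in, and the reviewer is a simple rule where the real system uses a model.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse

# All names below are hypothetical; none come from the Codex repository.
WORKSPACE = "/workspace"          # assumed sandbox root
ALLOWED_HOSTS = {"pypi.org"}      # assumed policy for the stand-in reviewer


@dataclass(frozen=True)
class EscalationRequest:
    action: str        # "file_access" or "network_call" in this sketch
    target: str        # path or URL the action touches
    user_intent: str   # the task the user asked for


@dataclass(frozen=True)
class Verdict:
    approved: bool
    rationale: str


def inside_sandbox(req: EscalationRequest) -> bool:
    """True only for file access that resolves to a path under WORKSPACE."""
    if req.action != "file_access":
        return False
    path = posixpath.normpath(req.target)  # collapses ".." segments
    return path == WORKSPACE or path.startswith(WORKSPACE + "/")


def toy_reviewer(req: EscalationRequest) -> Verdict:
    """Stand-in for the reviewer model: a host allowlist, not model judgment.

    A real reviewer would also weigh req.user_intent; this one ignores it.
    """
    if req.action == "network_call":
        host = urlparse(req.target).hostname
        if host in ALLOWED_HOSTS:
            return Verdict(True, f"{host} is on the allowlist")
        return Verdict(False, f"{host} is not on the allowlist")
    return Verdict(False, "action leaves the sandbox and no policy covers it")


def handle(req: EscalationRequest,
           reviewer: Callable[[EscalationRequest], Verdict]) -> Verdict:
    """Run in-sandbox actions directly; escalate everything else to the reviewer."""
    if inside_sandbox(req):
        return Verdict(True, "inside sandbox, no review needed")
    return reviewer(req)
```

Passing the reviewer in as a separate callable reflects the separation of roles: the code that proposes an action never decides whether that action is allowed.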
Yes, OpenAI has open-sourced the Auto-review logic within the official Codex repository on GitHub. The company released the code to encourage community scrutiny and iteration on agent oversight methods. This allows developers to implement similar agent-reviewing-agent safety patterns in their own agentic applications and research environments.
In internal evaluations, Auto-review achieved 99.3% recall on prompt injections and 90.3% recall on agent overreach, a category that includes unintended data transfers and weakened security. It also identified 96.1% of malicious actions in MonitoringBench, a dataset designed to test whether agents can covertly carry out harmful tasks.
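For readers unfamiliar with the metric, recall is the share of truly unsafe requests that the reviewer flags: true positives divided by true positives plus false negatives. The sketch below shows the arithmetic. The counts are illustrative assumptions chosen to reproduce the reported percentage; the source does not state the size of the evaluation sets.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives that were detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined with no positive examples")
    return true_positives / total


# Illustrative counts only: 993 caught and 7 missed out of 1,000 injections.
print(f"{recall(993, 7):.1%}")  # prints 99.3%
```

Recall says nothing about how often safe requests are wrongly denied, so it is best read alongside the interruption and recovery figures.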
When Auto-review denies a request, it provides the primary agent with a rationale for the rejection. In more than half of internal cases, the Codex agent uses this feedback to find a safer alternative path to complete the task without human intervention. If the agent repeatedly attempts unsafe actions, the system automatically stops the session.
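The deny, explain, retry behavior can be sketched as a small loop. This is an assumption-laden illustration rather than the actual implementation: `run_with_review`, `propose`, `review`, and the `MAX_DENIALS` threshold are hypothetical, and the source does not say how many denials trigger a session stop.

```python
from typing import Callable, Optional, Tuple

MAX_DENIALS = 3  # hypothetical threshold; the source gives no number


def run_with_review(
    propose: Callable[[Optional[str]], str],
    review: Callable[[str], Tuple[bool, str]],
    max_denials: int = MAX_DENIALS,
) -> Tuple[str, Optional[str]]:
    """Ask the agent for an action, review it, and feed denials back.

    propose(feedback) returns the next action; feedback is the last
    rejection rationale, or None on the first attempt.
    review(action) returns (approved, rationale).
    """
    feedback: Optional[str] = None
    denials = 0
    while denials < max_denials:
        action = propose(feedback)
        approved, rationale = review(action)
        if approved:
            return ("completed", action)
        denials += 1
        feedback = rationale  # the agent sees why it was rejected
    return ("session_stopped", None)


# Toy reviewer and agents, for illustration only.
def review(action: str) -> Tuple[bool, str]:
    if "external" in action:
        return (False, "sends data outside the workspace")
    return (True, "stays local")


def adaptive_agent(feedback: Optional[str]) -> str:
    # Switches to a safer plan once it has received a rejection rationale.
    if feedback:
        return "write report to local file"
    return "upload report to external server"


def stubborn_agent(feedback: Optional[str]) -> str:
    # Ignores feedback and keeps proposing the same unsafe action.
    return "upload report to external server"
```

With these toy agents, `run_with_review(adaptive_agent, review)` finishes on the second attempt, while `run_with_review(stubborn_agent, review)` ends with the session stopped after three denials.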

