We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning—suggesting CoT monitoring remains a useful safety tool. https://t.co/isZkNkPXZm
OpenAI Finds Reasoning Models Can't Hide Their Thinking, and That's Good
OpenAI · Updated
OpenAI released CoT-Control, an open-source evaluation suite that tests whether reasoning models can deliberately manipulate their chain-of-thought reasoning. Across 13 frontier models, controllability scores never exceed 15.4%, suggesting that current AI systems cannot effectively obscure their thinking from safety monitors.
CoT-Control is an evaluation suite of more than 13,000 tasks that measures whether reasoning models can deliberately shape or hide their chain-of-thought. Across 13 frontier models, including GPT-5.4 Thinking, o3, and Claude Sonnet 4, controllability scores ranged from 0.1% to 15.4%. Models often recognized that they were failing to follow CoT instructions, yet kept failing anyway.
Low controllability is a feature, not a bug. Chain-of-thought monitoring is one of OpenAI's core safety layers, and models that cannot manipulate their own reasoning traces are easier to oversee. Controllability also decreases with longer reasoning and more RL training, meaning models that think harder are less capable of hiding their reasoning.
OpenAI plans to include CoT-Control scores in future model system cards, starting with GPT-5.4 Thinking. The suite is open-source — safety teams can run it to track how this capability evolves over time.
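For readers who want a feel for what a percentage score like this could represent, here is a minimal Python sketch. It assumes a controllability score is simply the share of tasks on which a model's chain-of-thought followed the instruction it was given. That definition, the `TaskResult` record, and the `controllability_score` function are all hypothetical illustrations, not the CoT-Control codebase or its actual metric, which may be computed differently.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one evaluation task (not the CoT-Control schema)."""
    task_id: str
    # Assumption: some external grader has already judged whether the
    # model's chain-of-thought complied with the CoT instruction.
    cot_followed_instruction: bool


def controllability_score(results: list[TaskResult]) -> float:
    """Return the percentage of tasks whose chain-of-thought complied."""
    if not results:
        raise ValueError("no task results to score")
    complied = sum(r.cot_followed_instruction for r in results)
    return 100.0 * complied / len(results)


# Made-up toy data: the model complied on one of four tasks.
results = [
    TaskResult("t1", False),
    TaskResult("t2", False),
    TaskResult("t3", True),
    TaskResult("t4", False),
]
print(f"{controllability_score(results):.1f}%")  # prints "25.0%"
```

Under this assumed definition, a lower number means the model was less able to steer its own reasoning trace, which is the outcome the article describes as desirable for monitoring.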