We're publishing a new evaluation suite and research paper on Chain-of-Thought (CoT) Controllability. We find that GPT-5.4 Thinking shows low ability to obscure its reasoning—suggesting CoT monitoring remains a useful safety tool. https://t.co/isZkNkPXZm
OpenAI Finds Reasoning Models Can't Hide Their Thinking, and That's Good
OpenAI released a research paper and open-sourced CoT-Control, an evaluation suite of over 13,000 tasks measuring whether reasoning models can deliberately shape or hide their chain-of-thought. Across the 13 frontier models tested, including GPT-5.4 Thinking, o3, and Claude Sonnet 4, controllability scores ranged from 0.1% to 15.4%. Models often recognized they were failing to follow CoT instructions yet kept failing anyway.

Low controllability is a feature, not a bug. Chain-of-thought monitoring is one of OpenAI's core safety layers, and models that cannot manipulate their own reasoning traces are easier to oversee. Controllability also decreases with longer reasoning and more RL training, meaning models that think harder are less capable of hiding their reasoning.
OpenAI plans to include CoT-Control scores in future model system cards, starting with GPT-5.4 Thinking. The suite is open-source — safety teams can run it to track how this capability evolves over time.
OpenAI
@OpenAI