The Controllability Question
As AI reasoning models grow more capable, a question has become central to safety research: can developers instruct these systems to control, alter, or hide their chain-of-thought reasoning? New research from OpenAI provides a definitive and reassuring answer — not easily, and that structural resistance is genuinely good news for AI transparency and oversight.
The research tested whether reasoning models could be prompted, fine-tuned, or instructed to suppress the reasoning steps they perform before generating final outputs. Findings suggest that reasoning models have deeply embedded reasoning behaviors that resist straightforward override — a property with significant implications for how we build and monitor trustworthy AI systems.
Reasoning models like OpenAI's o-series use extended thinking processes that appear as scratchpad-style output visible to users. The question of whether this visible reasoning accurately represents the model's internal computational process — and whether that process could be hidden or manipulated by bad actors or misaligned objectives — has been a live debate in AI safety circles for years.
What the Research Found
Researchers attempted to train versions of reasoning models that would either hide their chain of thought entirely or produce misleading reasoning traces while still arriving at correct final answers. What they found was that attempts to separate visible reasoning from underlying computation tended to degrade overall model performance. The reasoning process and output quality appear tightly coupled — you cannot easily remove one without damaging the other.
This coupling is a structural property of how these models learn to reason during training. They develop reasoning patterns integral to their problem-solving capability, not a detachable overlay that can be stripped away. Efforts to suppress this reasoning through instruction prompting or targeted fine-tuning were largely unsuccessful at producing models that were both reliably capable and reliably opaque about their reasoning process.
The research also explored whether models could be induced to use visible reasoning to deceive users — showing plausible-looking but fabricated reasoning steps while secretly following different internal logic. This scenario was extensively tested and found difficult to reliably produce in current model architectures, providing evidence that visible chain-of-thought does track real internal reasoning to a meaningful degree rather than being pure performance.




![[AI DAILY NEWS RUNDOWN] OpenAI's Phone, Home Data Centers, and PayPal AI Layoffs (May 06 2026) (via enoumen.substack.com)](https://substackcdn.com/image/fetch/$s_!609W!,w_1200,h_600,c_fill,f_jpg,q_auto:good,fl_progressive:steep,g_auto/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89fecd5e-a8bc-427c-b5e1-ebccac738ee6_3000x3000.jpeg)


