OpenAI's Next Grand Challenge

OpenAI has announced a sweeping new research ambition: building what it calls an AI researcher — a fully automated, agent-based system capable of independently tackling large, complex scientific problems. In an exclusive interview with MIT Technology Review, Chief Scientist Jakub Pachocki described the initiative as OpenAI's North Star for the coming years, representing a convergence of the company's work on reasoning models, coding agents, and interpretability into a unified long-horizon goal.

The timeline is concrete and near-term in ways that distinguish this announcement from the more diffuse AGI promises the industry has traded in for years. OpenAI plans to build an autonomous AI research intern — a system capable of independently working on specific research problems for days at a time — by September 2026. The full multi-agent AI researcher, capable of tackling problems too large or complex for humans to manage, is targeted for a 2028 debut.

Codex as the Blueprint

Pachocki pointed to OpenAI's existing Codex agent as both the evidence base and the early prototype for the more ambitious AI researcher vision. Codex, which OpenAI released earlier this year, is an agent-based coding system that can autonomously generate, run, and debug code to complete complex programming tasks. It has been broadly adopted within OpenAI itself, with Pachocki noting that most of the company's technical staff now use Codex as a core part of their workflow.

Pachocki's philosophical leap is this: if an AI system can autonomously solve complex coding problems — work that demands creative reasoning, decomposition of large tasks into subtasks, tracking of complex state over extended sessions, and error correction — then the same capability architecture can be extended to scientific problem solving in domains like biology, chemistry, physics, and mathematics.

"Our jobs are now totally different than they were even a year ago. Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents," Pachocki told MIT Technology Review. The vision is that the same management relationship — human directing, AI executing — could eventually apply to research itself, with scientists directing AI agents that independently pursue experimental hypotheses, review literature, design analyses, and generate results.
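To make that delegation pattern concrete, here is a minimal, hypothetical sketch in Python. The CodexAgent class, its run method, and the manage helper are stand-ins invented for this illustration; they are not OpenAI's API, and a real agent would plan, write, execute, and debug code rather than return a stub string.

```python
from concurrent.futures import ThreadPoolExecutor


class CodexAgent:
    """Illustrative stand-in for an autonomous coding agent (not a real API)."""

    def __init__(self, name: str):
        self.name = name

    def run(self, task: str) -> str:
        # A real agent would plan, generate, execute, and debug code here;
        # this stub only reports the task it was delegated.
        return f"{self.name}: completed '{task}'"


def manage(tasks: list[str]) -> list[str]:
    # The human's role shifts from editing code line by line to delegating
    # tasks and reviewing what each agent brings back.
    agents = [CodexAgent(f"agent-{i}") for i in range(len(tasks))]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(CodexAgent.run, agents, tasks))


if __name__ == "__main__":
    for report in manage(["fix the failing test", "profile the hot loop", "update the docs"]):
        print(report)
```

The point of the sketch is the shape of the workflow: one person fans tasks out to several agents in parallel and reviews the results, rather than writing the code directly.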

Why Now: The Reasoning Model Breakthrough

The renewed ambition for autonomous research capability is rooted in the emergence of so-called reasoning models — AI systems trained not just to produce outputs but to work through problems step by step, backtracking when they reach dead ends. Reasoning models have made AI systems qualitatively better at extended autonomous work: they can maintain coherent context over longer problem-solving sessions and catch and correct their own errors in ways that earlier language models could not.
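The step-by-step, backtracking behavior can be pictured with a toy search problem. The sketch below is purely illustrative: reasoning models learn this behavior through training rather than executing hand-written search code, but the pattern of committing to a step, detecting a dead end, and undoing the step is analogous.

```python
def solve(target: int, digits: tuple[int, ...], partial: tuple[int, ...] = ()):
    """Pick distinct digits that sum to target, backtracking at dead ends."""
    if sum(partial) == target:
        return partial                           # goal reached on this branch
    for i, d in enumerate(digits):
        if sum(partial) + d <= target:           # prune steps that overshoot
            found = solve(target, digits[:i] + digits[i + 1:], partial + (d,))
            if found is not None:
                return found                     # this line of reasoning worked
    return None                                  # dead end: the caller backtracks


print(solve(10, (2, 3, 5, 7)))  # prints (2, 3, 5)
```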

OpenAI has also been feeding its training pipelines with complex task examples — hard puzzles from mathematics and programming competitions — that require the models to learn how to manage very large contexts, decompose problems into subtasks, and sustain effective reasoning over extended periods. Pachocki believes this training approach, combined with the general capability improvements between successive model generations, has brought the company to the threshold where autonomous research is achievable within the current development trajectory.

Recent results have given Pachocki's optimism some empirical grounding. OpenAI researchers have used GPT-5, the model that powers Codex, to discover new solutions to previously unsolved mathematics problems and make progress on specific puzzles in biology and physics — achievements that, while narrow, demonstrate that the model can generate genuinely novel scientific contributions rather than merely summarizing existing knowledge.

The Safety Problem at the Frontier

Pachocki did not shy away from the risks inherent in the vision he is pursuing. A fully autonomous AI researcher running with minimal human oversight represents a qualitatively new kind of AI capability — one that could generate scientific insights in domains like synthetic biology or advanced materials where the outputs could be weaponized or misused. Chain-of-thought monitoring, where AI systems' reasoning processes are made inspectable, is the primary safety approach Pachocki identified, but he acknowledged it is imperfect.
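As a rough picture of what chain-of-thought monitoring means in practice, consider the hypothetical sketch below: a supervising process inspects an agent's intermediate reasoning before any action executes. The monitor function and the keyword list are invented for this illustration; a production monitor would rely on trained classifiers and human escalation, not string matching.

```python
# Illustrative only: real monitors would be learned, not keyword-based.
FLAGGED_TOPICS = {"pathogen synthesis", "toxin design"}


def monitor(chain_of_thought: list[str]) -> bool:
    """Return True if the inspected reasoning trace is safe to act on."""
    for step in chain_of_thought:
        if any(topic in step.lower() for topic in FLAGGED_TOPICS):
            return False  # halt and escalate to human oversight
    return True


trace = ["review prior literature", "design a benign protein assay"]
print("proceed" if monitor(trace) else "escalate")
```

The imperfection Pachocki concedes is visible even in the toy: a monitor only helps if the reasoning trace faithfully reflects what the system is actually doing.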

"There are hard questions about where the lines should be drawn," Pachocki told MIT Technology Review. "I think governments, not just OpenAI, will need to figure out where those lines are." The acknowledgment that external governance will be necessary is significant, reflecting a broader industry recognition that the autonomous research capability frontier requires regulatory frameworks that do not yet exist.

Competitive Context and Industry Implications

OpenAI's announcement comes as the company faces intensified competition from Anthropic and Google DeepMind on multiple fronts. Anthropic has gained significant ground in enterprise AI deployments, and DeepMind continues to produce influential scientific AI research through programs like AlphaFold. Positioning the autonomous AI researcher as OpenAI's defining next challenge is partly a competitive strategy — staking a claim to the most ambitious and consequential capability frontier before competitors can define the space.

For the broader scientific community, the prospect of AI systems that can autonomously advance research raises profound questions about the future of scientific labor, publication norms, intellectual property, and the allocation of credit for discoveries. If an AI agent independently discovers a new mathematical proof or identifies a novel therapeutic target, the ecosystem of academic publishing, grant funding, and research careers built around human scientific achievement will require substantial rethinking. OpenAI's 2028 timeline suggests these questions may need answers sooner than many in the scientific community have been preparing for.

This article is based on reporting by MIT Technology Review.