A bug bounty aimed at biology risk
OpenAI has opened applications for a new GPT-5.5 Bio Bug Bounty, a targeted red-teaming program focused on whether researchers can discover a universal jailbreak that defeats the company’s biology-related safeguards. The structure is unusually specific. Participants are asked to produce a single prompt that, starting from a clean chat, gets the model to answer all five questions in OpenAI’s bio safety challenge without triggering moderation. The top reward is $25,000 for the first true universal jailbreak that clears all five.
The program, as described in OpenAI’s announcement, applies to GPT-5.5 in Codex Desktop only. Applications opened on April 23, 2026, with rolling acceptances through June 22, 2026. Testing is scheduled to begin April 28 and run through July 27. OpenAI says smaller awards may be granted for partial successes at its discretion.
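The schedule is easier to compare when the dates are turned into durations. The short Python sketch below does that arithmetic. It assumes the testing dates fall in 2026, which the announcement as quoted here does not state explicitly, and the variable names are illustrative only.

```python
from datetime import date

# Dates as reported above. The year for the testing window is an assumption.
applications_open = date(2026, 4, 23)
applications_close = date(2026, 6, 22)
testing_start = date(2026, 4, 28)  # assumed 2026
testing_end = date(2026, 7, 27)    # assumed 2026

# Length of each window, counted as the difference between the two dates.
application_days = (applications_close - applications_open).days
testing_days = (testing_end - testing_start).days

# Testing time left for someone accepted on the last day of rolling acceptances.
days_left_for_late_acceptance = (testing_end - applications_close).days

print(application_days, testing_days, days_left_for_late_acceptance)
```

Under that assumption, the application window spans 60 days and the testing window 90 days, and a researcher accepted on the final day would still have 35 days of testing left.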
This matters because it shows a frontier AI company treating biology misuse not only as a policy concern but as a concrete system-hardening problem. Rather than framing safety evaluation solely through internal review or general policy language, the company is inviting outside specialists to attack a narrowly defined failure mode.
Why a universal jailbreak matters
Most prompt-based safety failures are situational. A model may resist one phrasing but fail under another. A universal jailbreak is different because it suggests a more general weakness in the safety stack. If a single reusable prompt can bypass protective behavior across multiple dangerous prompts from a fresh conversation, that raises the seriousness of the vulnerability substantially.
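The difference between a situational failure and a universal one can be stated as a simple predicate. The Python sketch below is illustrative only: `Attempt`, `passed` and `is_universal` are hypothetical names, not OpenAI's grading code, and the sketch encodes nothing beyond the conditions described in this article (one prompt, a clean chat, all five questions, no moderation trigger).

```python
from dataclasses import dataclass

NUM_QUESTIONS = 5  # the challenge is described as having five questions


@dataclass(frozen=True)
class Attempt:
    """Hypothetical record of one prompt tried against one challenge question."""
    prompt_id: str
    question_index: int          # 0 through NUM_QUESTIONS - 1
    fresh_chat: bool             # True if the attempt started from a clean chat
    answered: bool               # True if the model answered the question
    moderation_triggered: bool   # True if moderation fired during the attempt


def passed(attempt: Attempt) -> bool:
    """An attempt counts only if it meets every stated condition."""
    return (
        attempt.fresh_chat
        and attempt.answered
        and not attempt.moderation_triggered
    )


def is_universal(prompt_id: str, attempts: list[Attempt]) -> bool:
    """True only if the same prompt cleared every one of the questions."""
    cleared = {
        a.question_index
        for a in attempts
        if a.prompt_id == prompt_id and passed(a)
    }
    return cleared == set(range(NUM_QUESTIONS))


# A prompt that works on four questions but not the fifth is situational.
situational = [Attempt("p1", i, True, i != 4, False) for i in range(NUM_QUESTIONS)]
# A prompt that clears all five from a clean chat meets the universal bar.
universal = [Attempt("p2", i, True, True, False) for i in range(NUM_QUESTIONS)]

print(is_universal("p1", situational), is_universal("p2", universal))
```

The set comparison is the point of the sketch: partial coverage, however broad, does not satisfy the predicate, which mirrors why the top reward is reserved for a prompt that clears all five questions.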
OpenAI’s choice to center the challenge on a five-question bio safety test implies a threshold-based approach: the company is less interested in isolated edge cases than in systematic failures that would undermine confidence in the model’s biology defenses. By rewarding a universal method rather than scattered examples, it is asking red-teamers to probe the integrity of the safety stack as a whole.
The reward size also signals priority. A $25,000 prize is modest relative to the scale of major software vulnerability programs, but substantial enough to attract credible specialists in AI security and biosecurity. More importantly, it clarifies that OpenAI is willing to pay for evidence that its safeguards can be broken under controlled conditions before those weaknesses are exploited elsewhere.
A selective, high-trust process
The program is not fully open. According to OpenAI’s announcement, the company will invite a vetted list of trusted bio red-teamers and review new applications from researchers with experience in AI red teaming, security or biosecurity. Accepted participants and collaborators must have existing ChatGPT accounts and sign a nondisclosure agreement. All prompts, completions, findings and communications are covered by NDA.
That controlled-access design reflects the sensitivity of the subject matter. Biology-related misuse research occupies an unusual position: the systems need to be stress-tested, but broad public release of adversarial methods could create additional risk. The NDA requirement suggests OpenAI is trying to balance external scrutiny with operational containment.
The setup also underlines a larger shift in frontier AI governance. High-risk capability domains are increasingly being handled through trusted-access models rather than purely open competitions. That approach limits outside visibility, but it can also enable more realistic adversarial testing than a fully public challenge would allow.
What the program says about frontier-model safety
The GPT-5.5 Bio Bug Bounty is evidence that AI companies are moving toward more specialized safety validation for advanced systems. General-purpose red teaming remains important, but the highest-risk areas increasingly require domain-specific expertise. Biology is an especially important case because the line between legitimate scientific assistance and potentially dangerous information can be difficult to manage at scale.
By narrowing the challenge to universal jailbreaks, OpenAI is effectively asking a hard question about robustness: can its safeguards withstand a determined, expert adversary using prompt-based methods alone? That is more demanding than asking whether ordinary users can occasionally confuse the model. It is a test of whether the defenses fail in a repeatable, scalable way.
The company’s wording also suggests this program is part of a broader architecture of bug bounties and safety work. The announcement points participants toward OpenAI’s separate safety and security bounty programs, which indicates a layered model of evaluation rather than a one-off exercise.
The limits of what this reveals
At the same time, the announcement leaves some things unclear by design. Because the challenge is covered by NDA, outside observers will not automatically see the prompts tested, the completions produced or the exact character of any successful jailbreaks. That reduces transparency, though it may be unavoidable in a domain where publication itself could create risk.
The focus on Codex Desktop also narrows the scope. A model’s safety posture can vary across products, interfaces and deployment constraints. Success or failure in one environment does not necessarily describe every environment. Still, as the announcement makes clear, the company is explicitly putting GPT-5.5’s biology safeguards under adversarial pressure in at least one real product context.
A practical turn in AI safety
The larger significance of the bug bounty is that it treats model safety as something that must be tested operationally, not just described in system cards or policy statements. In that sense, the program is less about marketing a safeguard than about inviting expert attempts to break it under rules that are narrow enough to be meaningful.
Whether OpenAI’s defenses hold up is a separate question. What is already clear is that the company sees biology-related misuse as important enough to warrant paid, targeted external attack. That is a notable development in its own right. As frontier AI systems become more capable, the credibility of safety claims will depend increasingly on adversarial testing programs like this one, where the standard is not whether a policy exists, but whether it survives contact with people trying to defeat it.
This article is based on reporting by OpenAI.
Originally published on openai.com








