Mozilla’s Firefox claim has sharpened an already tense AI security debate

Mozilla says Anthropic’s Mythos Preview model helped it identify 271 security vulnerabilities in Firefox 150 before the browser’s release, a result that immediately raises the stakes in the race to understand how advanced AI will affect cybersecurity.

The finding, reported by Ars Technica, adds unusually concrete evidence to a debate that has so far been driven largely by speculation, benchmark claims, and warnings from AI companies. Earlier in April, Anthropic said Mythos was so effective at discovering vulnerabilities that the company limited the model’s initial release to a small group of critical industry partners. Mozilla’s reported experience is now one of the clearest real-world signals of what that capability may look like in practice.

Firefox CTO Bobby Holley described the implications in sweeping terms, arguing that defensive security teams may finally be gaining an advantage. Even without detailed disclosure of the severity of the 271 flaws, the scale of the reported result is hard to ignore.

From dozens of bugs to hundreds in a single release cycle

The most striking comparison in the source report is not between AI and humans, but between one generation of AI models and the next. Holley said Anthropic’s Opus 4.6 model found 22 security-sensitive bugs when analyzing Firefox 148 last month. Mythos Preview, examining Firefox 150, reportedly surfaced 271 vulnerabilities.

If those figures are directly comparable, the jump is dramatic: roughly a twelvefold increase in findings between consecutive model generations. That suggests progress in vulnerability analysis may be accelerating rather than incremental. Even allowing for differences in target code or search conditions, moving from a few dozen findings to hundreds in a single release cycle implies a meaningful change in capability.

The source report says the model found these issues simply by analyzing unreleased source code. That point matters because it frames the model not as an automated fuzzing engine requiring execution at scale, but as a reasoning system able to inspect codebases and flag likely vulnerabilities.

Holley compared the work to what could previously be achieved only through automated fuzzing at scale or by elite human researchers reasoning about complex browser code. The practical difference, he argued, is cost and speed. If an AI model can find security flaws without months of concentrated expert effort, defensive review becomes cheaper and more scalable.