Microsoft is pushing agentic AI into defensive security
Microsoft says it has built an AI-powered vulnerability discovery system that relies not on a single model, but on a coordinated swarm of specialized agents. The system, called MDASH, short for Multi-Model Agentic Scanning Harness, uses more than 100 agents to analyze software, argue over possible flaws, and attempt to validate whether suspected bugs can actually be exploited.
According to Microsoft, the approach has already delivered results inside one of the most difficult environments to audit: its own proprietary software stack. On Patch Tuesday, May 12, 2026, the company reported 16 Windows vulnerabilities discovered by MDASH in networking and authentication components. Four were classified as critical. The affected components included the tcpip.sys kernel component, the IKEv2 service in ikeext.dll, netlogon.dll, and dnsapi.dll.
A pipeline built for disagreement
The architecture described by Microsoft matters as much as the vulnerability count. MDASH operates in four stages. First, it analyzes source code and maps the attack surface. Then a set of auditor agents scans for suspicious patterns or risky code paths. In the third stage, another set of agents, described as debaters, argues for and against whether each finding is likely to be real and exploitable. Finally, so-called Evidence Leader agents attempt to trigger the issue using specific inputs.
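Microsoft has not published MDASH's code or interfaces, so the following is only a minimal Python sketch of how a four-stage pipeline of this shape could be wired together, assuming each stage can be modeled as a plain function. The `Finding` type, `run_pipeline`, and the toy stand-ins for the agents (string heuristics instead of language models, a hard-coded set of "reproduced" crashes) are all hypothetical. What it shows is the narrowing structure: each stage only sees what survived the previous one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Finding:
    """A suspected flaw; this structure is hypothetical, not MDASH's."""
    location: str
    claim: str

def run_pipeline(
    code: Dict[str, str],
    map_surface: Callable[[Dict[str, str]], List[str]],
    audit: Callable[[Dict[str, str], List[str]], List[Finding]],
    debate: Callable[[Finding], bool],
    prove: Callable[[Finding], bool],
) -> List[Finding]:
    """Run four hypothetical stages in order; each sees only what survived the last."""
    surface = map_surface(code)                       # 1. map the attack surface
    candidates = audit(code, surface)                 # 2. auditors flag risky code
    plausible = [f for f in candidates if debate(f)]  # 3. debaters accept or reject
    return [f for f in plausible if prove(f)]         # 4. evidence: try to trigger it

# Toy stand-ins for the agents, purely for illustration.
toy_code = {
    "parse_header": "memcpy(dst, pkt, pkt_len);",
    "parse_options": "if (n < MAX) memcpy(dst, opt, n);",
    "log_event": "printf(\"%s\", msg);",
}
reproduced = {"parse_header"}  # pretend only this one crashed under a crafted input

confirmed = run_pipeline(
    toy_code,
    map_surface=lambda c: [name for name in c if name.startswith("parse_")],
    audit=lambda c, names: [Finding(n, "memcpy with attacker-influenced length")
                            for n in names if "memcpy" in c[n]],
    debate=lambda f: "if (" not in toy_code[f.location],  # a bounds check counts against it
    prove=lambda f: f.location in reproduced,
)
print([f.location for f in confirmed])  # ['parse_header']
```

Because each stage is passed in as an argument, a real system could replace any toy lambda with a call to an agent without touching the pipeline's control flow.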
That structure is meant to solve a familiar problem in automated security scanning: false positives. Security tools can generate large numbers of plausible but low-value alerts. By forcing specialized agents to challenge each other’s claims before moving to exploit attempts, Microsoft is presenting MDASH as a system that filters noise rather than simply amplifying it.
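How MDASH's debaters actually reach a verdict is not disclosed. One simple way to picture the filtering effect is a quorum vote: each hypothetical debater returns whether it believes a finding is real, and a finding advances only if enough of them agree. The sketch below assumes that rule; the three heuristic debaters are illustrative stand-ins for model-backed agents, and the 2/3 quorum is an arbitrary choice.

```python
from typing import Callable, List, Sequence

# Hypothetical debater: returns True if it argues the finding is real and exploitable.
Debater = Callable[[str], bool]

def survives_debate(finding: str, debaters: Sequence[Debater], quorum: float = 2 / 3) -> bool:
    """Keep a finding only if at least `quorum` of the debaters argue for it."""
    if not debaters:
        raise ValueError("need at least one debater")
    votes_for = sum(1 for debate in debaters if debate(finding))
    return votes_for / len(debaters) >= quorum

def filter_findings(findings: List[str], debaters: Sequence[Debater], quorum: float = 2 / 3) -> List[str]:
    """Drop findings that fail the vote before any exploit attempt is made."""
    return [f for f in findings if survives_debate(f, debaters, quorum)]

# Illustrative heuristic debaters standing in for model-backed agents.
debaters = [
    lambda f: "unreachable" not in f,                    # skeptic: dead code is not exploitable
    lambda f: "network" in f,                            # cares about remote reachability
    lambda f: "overflow" in f or "use-after-free" in f,  # cares about memory-safety impact
]

findings = [
    "network-reachable heap overflow in packet parser",
    "overflow in unreachable debug path",
    "style issue: long function",
]
kept = filter_findings(findings, debaters)
print(kept)  # ['network-reachable heap overflow in packet parser']
```

Only the finding that all three debaters support moves on; the other two each collect a single vote and are discarded before any costly exploit attempt.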
Why Microsoft thinks this approach is different
One of Microsoft’s arguments is that its own internal code base presents a particularly useful test. Windows, Hyper-V, and Azure are proprietary and therefore absent from public training data. That means the system cannot simply regurgitate memorized examples from open-source repositories. If it is finding real issues in closed code, Microsoft can reasonably claim that the system is performing analysis rather than retrieval.
The company also says the pipeline is model-agnostic. When a new model becomes available, it can be swapped into the configuration without redesigning the entire system. Experts can also add plugins containing domain-specific knowledge, such as kernel calling conventions or trust boundaries in inter-process communication, allowing the system to operate with technical context a general-purpose foundation model would not inherently possess.
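The model-agnostic claim and the plugin mechanism can be illustrated with a structural interface: the pipeline depends only on something with a `complete(prompt)` method, and expert plugins contribute extra context to every prompt. This is a minimal sketch under that assumption; `Model`, `KnowledgePlugin`, `Auditor`, and the two stand-in backends are hypothetical names, not Microsoft's API, and real plugins would carry far richer material than a one-line note.

```python
from dataclasses import dataclass, field
from typing import List, Protocol

class Model(Protocol):
    """Hypothetical interface: any backend that turns a prompt into text."""
    def complete(self, prompt: str) -> str: ...

@dataclass
class KnowledgePlugin:
    """Hypothetical expert plugin carrying domain context as plain text."""
    name: str
    notes: str

@dataclass
class Auditor:
    """Hypothetical agent that depends only on the Model interface."""
    model: Model
    plugins: List[KnowledgePlugin] = field(default_factory=list)

    def review(self, snippet: str) -> str:
        # Prepend every plugin's notes so the model sees the domain context.
        context = "\n".join(f"[{p.name}] {p.notes}" for p in self.plugins)
        prompt = f"{context}\nReview this code for vulnerabilities:\n{snippet}"
        return self.model.complete(prompt)

class EchoModel:
    """Stand-in backend that reports which plugin tags reached the prompt."""
    def complete(self, prompt: str) -> str:
        tags = [line[1:line.index("]")] for line in prompt.splitlines() if line.startswith("[")]
        return f"echo saw plugins: {tags}"

class StricterModel:
    """Another stand-in backend with different behavior."""
    def complete(self, prompt: str) -> str:
        return "flag" if "memcpy" in prompt else "ok"

plugins = [
    KnowledgePlugin("kernel-abi", "placeholder notes on kernel calling conventions"),
    KnowledgePlugin("ipc-trust", "placeholder notes on IPC trust boundaries"),
]
auditor = Auditor(EchoModel(), plugins)
print(auditor.review("memcpy(dst, src, n);"))  # echo saw plugins: ['kernel-abi', 'ipc-trust']

auditor.model = StricterModel()  # swap the backend; plugins and review logic are unchanged
print(auditor.review("memcpy(dst, src, n);"))  # flag
```

Swapping `EchoModel` for `StricterModel` changes only the object assigned to `auditor.model`; the plugins and the review logic stay the same, which is the property Microsoft describes.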
What MDASH found
The company says MDASH uncovered 16 new vulnerabilities in the Windows networking and authentication stack. Ten of the 16 affect kernel-mode code, and most are reachable over the network without authentication. Those characteristics make the findings more serious than a routine bug list: a kernel vulnerability can compromise the entire system, and an unauthenticated remote attack path means an attacker needs no credentials or prior foothold on the target machine.
Microsoft classified four of the discovered flaws as critical. In security terms, that is the strongest practical argument for the system’s usefulness. A benchmark score can attract attention, but critical bugs in production software matter more.
Benchmark leadership, with caveats
Microsoft says MDASH scored 88.45% on the public CyberGym benchmark, the highest result reported so far. That gives the company a measurable claim to technical leadership in this emerging category of agentic security tooling. But the comparison is not entirely straightforward. Microsoft has not disclosed the exact models powering the system, and benchmark conditions do not always translate directly to the complexity of real-world software environments.
Even so, the result supports a broader trend. Security research is moving beyond single-shot prompting toward orchestrated systems in which multiple models or agents divide labor, critique one another, and iteratively test hypotheses. MDASH is part of that shift, and its design suggests Microsoft sees debate and verification, not just code summarization, as the key to practical automated security work.
Why this matters beyond Microsoft
If Microsoft’s account holds up, MDASH offers a preview of how enterprise security could change. Large vendors maintain immense code bases that are difficult for human teams to audit comprehensively. Agentic systems that can continuously scan, contest, and validate findings may become a force multiplier for internal security programs, especially for proprietary code that models trained only on public data have never seen.
There is also an operational implication. Because the system is model-agnostic, improvements in the underlying models could compound quickly. A better language model would not need to replace the workflow; it could plug into an established pipeline that already knows how to distribute tasks and verify output.
For now, Microsoft’s strongest evidence is concrete: 16 reported Windows vulnerabilities, including four critical flaws, discovered by a multi-agent system it says can reason across closed-source software. The company has not revealed every implementation detail, and the wider industry will want more independent validation. But the signal is clear enough. AI vulnerability hunting is moving from demo-stage novelty toward production security engineering.
This article is based on reporting by The Decoder.
Originally published on the-decoder.com