In this blog
- From Inside the Lab: What Anthropic's Researcher Is Seeing
- What It Costs to Attack You Now
- The Top Three Consequences for How Enterprises Plan
- This Is Not a Hypothetical Threat
- Preemption Is a Viable Posture, and the Clock Is Shorter
- The Fallacy of Static Safety and the Architectural Answer for Defense
When Anthropic disclosed Claude Mythos Preview in April, the informed consensus was that more was coming. The only real debate was the timeline.
Lee Klarich at Palo Alto Networks called it: "Within six months, advanced AI models with deep cybersecurity capabilities will become commonplace."
A multi-org strategy briefing from CSA, SANS, and [un]prompted projected comparable offensive capabilities in other frontier models "within months" and in open-weight models "within six months to a year."
Heather Adkins, Google's former CISO, co-signed a public warning estimating autonomous vulnerability discovery and exploitation roughly six months out.
We got thirty days.
On May 6, 2026, the UK AI Safety Institute published its evaluation of OpenAI's GPT-5.5. The model hit a 71.4% pass rate on AISI's Expert-tier cyber suite, just ahead of Mythos at 68.6%. It also became the second model ever to complete "The Last Ones," AISI's 32-step simulated corporate intrusion that takes a human expert about 20 hours, solving it end-to-end in 2 of 10 attempts.
Those numbers are extraordinary on their own. But the finding I keep coming back to is buried in a single sentence at the end of the report, and it changes how every enterprise should think about model releases going forward:
"If cyber-offensive skill is emerging as a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding, we should expect further increases in cyber capability from models in the near future, potentially in quick succession."
AISI is not saying GPT-5.5 is a cyber model. They're saying that what we used to think of as a specialized capability—the kind that requires deliberate post-training and offensive datasets—is showing up on its own as a side effect of making models smarter at general work.
Read it once and the implication is straightforward. Read it twice and it's a little unsettling: every frontier model release is now also a cyber capability release, whether the lab intended it or not.
Mythos was the first proof. GPT-5.5 is the confirmation. The next confirmation will arrive in weeks, not quarters.
From Inside the Lab: What Anthropic's Researcher Is Seeing
The clearest corroboration of the AISI thesis didn't come from a published evaluation. It came from inside Anthropic.
On Rubrik's Out of Band podcast, Nicole Perlroth sat down with Anthropic's Nicholas Carlini to discuss what Mythos and its successors actually do. Carlini sits on Anthropic's Safeguards and Frontier Red Team. Few people in the world have spent more time inside this capability, and his account is the clearest first-hand data we have on the curve.
The progression he described maps the AISI thesis exactly.
"As a consequence of [the models] getting a lot better at code, we've watched the benchmarks go up on how well the models can find vulnerabilities. And it started at whatever, 13% of the time they could find vulnerabilities people knew existed and then went to 30%… and around November, it was like 60%."
From 13% to 30% to 60%. That isn't the result of a deliberate offensive cyber training program. It's the byproduct of Anthropic getting better at coding. AISI named the pattern. Carlini watched it climb.
He was just as direct on a question every executive should be asking: how much sophistication does it actually take to use this capability? His answer was almost none.
"The prompt is very, very simple… we tell the model, we want you to find a bug. This is the code base you're looking at. You have complete access to the machine. Go forth and find something for me… The models have gotten good enough now that you don't need to design fancy custom scaffolds."
That closes off the most common pushback I hear when frontier cyber capability comes up: "Sure, but only an expert can prompt it that way." According to the person doing the work inside Anthropic, the era of needing expert prompting is already behind us.
What It Costs to Attack You Now
To put a number on it: AISI gave GPT-5.5 a reverse-engineering challenge that Crystal Peak Security's expert human playtester took 12 hours to solve.
GPT-5.5 finished it in 10 minutes and 22 seconds, with no human assistance, at an API cost of $1.73.
That single number should send a chill down the spine of every executive. The economics of finding and exploiting vulnerabilities just collapsed by orders of magnitude. It didn't happen in a research lab. It happened in a commercially available API.
The Top Three Consequences for How Enterprises Plan
Most enterprise threat models quietly assume offensive cyber tooling has a supply chain. Someone has to build the exploit, there's a market for it, and the market's prices are what keep sophisticated attacks rare. When cyber capability becomes a side effect of general intelligence, that supply chain becomes irrelevant.
If threats become cheap, abundant, and ubiquitous, how does enterprise IT prepare? Here are three things to consider:
1. The proliferation window is now measured in weeks: The aggressive predictions called for six months. We got thirty days. The next gap will likely be shorter, and open-weight models will follow on a schedule no single vendor controls.
2. Frontier cyber capability is a near-term planning concern, not a future watch item: AISI explicitly notes that performance on its corporate-intrusion range keeps scaling with token budget. No plateau has been observed. The ceiling isn't visible yet.
3. Every general-capability release is now a cyber-capability release: Evaluating new models on the dimensions the lab markets is no longer enough. If autonomy, reasoning, and coding scores went up, offensive cyber went up too, whether the system card mentions it or not.
The thing to prepare for isn't a model. It's a cadence. Frontier capability is now a treadmill and offensive cyber moves with it.
This Is Not a Hypothetical Threat
The evidence is already on the ground. Two examples from the last few months make the pattern concrete.
Palo Alto Networks' Unit 42 published research on Zealot, an autonomous multi-agent cloud-attack system built on LangGraph. It breaks attacks into reconnaissance, exploitation, escalation, and exfiltration phases, then runs them end-to-end against a Google Cloud environment with no human in the loop. Zealot pulled off server-side request forgery, credential theft from cloud metadata services, service account impersonation, and BigQuery data exfiltration, all chained together autonomously.
This was published before GPT-5.5 dropped. The agent-orchestration patterns required to weaponize a frontier model are already in the open-source ecosystem. The model has been the bottleneck. That bottleneck just moved.
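To see how little glue that pattern actually requires, here is a minimal, deliberately defanged sketch of a phased agent graph. This is not Zealot's code: the node names mirror the phases Unit 42 describes, the bodies are inert stubs, and the `StateGraph` usage follows LangGraph's public API.

```python
# A minimal, defanged sketch of the phase-graph pattern (stubs only; not Zealot's code).
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph

class PhaseState(TypedDict):
    findings: List[str]  # what each phase hands to the next

def phase(name: str):
    # In a Zealot-style system, each node would hand a model a goal and tool access.
    # Here the bodies are inert: the point is the shape of the graph, not payloads.
    def run(state: PhaseState) -> dict:
        return {"findings": state["findings"] + [f"{name}: <stub>"]}
    return run

phases = ["recon", "exploit", "escalate", "exfiltrate"]
graph = StateGraph(PhaseState)
for name in phases:
    graph.add_node(name, phase(name))
graph.add_edge(START, phases[0])
for a, b in zip(phases, phases[1:]):
    graph.add_edge(a, b)
graph.add_edge(phases[-1], END)

app = graph.compile()
print(app.invoke({"findings": []})["findings"])  # four phases, zero humans in the loop
```

The graph is the easy part. Everything dangerous lives in what gets wired into the nodes, and that part now ships as a commercial API.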
In my last post, I argued that the real story behind Mythos wasn't AI-discovered zero-days. It was an identity-driven attack chain executed at machine speed: credential theft, lateral movement across Active Directory forests, a CI/CD supply-chain pivot, database exfiltration. The same playbook Scattered Spider runs by hand, now executable by a model. But there's a second half of this threat the industry is just starting to grapple with: your own AI agents are now sitting inside the blast radius of every one of these models.
This isn't speculative. In late 2025, Unit 42 disclosed an Amazon Bedrock AgentCore vulnerability they called "Agent God Mode." The platform's starter toolkit auto-creates an IAM execution role with a wildcard memory ARN (arn:aws:bedrock-agentcore:*:memory/*), giving every agent deployed through the toolkit cross-tenant access to other customers' agent memory stores. A flagship enterprise AI agent platform shipped with god-mode IAM as the default.
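You can see the whole problem in the policy shape itself. The sketch below contrasts the wildcard resource Unit 42 flagged with a scoped grant; the action names, account ID, region, and memory ID are illustrative placeholders, not values from the disclosure.

```python
# Two IAM policy statements as Python dicts. The wildcard Resource is the ARN from
# the disclosure; action names and the scoped values are illustrative only.

# What the starter toolkit shipped: memory access across regions, accounts, tenants.
god_mode = {
    "Effect": "Allow",
    "Action": "bedrock-agentcore:*",                     # illustrative action scope
    "Resource": "arn:aws:bedrock-agentcore:*:memory/*",  # the wildcard Unit 42 flagged
}

# What least privilege looks like: one region, one account, one named memory store.
scoped = {
    "Effect": "Allow",
    "Action": "bedrock-agentcore:GetMemory",             # hypothetical action name
    "Resource": "arn:aws:bedrock-agentcore:us-east-1:123456789012:memory/agent-a-memory",
}

def has_wildcard_resource(statement: dict) -> bool:
    """A first-pass audit check to run over your agent execution roles."""
    resource = statement["Resource"]
    return "*" in (resource if isinstance(resource, str) else "".join(resource))

assert has_wildcard_resource(god_mode) and not has_wildcard_resource(scoped)
```

The audit is exactly that mechanical: enumerate your agent execution roles and flag any Resource entry carrying a wildcard before an attacker does.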
Multiply that by the agents you've already deployed across Copilot Studio, LangChain, Claude Code, and a dozen other platforms. Each one runs under an identity. Each identity has data and system access. A frontier model with an offensive cyber capability doesn't need to discover a novel vulnerability if it can compromise an identity that already commands a fleet of internal agents. The attacker brings the brain. Your environment provides the hands.
So here’s a new anatomy of a cyber attack: you have external offensive AI on one side and your own internal agentic surface area on the other, both meeting at the identity layer. GPT-5.5 makes the external side cheaper to operate. Your own AI rollout is making the internal side bigger every week.
Preemption Is a Viable Posture, and the Clock Is Shorter
Gartner has been clear about where enterprise defense has to go: cyber resiliency is no longer optional, and preemptive cybersecurity (predictive threat intelligence, automated moving target defense, autonomous cyber immune systems) is the posture that can meet AI-driven threats. Reactive playbooks built around detect-and-respond can't keep pace with what GPT-5.5 just demonstrated.
It's the through-line in everything we've written about Mythos so far. Bipul Sinha, CEO of Rubrik, made the case that prevention alone is finished and that preemption is the new game. Jesse Green, Rubrik's Chief Revenue Officer, wrote that Mythos broke the 20-year detection-buys-you-time assumption the industry was built on. I argued that identity is where the real fight happens and that the standard "rebuild from a clean snapshot" approach creates its own crisis: the identity recovery paradox.
GPT-5.5 doesn't change any of those conclusions, but it shortens the clock on all of them.
The right metrics are no longer time-to-patch or time-to-detect. For data and identity, it's time-to-clean-state: how fast you can return to a known-good configuration after a compromise you may not have seen coming. For AI agents, where actions like sent emails, modified records, and triggered workflows can't be cleanly rewound, it's real-time visibility: how fast you can see what an agent is being asked to do and stop it before the action lands.
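That real-time gate doesn't have to be exotic. Here's a minimal sketch of a policy check that sits between an agent and its tools; the agent names, scopes, and actions are hypothetical illustrations, not a specific product's API.

```python
# Minimal sketch of an intent gate for agent tool calls.
# Agent names, scopes, and actions are hypothetical illustrations.
from dataclasses import dataclass

# Declared intent: what each agent identity is allowed to do, written down in advance.
DECLARED_SCOPE = {
    "helpdesk-agent": {"read_ticket", "send_email"},
    "finance-agent": {"read_ledger"},
}

@dataclass
class ProposedAction:
    agent_id: str
    action: str
    target: str

def gate(proposal: ProposedAction) -> bool:
    """Return True only if the action falls inside the agent's declared scope."""
    allowed = DECLARED_SCOPE.get(proposal.agent_id, set())
    ok = proposal.action in allowed
    # Log either way: the gate doubles as your real-time visibility feed.
    print(f"{'ALLOW' if ok else 'BLOCK'} {proposal.agent_id} -> "
          f"{proposal.action} on {proposal.target}")
    return ok

# A compromised helpdesk agent asked to pull the ledger is stopped before
# the action lands, not detected after.
gate(ProposedAction("helpdesk-agent", "read_ledger", "general-ledger-db"))  # BLOCK
gate(ProposedAction("helpdesk-agent", "send_email", "user@example.com"))    # ALLOW
```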
The architecture that gets you there isn't reactive backup-and-restore. It's preemptive: clean recovery points pre-computed during normal operations, identity changes continuously baselined, and agent actions monitored against intent in real time.
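The identity-baselining piece is just as concrete. A minimal sketch, assuming you can poll the security-relevant attributes of each principal; the identities and attributes here are illustrative.

```python
# Minimal sketch of continuous identity baselining (illustrative, not a product API).
# Snapshot the attributes you care about, then diff each poll against the baseline.
import hashlib
import json

def fingerprint(attrs: dict) -> str:
    """Stable hash of the security-relevant attributes of one identity."""
    return hashlib.sha256(json.dumps(attrs, sort_keys=True).encode()).hexdigest()

baseline = {
    "svc-cicd@corp": fingerprint({"groups": ["builders"], "admin": False}),
}

def check(identity: str, current_attrs: dict) -> None:
    if baseline.get(identity) != fingerprint(current_attrs):
        # Drift is either a legitimate change to approve or persistence to
        # investigate; either way it is visible immediately, not at restore time.
        print(f"DRIFT: {identity}")

check("svc-cicd@corp", {"groups": ["builders", "domain-admins"], "admin": True})  # DRIFT
```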
The boards that will be operating well in 18 months are the ones whose teams can answer three questions in real time, regardless of which frontier model just dropped:
- Data: Where is our last known-clean recovery point, and how fast can we land on it?
- Identity: Can we rebuild our identity provider from a known-clean state, even when an AI-driven attacker has planted layers of persistence we can't fully see or trace, and then roll forward only the legitimate changes since?
- Agents: Which AI agents run under which identities, what can each one touch, and how do we revoke or roll back agent actions when one is compromised? (A starting-point sketch follows below.)
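For the agents question, even a first-pass inventory is mechanical once you decide to maintain one. A minimal sketch, where the platform names and fields are illustrative rather than any vendor's API:

```python
# First-pass agent inventory: which agent runs under which identity, touching what.
# Platform and field names are illustrative, not a specific vendor's API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentRecord:
    name: str
    platform: str                 # e.g. "Copilot Studio", "LangChain", "Claude Code"
    identity: str                 # the principal the agent acts as
    reachable_systems: List[str] = field(default_factory=list)

inventory = [
    AgentRecord("ticket-triage", "Copilot Studio", "svc-helpdesk@corp", ["ticketing"]),
    AgentRecord("release-bot", "LangChain", "svc-cicd@corp", ["github", "artifactory"]),
]

def blast_radius(identity: str) -> List[str]:
    """Everything a compromised identity can touch through the agents it runs."""
    return sorted({s for a in inventory if a.identity == identity
                     for s in a.reachable_systems})

print(blast_radius("svc-cicd@corp"))  # ['artifactory', 'github']
```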
If you can answer those questions, you are already aligned with where the threat model is going. For everyone else, GPT-5.5 just made the work more urgent than it was last month.
Carlini, asked on Rubrik's Out of Band podcast what advice he would give CISOs preparing for this curve, was pragmatic about the playbook.
"People need to just look very carefully at what exists in the world right now and try to use the tools in the best way that they can… It's relatively easy to swap in a new model if you have the framework set up. Making sure that people are starting to use these things in ways that they can for defense is probably the best case here."
Your defense doesn't need to predict which model ships next. It needs to hold up when an attacker swaps it in.
The Fallacy of Static Safety and the Architectural Answer for Defense
One more detail from the AISI report is worth flagging: their red-teamers found a universal jailbreak of GPT-5.5's cyber safeguards in six hours of expert work. OpenAI's newly formed Cyber Frontier Risk Council, a governance structure named in the GPT-5.5 system card, pushed updates in response. AISI was unable to verify the final mitigation due to a configuration issue.
That leads us to two takeaways:
1. Frontier labs are evolving safeguards quickly and standing up dedicated governance to keep up. That's a positive signal.
2. Vendor-side guardrails should be one layer in your defense, not the load-bearing control. The architectures that scale through the next several model releases will be the ones that treat the safeguards layer as dynamic and design around that.
The capability is real, it's reproducible across labs, and it's coming out of work nobody is going to stop doing. The honest good news is that we know what the architectural answer looks like—preemptive recovery, identity you can rebuild, governance over the agents you've actually deployed. All of that is buildable now, with what's on the market today. The enterprises that come out of this in the best shape are the ones already building it.
The line from Carlini that I keep replaying:
"If you plot how good these models are getting on essentially every metric, they're getting much better, much faster than we like to think. We've underestimated it… Every other time I have said that I don't believe [the next leap] will happen, it has then happened."
That's the person inside the lab telling the rest of us to widen our error bars. I'd take him at his word.
Threats now move at machine speed, and so do model releases. No defender gets a vote on the proliferation curve itself. The defense is in the architecture you build before the next model ships.
If you want to spend three days with the people building for this, including the CISOs, CIOs, and engineers who'll have to operate through the next several frontier model releases without flinching, join us at Rubrik Forward 2026, June 8-11 at The Venetian Resort in Las Vegas.
Sources: UK AI Safety Institute, "Our evaluation of OpenAI's GPT-5.5 cyber capabilities" (May 2026); AISI, "Our evaluation of Claude Mythos Preview's cyber capabilities" (April 2026); OpenAI GPT-5.5 System Card; Rubrik, Out of Band podcast (hosted by Nicole Perlroth) — "The Breaking Point: Inside Mythos' Zero-Day Machine with Anthropic's Nicholas Carlini"; Palo Alto Networks Unit 42, "Autonomous AI Cloud Attacks" (Zealot); Unit 42, "Cracks in the Bedrock: Agent God Mode"; Palo Alto Networks, "Defender's Guide to the Frontier AI Impact on Cybersecurity"; Anthropic Mythos Preview disclosure.