AI cyber capability is speeding past earlier projections
Sinisa Markovic, Senior Staff Writer, Help Net Security
May 14, 2026
AI cyber capability is improving faster than expected, with newer models surpassing earlier projections, according to the UK government’s AI Security Institute (AISI).
AISI measures AI cyber capability using “time horizon benchmarks”, which estimate the length of cybersecurity tasks, measured in the time they take human experts, that AI systems can complete autonomously.
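As an illustration of how a time-horizon metric can be read off (a minimal sketch, not AISI’s actual methodology or code): model a system’s success probability as a logistic function of task length, then solve for the longest task length at which predicted success still meets a reliability target such as 80%. The fitted parameters below are hypothetical.

```python
import math

# Hypothetical fitted parameters for one model: success probability falls
# as tasks get longer (negative slope on log2 task length).
INTERCEPT, SLOPE = 6.0, -1.5  # logit(success) = INTERCEPT + SLOPE * log2(minutes)

def p_success(minutes: float) -> float:
    """Predicted success probability on a task of the given human-expert length."""
    z = INTERCEPT + SLOPE * math.log2(minutes)
    return 1 / (1 + math.exp(-z))

def time_horizon(target: float) -> float:
    """Task length (minutes) at which predicted success equals `target`."""
    logit = math.log(target / (1 - target))
    return 2 ** ((logit - INTERCEPT) / SLOPE)

h80 = time_horizon(0.8)  # the "80%-reliability time horizon"
h50 = time_horizon(0.5)  # the 50% horizon is longer, since it demands less reliability
```

Under this framing, the article’s point about saturation also falls out naturally: once a model succeeds at nearly every task in the suite, the fitted curve has no well-defined crossing point and the horizon can no longer be estimated reliably.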
“In February 2026, we estimated that frontier models’ 80%-reliability cyber time horizon had doubled every 4.7 months since reasoning models emerged in late 2024, given a 2.5M token limit. This was around half our November 2025 doubling time estimate, which was 8 months for both 50% and 80% reliability,” AISI wrote in a blog post.
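The difference between the two quoted doubling times compounds quickly. A back-of-envelope sketch using only the figures above:

```python
# Multiplicative growth in time horizon implied by a given doubling time:
# growth over t months = 2 ** (t / doubling_time_in_months).

def growth_over(months: float, doubling_months: float) -> float:
    """How many times the time horizon multiplies over `months`."""
    return 2 ** (months / doubling_months)

# One year at the February 2026 estimate (4.7-month doubling): roughly 5.9x.
year_fast = growth_over(12, 4.7)
# One year at the earlier November 2025 estimate (8-month doubling): roughly 2.8x.
year_slow = growth_over(12, 8.0)
```

So halving the doubling time roughly doubles the exponent: the same year of progress yields about twice as many doublings.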
“Claude Mythos Preview and GPT-5.5 have since significantly outperformed this trend,” researchers added.
According to the institute, it remains unclear whether this represents “an isolated break from existing rates of progress or part of a new, faster trend.”
Researchers also said the latest frontier models are beginning to exceed the limits of the current cyber evaluation framework.
Claude Mythos Preview and GPT-5.5 achieved near-100% success rates on the longest tasks in the limited cyber test suite, even with a 2.5 million token limit applied to each task.
The institute noted that the benchmark becomes harder to measure once models consistently complete the most difficult tasks.
Removing the token cap would push success rates high enough that “time horizon” estimates could no longer be calculated reliably, researchers added.
“No single benchmark result should be read as a precise measure of AI capability,” they cautioned.
AISI noted that its cyber capability estimates are consistent with findings from METR, a nonprofit research group tracking AI performance on software engineering tasks. According to METR, AI software-engineering capability has been doubling roughly every 4.2 months since late 2024.
Researchers also evaluated frontier models in simulated enterprise attack environments known as cyber ranges. The tests measure whether AI systems can carry out longer multi-step intrusion operations after gaining initial access to a target network.
In the latest testing, Claude Mythos Preview became the first model to complete the two evaluated cyber ranges. The model solved “The Last Ones,” a 32-step simulated corporate network attack, in 6 out of 10 attempts and the previously unsolved “Cooling Tower,” a 7-step industrial control system attack, in 3 out of 10 attempts. GPT-5.5 completed “The Last Ones” in 3 out of 10 attempts.
“Frontier AI’s autonomous cyber and software capability is advancing quickly: the length of cyber tasks that frontier models can complete autonomously has doubled on the order of months, not years. What this evidence does not tell us is how the rate of progress will evolve, when AI will reach specific capability thresholds, or how these capabilities will perform against defended enterprise systems,” AISI concluded.