Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
arXiv AIArchived Jun 08, 2026✓ Full text saved
arXiv:2606.07157v1 Announce Type: new Abstract: Many efforts to ensure frontier AI models are safe rely on monitoring their chain-of-thought (CoT) reasoning. If models become able to perform sufficiently complex reasoning internally, without explicit thinking tokens, this would undermine such oversight. We measure how well frontier models reason without CoT across a suite of over 30,000 questions spanning 43 benchmarks in domains including math, coding, puzzles, causality, theory-of-mind, and st
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Artificial Intelligence
[Submitted on 5 Jun 2026]
Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
Dewi Gould, Francis Rhys Ward, Anders Cairns Woodruff, Rauno Arike, Josh Hills, Alex Serrano, Ida Caspary, Jason Ross Brown, Jo J. Jiao, Patrick Leask, Twm Stone, Ram Potham, Ionut Gabriel Stan, Harry Mayne, Simeon Hellsten, Shubhorup Biswas, Ariana Azarbal, William L. Anderson, Elle Najt, Ryan Greenblatt, Julian Stastny
Many efforts to ensure frontier AI models are safe rely on monitoring their chain-of-thought (CoT) reasoning. If models become able to perform sufficiently complex reasoning internally, without explicit thinking tokens, this would undermine such oversight. We measure how well frontier models reason without CoT across a suite of over 30,000 questions spanning 43 benchmarks in domains including math, coding, puzzles, causality, theory-of-mind, and strategic reasoning. To compare models against humans, we estimate the 50\%-task-completion time horizon (TH): the human time required for tasks a model completes with 50\% success rate. We complement this with a 50\% reasoning token horizon: the minimum number of o3-mini reasoning tokens needed for tasks a model solves with 50\% success rate. We find that the no-CoT 50\% TH of frontier models has been doubling roughly every year over the past six years, with GPT-5.5's TH reaching over 3 minutes and reasoning token horizon exceeding 1,500 tokens. Our median estimates predict that frontier no-CoT THs could exceed 7 minutes by 2028, and 25 minutes by 2030, though these projections carry substantial uncertainty. We recommend frontier developers track this explicitly.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2606.07157 [cs.AI]
(or arXiv:2606.07157v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2606.07157
Focus to learn more
Submission history
From: Dewi Gould [view email]
[v1] Fri, 5 Jun 2026 11:17:08 UTC (4,603 KB)
Access Paper:
view license
Current browse context:
cs.AI
< prev | next >
new | recent | 2026-06
Change to browse by:
cs
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)