← Back ◬ AI & Machine Learning Mar 26, 2026

Evidence for Limited Metacognition in LLMs

arXiv AI Archived Mar 26, 2026 ✓ Full text saved

arXiv:2509.21545v2 Announce Type: cross Abstract: The possibility of LLM self-awareness and even sentience is gaining increasing public attention and has major safety and policy implications, but the science of measuring them is still in a nascent state. Here we introduce a novel methodology for quantitatively evaluating metacognitive abilities in LLMs. Taking inspiration from research on metacognition in nonhuman animals, our approach eschews model self-reports and instead tests to what degree

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Machine Learning [Submitted on 25 Sep 2025 (v1), last revised 31 Jan 2026 (this version, v2)] Evidence for Limited Metacognition in LLMs Christopher Ackerman The possibility of LLM self-awareness and even sentience is gaining increasing public attention and has major safety and policy implications, but the science of measuring them is still in a nascent state. Here we introduce a novel methodology for quantitatively evaluating metacognitive abilities in LLMs. Taking inspiration from research on metacognition in nonhuman animals, our approach eschews model self-reports and instead tests to what degree models can strategically deploy knowledge of internal states. Using two experimental paradigms, we demonstrate that frontier LLMs introduced since early 2024 show increasingly strong evidence of certain metacognitive abilities, specifically the ability to assess and utilize their own confidence in their ability to answer factual and reasoning questions correctly and the ability to anticipate what answers they would give and utilize that information appropriately. We buttress these behavioral findings with an analysis of the token probabilities returned by the models, which suggests the presence of an upstream internal signal that could provide the basis for metacognition. We further find that these abilities 1) are limited in resolution, 2) emerge in context-dependent manners, and 3) seem to be qualitatively different from those of humans. We also report intriguing differences across models of similar capabilities, suggesting that LLM post-training may have a role in developing metacognitive abilities. Comments: 26 pages, 25 figures Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI) Cite as: arXiv:2509.21545 [cs.LG] (or arXiv:2509.21545v2 [cs.LG] for this version) https://doi.org/10.48550/arXiv.2509.21545 Focus to learn more Submission history From: Christopher Ackerman [view email] [v1] Thu, 25 Sep 2025 20:30:15 UTC (3,135 KB) [v2] Sat, 31 Jan 2026 16:12:10 UTC (3,383 KB) Access Paper: HTML (experimental) view license Current browse context: cs.LG < prev | next > new | recent | 2025-09 Change to browse by: cs cs.AI References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes