The α-Law of Observable Belief Revision in Large Language Model Inference
[Submitted on 26 Feb 2026]
Mike Farmer, Abhinav Kochar, Yugyung Lee
Large language models (LLMs) that iteratively revise their outputs through mechanisms such as chain-of-thought reasoning, self-reflection, or multi-agent debate lack principled guarantees regarding the stability of their probability updates. We identify a consistent multiplicative scaling law that governs how instruction-tuned LLMs revise probability assignments over candidate answers, expressed as a belief revision exponent that controls how prior beliefs and verification evidence are combined during updates. We show theoretically that values of the exponent below one are necessary and sufficient for asymptotic stability under repeated revision. Empirical evaluation across 4,975 problems spanning graduate-level benchmarks (GPQA Diamond, TheoremQA, MMLU-Pro, and ARC-Challenge) and multiple model families (GPT-5.2 and Claude Sonnet 4) reveals near-Bayesian update behavior, with models operating slightly above the stability boundary in single-step revisions. However, multi-step experiments demonstrate that the exponent decreases over successive revisions, producing contractive long-run dynamics consistent with theoretical stability predictions. Token-level validation using Llama-3.3-70B further confirms similar behavior across both log-probability measurements and self-reported confidence elicitation. Analysis of update components exposes architecture-specific trust-ratio patterns, with GPT-5.2 showing balanced weighting between prior and evidence, while Claude modestly favors new evidence. This work characterizes observable inference-time update behavior rather than internal Bayesian reasoning, and introduces the α-law as a principled diagnostic for monitoring update stability and reasoning quality in LLM inference systems.
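The abstract does not state the functional form of the α-law, so the following Python sketch is only an illustration under an assumed form, not the authors' formulation. It assumes the revised distribution is proportional to (prior × evidence)^α, which reduces to a plain Bayesian update at α = 1. The names `alpha_update`, `log_odds`, and `iterate`, and the example numbers in the usage note, are hypothetical. For two candidate answers, this assumed update maps the log-odds L to α(L + E), where E is the log evidence ratio, so repeated revision with the same evidence contracts toward αE/(1 − α) when α < 1 and grows without bound when α ≥ 1 and E ≠ 0.

```python
import math


def alpha_update(prior, evidence, alpha):
    """One revision step under the ASSUMED form: posterior ∝ (prior * evidence) ** alpha.

    This form is an illustrative assumption; the abstract does not define it.
    """
    unnormalized = [(p * e) ** alpha for p, e in zip(prior, evidence)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def log_odds(dist):
    """Log-odds of the first candidate against the second (two-candidate case)."""
    return math.log(dist[0] / dist[1])


def iterate(prior, evidence, alpha, steps):
    """Apply the same evidence repeatedly and record the log-odds after each step."""
    dist = list(prior)
    trajectory = [log_odds(dist)]
    for _ in range(steps):
        dist = alpha_update(dist, evidence, alpha)
        trajectory.append(log_odds(dist))
    return trajectory
```

As a usage example, `iterate([0.5, 0.5], [0.6, 0.4], 0.9, 200)` settles near `0.9 * math.log(0.6 / 0.4) / (1 - 0.9)`, while the same call with `alpha=1.1` produces log-odds that keep increasing. This mirrors, under the stated assumption, the claim that exponents below one yield asymptotic stability.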
Comments: 24 pages, 13 figures, 10 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
MSC classes: 68T50, 68Q32, 60B10, 62C10
ACM classes: I.2.7; F.2.2; I.2.6; I.2.4
Cite as: arXiv:2603.19262 [cs.CL]
(or arXiv:2603.19262v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.19262
Submission history
From: Yugyung Lee
[v1] Thu, 26 Feb 2026 20:52:27 UTC (1,450 KB)