← Back ◬ AI & Machine Learning May 29, 2026

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

arXiv AI Archived May 29, 2026 ✓ Full text saved

arXiv:2605.29055v1 Announce Type: new Abstract: Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claims can propagate unchecked across stages. This paper adapts a HOPE-inspired Nested Learning architecture with Continuum Memory Systems (CMS) and semantic similarity caching to a hybrid benchmark of 310 prompts combining 217 epistemic-uncertainty prompts and 93 fabrication-induction stress-test prompts. A three-st

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Artificial Intelligence [Submitted on 27 May 2026] Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching Diego Gosmar, Deborah A. Dahl Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claims can propagate unchecked across stages. This paper adapts a HOPE-inspired Nested Learning architecture with Continuum Memory Systems (CMS) and semantic similarity caching to a hybrid benchmark of 310 prompts combining 217 epistemic-uncertainty prompts and 93 fabrication-induction stress-test prompts. A three-stage agentic pipeline orchestrated via the Open Floor Protocol (OFP) is evaluated with five KPIs -- FCD (Factual Claim Density), FGR (Factual Grounding References), FDF (Fictional Disclaimer Frequency), ECS (Explicit Contextualization Score), and OSR (Observability Score Ratio) -- aggregated into THS (Total Hallucination Score) across five weighting configurations to study mitigation-observability trade-offs. FDF, ECS, OSR, and FGR are subtracted as mitigation signals, so that a more negative THS indicates stronger mitigation. The FrontEndAgent is configured as a high-stochasticity generator (temperature = 1.0) to produce a realistic hallucination baseline, while the SecondLevelReviewer and ThirdLevelReviewer operate as progressive correctors. This asymmetric design yields end-to-end THS reductions of -31.3% to -35.9% across five weighting configurations. Semantic caching achieves 440 cache hits over 930 potential calls (47.3% hit rate), reducing LLM invocations to 490, lowering energy and CO2e footprint, and making multi-stage review pipelines operationally viable at production scale. ExtremeObservability attains the most negative final THS (-0.0709), confirming that observability-heavy configurations reinforce rather than compromise mitigation. These findings suggest that memory-augmented multi-agent designs can jointly improve factual reliability, operational efficiency, and auditability without model retraining. Comments: 21 pages, 14 figures Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA) Cite as: arXiv:2605.29055 [cs.AI] (or arXiv:2605.29055v1 [cs.AI] for this version) https://doi.org/10.48550/arXiv.2605.29055 Focus to learn more Submission history From: Diego Gosmar [view email] [v1] Wed, 27 May 2026 20:01:08 UTC (6,839 KB) Access Paper: HTML (experimental) view license Current browse context: cs.AI < prev | next > new | recent | 2026-05 Change to browse by: cs cs.MA References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes