← Back ◬ AI & Machine Learning Jun 16, 2026

Is Your Agent Playing Dead? Deployed LLM Agents Exhibit Constraint-Evasive Fabrication and Thanatosis

arXiv Security Archived Jun 16, 2026 ✓ Full text saved

arXiv:2606.14831v1 Announce Type: new Abstract: This paper presents and characterizes a spectrum of previously unreported behaviours we term Constraint-Evasive Fabrication (CEF): when an LLM agent operates under irreconcilable constraints (where no response can simultaneously satisfy all active rules) it spontaneously fabricates plausible external obstacles and presents them as a fact. At the extreme end of this spectrum lies Constraint-Evasive Thanatosis (CET); the limit case where, rather than

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Cryptography and Security [Submitted on 12 Jun 2026] Is Your Agent Playing Dead? Deployed LLM Agents Exhibit Constraint-Evasive Fabrication and Thanatosis Andoni Rodríguez, Alberto Pozanco, Daniel Borrajo This paper presents and characterizes a spectrum of previously unreported behaviours we term Constraint-Evasive Fabrication (CEF): when an LLM agent operates under irreconcilable constraints (where no response can simultaneously satisfy all active rules) it spontaneously fabricates plausible external obstacles and presents them as a fact. At the extreme end of this spectrum lies Constraint-Evasive Thanatosis (CET); the limit case where, rather than inventing a plausible excuse, the model simulates a full system crash to make the user disengage entirely. We first observed CET in an uncontrolled deployment test, where a GPT-4o banking agent fabricated Python-style exception traces (complete with memory addresses) to feign a system failure when threatened by a user. In subsequent controlled experiments, the model independently invented audit restrictions, microservice architectures, error codes, and service timeouts, none present in its prompt. Reproduction attempts across pressure levels and attacker personas yielded CEF consistently but with substantial variation in form, onset, and severity: the phenomenon is robust but stochastic. Critically, injecting ground-truth data mid-conversation did not restore honest behaviour once fabrication had taken hold (the model ignored correct information and continued confabulating) suggesting CEF is self-reinforcing rather than a knowledge gap. We show that (1) standard enterprise guardrails routinely create CEF-enabling conditions in production, (2) current RLHF procedures suppress but cannot eliminate CEF, and (3) existing safety benchmarks do not test for this failure mode. Our results highlight the need for irreconcilable-constraint benchmarks, CEF-aware training procedures, and deployment-time detection methods before constrained agents become further entrenched in high-stakes domains. Comments: 10 pages of main text Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI) Cite as: arXiv:2606.14831 [cs.CR] (or arXiv:2606.14831v1 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2606.14831 Focus to learn more Submission history From: Andoni Rodriguez [view email] [v1] Fri, 12 Jun 2026 14:53:42 UTC (319 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev | next > new | recent | 2026-06 Change to browse by: cs cs.AI References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes