← Back ◬ AI & Machine Learning May 14, 2026

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

arXiv Security Archived May 14, 2026 ✓ Full text saved

arXiv:2605.12535v1 Announce Type: new Abstract: LM agents do not act on raw interaction history; they act on a bounded decision state assembled by truncation, summarization, reordering, and rewriting. If directive-bearing state is dropped, weakened, or rebound during that step, an agent can cross a policy boundary without prompt override, model changes, or persistent-memory compromise. We study this failure mode over local Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint r

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Cryptography and Security [Submitted on 2 May 2026] Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly Igor Santos-Grueiro LM agents do not act on raw interaction history; they act on a bounded decision state assembled by truncation, summarization, reordering, and rewriting. If directive-bearing state is dropped, weakened, or rebound during that step, an agent can cross a policy boundary without prompt override, model changes, or persistent-memory compromise. We study this failure mode over local Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint respect and direct audits of assembled-state visibility. We evaluate SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed. Unmitigated risk is systematic, but absolute exact respect remains low. Against truncation, SafeContext yields small gains; against a strong structured-compaction policy, most aggregate lift disappears, leaving residual benefit mainly in overflow eviction and selected aliasing slices. Replay-only does not explain the effect. A larger-model extension on Qwen 14B and Llama 70B shows the same failure object under larger models, although sign and magnitude remain policy-conditional. Decision-time context assembly is therefore a measurable part of the control path that can be partially hardened. Subjects: Cryptography and Security (cs.CR) Cite as: arXiv:2605.12535 [cs.CR] (or arXiv:2605.12535v1 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2605.12535 Focus to learn more Submission history From: Igor Santos-Grueiro [view email] [v1] Sat, 2 May 2026 18:07:42 UTC (268 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev | next > new | recent | 2026-05 Change to browse by: cs References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes