Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles
arXiv AIArchived May 28, 2026✓ Full text saved
arXiv:2605.27784v1 Announce Type: new Abstract: LLM agents are governed by long-lived natural-language prompt policies, but individually reasonable standing rules can interact in uninspected ways. We study live intra-policy rule-conflict diagnosis: finding rule pairs inside a single prompt policy that can co-govern a realistic state, and measuring how models resolve that pressure in responses or tool actions. We introduce WIRE, a Witnessed Intra-policy Rule Evaluation pipeline. WIRE extracts sou
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Artificial Intelligence
[Submitted on 27 May 2026]
Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles
Lu Yan, Xuan Chen, Xiangyu Zhang
LLM agents are governed by long-lived natural-language prompt policies, but individually reasonable standing rules can interact in uninspected ways. We study live intra-policy rule-conflict diagnosis: finding rule pairs inside a single prompt policy that can co-govern a realistic state, and measuring how models resolve that pressure in responses or tool actions. We introduce WIRE, a Witnessed Intra-policy Rule Evaluation pipeline. WIRE extracts source-grounded rules, encodes them as PyRule clauses, uses satisfiability checks to retain same-surface hard-collision candidates, realizes those candidates as concrete co-governance witnesses, and judges model outputs against the original source-rule text. Across six public prompt policies, WIRE extracts 276 source rules and 560 atomic clauses, classifies 30,944 within-policy clause-pair comparisons, retains 170 encoded hard-collision candidate source-rule pairs, and realizes them as 1,402 concrete witnesses. In policy-only evaluation, these witnesses yield 13,335 post- generation trials where both source rules govern and both compliance labels are judgeable. Only 35.4% fall in joint compliance; 64.6% violate at least one governed source rule. These profiles are conditional diagnostics for WIRE-selected candidates, not deployment-frequency or causal excess failure estimates, but they reveal distinct policy, model, and tool-action resolution patterns.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.27784 [cs.AI]
(or arXiv:2605.27784v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2605.27784
Focus to learn more
Submission history
From: Lu Yan [view email]
[v1] Wed, 27 May 2026 00:09:45 UTC (185 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.AI
< prev | next >
new | recent | 2026-05
Change to browse by:
cs
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)