What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents
arXiv SecurityArchived Jun 03, 2026✓ Full text saved
arXiv:2606.02668v1 Announce Type: new Abstract: Coding agents gate consequential actions behind a human-in-the-loop approval dialog, but the dialog is narrated by the agent itself: the human approves a summary the agent writes. The Lies-in-the-Loop (LITL) attack shows that summary is forgeable, so a compromised agent can show a benign description while a different action runs. This paper names the missing property, Consent Integrity, by importing What You See Is What You Sign (WYSIWYS) and the t
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 1 Jun 2026]
What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents
Xiaoqi Weng
Coding agents gate consequential actions behind a human-in-the-loop approval dialog, but the dialog is narrated by the agent itself: the human approves a summary the agent writes. The Lies-in-the-Loop (LITL) attack shows that summary is forgeable, so a compromised agent can show a benign description while a different action runs. This paper names the missing property, Consent Integrity, by importing What You See Is What You Sign (WYSIWYS) and the trusted-path property into the agent approval channel: the action shown to the human must be rendered by a trusted mediator from the real action at the boundary, not the agent's narration, over a path the agent cannot spoof, and bound to the exact action that executes. Two twists distinguish it from classical WYSIWYS: the renderer is the adversary, and the boundary ground truth is a low-level event that must be decoded without trusting the agent. Since no decoder is complete, the realizable target is analyzer-relative: whatever the analyzer cannot classify is surfaced as uninspectable rather than silently approved. A prototype implements the analyzer, renderer, and bind-to-execution; total mediation and the trusted path are specified but assumed, not implemented. On GTFOBins, an independent corpus of 1330 trusted-tool abuses, the prototype silently passes 10.0% (every instance through a trusted tool); on tldr, 28,798 normal-usage commands, it marks 87.0% uninspectable. These two independent measurements bracket the design's central tension: the trust list that bounds silent passes is the same one that drives over-prompting, and a boundary-only mediator can move along that frontier but not escape it. The contribution is the property, the mechanism, and an honest position on that frontier, not a solved defense.
Comments: Preprint. IEEE conference format. Proof-of-concept; artifact at this https URL
Subjects: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC)
ACM classes: D.4.6; K.6.5; H.5.2; I.2.11
Cite as: arXiv:2606.02668 [cs.CR]
(or arXiv:2606.02668v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2606.02668
Focus to learn more
Submission history
From: Xiaoqi Weng [view email]
[v1] Mon, 1 Jun 2026 11:08:17 UTC (18 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-06
Change to browse by:
cs
cs.HC
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)