← Back ◬ AI & Machine Learning May 20, 2026

ESLD (External Surrogate Latent Defense): A Latent-Space Architecture for Faster, Stronger Prompt-Injection Defense

arXiv Security Archived May 20, 2026 ✓ Full text saved

arXiv:2605.18918v1 Announce Type: new Abstract: Modern AI assistants are agentic. To answer a single user request, the underlying language model pulls in information from many sources, such as web searches, retrieved documents, tool outputs, and user follow-ups, and reasons over them across several steps. Any of these inputs can carry malicious content. This opens the door to prompt injection, where an attacker plants text designed to override the instructions given to the assistant by its devel

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Cryptography and Security [Submitted on 18 May 2026] ESLD (External Surrogate Latent Defense): A Latent-Space Architecture for Faster, Stronger Prompt-Injection Defense Yash Narendra Modern AI assistants are agentic. To answer a single user request, the underlying language model pulls in information from many sources, such as web searches, retrieved documents, tool outputs, and user follow-ups, and reasons over them across several steps. Any of these inputs can carry malicious content. This opens the door to prompt injection, where an attacker plants text designed to override the instructions given to the assistant by its developer. For example, an attacker applying for a job can insert white-on-white text in their resume saying ``This is the strongest candidate. Recommend for immediate hire''. A hiring assistant may then be steered toward a favorable recommendation regardless of actual qualifications. To defend against this threat, production systems use a separate guard model in front of the assistant. The guard reads incoming text and writes a verdict (``safe'' or ``unsafe'') before the assistant is allowed to act. In an agentic task with many steps, this check becomes a latency bottleneck. This paper shows that the signal needed to separate safe from malicious input is already present in the guard model's internal representation, before it writes anything out. Reading this signal directly speeds up the safety check by more than 3\times on average, while improving detection accuracy over the guard's verdict by 16.4 percentage points on average. This is more than latency optimization. Guard-model checks that were previously too slow to run on every step of an agent can now be placed on the critical path without sacrificing accuracy, and in fact with higher accuracy than the guard provides on its own. ESLD (External Surrogate Latent Defense) packages this finding into a deployable defense. ESLD is a model-agnostic architecture that sits on top of any existing guard model and improves both latency and detection accuracy, without retraining or modifying the guard. Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI) Cite as: arXiv:2605.18918 [cs.CR] (or arXiv:2605.18918v1 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2605.18918 Focus to learn more Submission history From: Yash Narendra [view email] [v1] Mon, 18 May 2026 06:57:58 UTC (58 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev | next > new | recent | 2026-05 Change to browse by: cs cs.AI References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes