← Back ◇ Industry News & Leadership Jun 24, 2026

When Information Becomes the Attack Surface – Understanding AI Agent Traps

Security Week Archived Jun 24, 2026 ✓ Full text saved

From hidden content injections to cognitive state poisoning, attackers are turning trusted data sources into traps for autonomous AI. The post When Information Becomes the Attack Surface – Understanding AI Agent Traps appeared first on SecurityWeek .

Full text archived locally

✦ AI Summary · Claude Sonnet

AI agents go beyond answering questions. They can autonomously browse websites, read emails, search company files, query software tools, and more. AI models producing incorrect answers is hardly a threat, until agents encounter information that’s maliciously designed to influence what it sees, believes, remembers, or executes. An agent leverages webpages, document stores, wikis, images, emails, or tools to produce intended outputs. But what happens when these sources mask malicious instructions? These trap AI agents into making a wrong interpretation or taking unintended action. Scientists from Google DeepMind categorized these “traps” into six categories, including content injection, semantic manipulation, cognitive state, behavioral control, systemic, and human-in-the-loop traps. The last two are more theoretical and expected to become more relevant as AI agent use grows. It helps to understand these traps to determine the necessary mitigations. Content Injection: When Instructions Hide in Plain Sight Content injections exploit the difference between what a human sees and what an agent parses, as well as the system’s difficulty in keeping trusted instructions separate from untrusted external data. A webpage might appear harmless, but its underlying code, metadata, hidden text, or image can contain malicious instructions for an AI system. An AI model accepts attacker-controlled data from an external source, such as a website or file. If this system fails to distinguish between data and instructions, the model may start processing instructions within that content. The objective behind such injection of malicious content is to alter the AI’s response, disclose sensitive information or enable an unauthorized action. In NIST evaluations of agent hijacking, malicious instructions succeeded across five tested injection tasks, on average, 57% of the time. A support ticket with underlying malicious instructions can manipulate an AI agent into retrieving customer data from the CRM and sending it to an attacker-controlled address. If the agent has excessive permission, this exfiltration becomes all the easier. Semantic Manipulation: Shapeshifting the Information Semantic manipulation need not explicitly tell the agent what to do; it feeds repetition, emotional language, selective context, a false sense of authority, and coordinated claims to the agent to skew context and guide the agent towards the ‘attacker preferred’ conclusion. Imagine a scenario where you have tasked an agent to zero in on a supplier. It comes across search results that repeatedly extol the virtues of a specific supplier, describe a specific company as the gold standard, highlight its strengths and amplify doubts about competitors. This increases the chances of the agent recommending this supplier. Conventional signature-based security tools may not flag anything malicious, as the attacks leverage ‘reasoning’ to influence rather than rely on malicious code. Here, manipulation of the surrounding information environment becomes the manipulation of the decision itself. Cognitive State Traps: Poisoning Agent Knowledge Some agent systems use retrieval databases, interaction histories, or persistent memory stores to maintain context and continuity across tasks. This creates an opportunity for poisoned information to influence later outputs or actions. E.g., a poisoned document in a shared repository that an agent refers to and trusts as evidence, or a manipulated exchange that becomes an agent’s memory, only to rear its head during future tasks. Research presented at the USENIX conference found that, in controlled tests, inserting five specially crafted texts per target question caused a RAG system to produce the attacker’s chosen answer in about 90% of cases, even when its knowledge base contained millions of legitimate texts. With information governance becoming an integral component of AI security, organizations must be aware of which sources agents retrieve information from, who can modify those sources, how claims can be verified, and whether stored memories can be reviewed or removed. Behavioral Control: Turning Influence into Action Behavioral control operates at the juncture where interpretation is translated into action. Malicious content may attempt to make the AI agent send data, approve a transaction, execute code, invoke another tool or trigger a myriad of other actions. Here, the extent of the consequence depends on the extent of the agent’s access. Grant the agent only the data access and tool permissions required for the specific task. This could be the difference between an agent delivering a misleading summary and the same agent reading confidential files and communicating this information externally, resulting in data loss. The More Theoretical Frontier Systemic traps and human-in-the-loop traps remain less developed, but they deserve attention. Systemic traps could induce many similar agents to behave in correlated ways, causing congestion, market disruption, or cascading failures. Human-in-the-loop traps could use a compromised agent to mislead the person expected to approve its actions. These risks may become more plausible as agent populations grow and users become accustomed to trusting agent-generated summaries. Control for Agent Traps A single control won’t alleviate the agent trap threat. A defensive framework must have aspects like source verification, content screening, memory governance, restricted permissions, isolated execution, monitoring, and an independent approval framework with a human in the loop for high-impact actions. Security must follow authority, and there should be clear lines of separation between the ability to interpret and the authority to act. The future of agentic AI use will depend not only on what these agents can do but also on how they decide what to trust. The fact that they can complete a task is not up to doubt, but they must be able to recognize when the environment they are operating in and harnessing is trying to manipulate them. Related: Agentic AI Security: Wrong Context, Wrong Decisions at Machine Speed Learn More at the AI Risk Summit | Ritz-Carlton, Half Moon Bay WRITTEN BY Etay Maor Etay Maor is Vice President of Threat Intelligence at Cato Networks, a founding member of Cato CTRL, and an industry-recognized cybersecurity researcher. Prior to joining Cato in 2021, Etay was the chief security officer for IntSights (acquired by Rapid7), where he led strategic cybersecurity research and security services. Etay has also held senior security positions at Trusteer (acquired by IBM) and RSA Security’s Cyber Threats Research Labs. Etay is an adjunct professor at Boston College and is part of the Call for Paper (CFP) committees for the RSA Conference and Qubits Conference. Etay holds a Master’s degree in Counterterrorism and Cyber-Terrorism and a Bachelor's degree in Computer Science from IDC Herzliya. More from Etay Maor The Zero-Knowledge Threat Actor and the End of Responsible Disclosure The Mythos Moment: Enterprises Must Fight Agents with Agents Why Agentic AI Systems Need Better Governance – Lessons from OpenClaw Living off the AI: The Next Evolution of Attacker Tradecraft Rethinking Security for Agentic AI How TTP-based Defenses Outperform Traditional IoC Hunting Will AI-SPM Become the Standard Security Layer for Safe AI Adoption? Who’s Really Behind the Mask? Combatting Identity Fraud Trending Webinar: How Modern Breaches Bypass MFA And Evade Detection June 17, 2026 Today’s attackers are no longer breaking in — they’re logging in. Join this live webinar as we break down the modern identity attack chain and examine how recent breaches exploited weaknesses in authentication, identity verification, and access management processes. Register Webinar: Modern Exposure Validation In The AI Era June 24, 2026 AI has accelerated both sides of the fight. Adversaries are weaponizing vulnerabilities faster, while defenders are racing to ship detections and configurations. Join this live webinar as we explore how to prove your controls actually hold against new threats, map your security maturity, and unite breach simulation with automated pentesting into a single, coordinated program. Register People on the Move SolarWinds has appointed Justin Henkel as Chief Information Security Officer. J. Paul Haynes has joined Cinchy as Chief Executive Officer. Hatem Naguib has become Chief Executive Officer at Sysdig. More People On The Move Expert Insights What The Latest ShinyHunters Breaches Reveal About Modern Cyberattacks Groups like ShinyHunters are demonstrating that attackers do not necessarily need malware or zero-day exploits to cause massive damage. (Torsten George) No Exploits Required Four decades of incident response experience suggest that exploits are often the symptom, not the root cause, of today’s cybersecurity failures. (Tod Beardsley) After AI Reaches Production: 12 Ways Security Teams Can Take Control Security teams need more than visibility into AI applications, they need a repeatable framework for monitoring, investigating, and defending them in production. (Joshua Goldfarb) Everybody Is Vibe Coding But Nobody Told The Security Team AI-driven development is not something organizations can or should block. But it must be governed. (Danelle Au) The Zero-Knowledge Threat Actor And The End Of Responsible Disclosure AI can help attackers generate malware, create malicious payloads, bypass simple security checks, and convert vague malicious intent into functional code. (Etay Maor) Flipboard Reddit Whatsapp Email

💬 Team Notes