← Back ◬ AI & Machine Learning Apr 03, 2026

SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection

arXiv Security Archived Apr 03, 2026 ✓ Full text saved

arXiv:2510.16219v3 Announce Type: replace Abstract: Malicious agents pose significant threats to the reliability and decision-making capabilities of Multi-Agent Systems (MAS) powered by Large Language Models (LLMs). Existing defenses often fall short due to reactive designs or centralized architectures which may introduce single points of failure. To address these challenges, we propose SentinelNet, the first decentralized framework for proactively detecting and mitigating malicious behaviors in

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Cryptography and Security [Submitted on 17 Oct 2025 (v1), last revised 2 Apr 2026 (this version, v3)] SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection Yang Feng, Xudong Pan Malicious agents pose significant threats to the reliability and decision-making capabilities of Multi-Agent Systems (MAS) powered by Large Language Models (LLMs). Existing defenses often fall short due to reactive designs or centralized architectures which may introduce single points of failure. To address these challenges, we propose SentinelNet, the first decentralized framework for proactively detecting and mitigating malicious behaviors in multi-agent collaboration. SentinelNet equips each agent with a credit-based detector trained via contrastive learning on augmented adversarial debate trajectories, enabling autonomous evaluation of message credibility and dynamic neighbor ranking via bottom-k elimination to suppress malicious communications. To overcome the scarcity of attack data, it generates adversarial trajectories simulating diverse threats, ensuring robust training. Experiments on MAS benchmarks show SentinelNet achieves near-perfect detection of malicious agents, close to 100% within two debate rounds, and recovers 95% of system accuracy from compromised baselines. By exhibiting strong generalizability across domains and attack patterns, SentinelNet establishes a novel paradigm for safeguarding collaborative MAS. Comments: Accepted at The ACM Web Conference 2026 (WWW 2026) Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI) Cite as: arXiv:2510.16219 [cs.CR] (or arXiv:2510.16219v3 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2510.16219 Focus to learn more Submission history From: Yang Feng [view email] [v1] Fri, 17 Oct 2025 21:10:35 UTC (978 KB) [v2] Tue, 21 Oct 2025 12:58:13 UTC (978 KB) [v3] Thu, 2 Apr 2026 10:14:48 UTC (973 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev | next > new | recent | 2025-10 Change to browse by: cs cs.AI References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes