
Safety-Contract Graph Multi-Agent Reinforcement Learning for Autonomous Network Security Response

arXiv Security Archived Jun 15, 2026




Computer Science > Multiagent Systems

[Submitted on 11 Jun 2026]

Authors: Jose Luis Lima de Jesus Silva

Abstract: Autonomous network-security response systems promise to reduce Security Operations Centre (SOC) reaction latency, but reward-only multi-agent reinforcement learning (MARL) can improve security reward while remaining non-deployable. We present a safety-contract graph MARL framework and instantiate it as ACD^3-GAT (Adaptive Constrained Counterfactual Decisioning with a Graph Attention Network encoder), an architecture that separates simulator observations from reusable operational budgets, constrained optimization, graph state encoding, and counterfactual action screening. We evaluate the method in CAGE Challenge 4, where agents operate under budgets for Mean Time to Recover (MTTR), false-positive response, and firewall change-management disruption. Across the benchmark, every unconstrained method violates the SOC downtime budget in 100% of evaluated episodes, with mean downtime proxy costs of 311-430 against a budget of 50. This complements prior CAGE Challenge 4 findings by showing that reward-only learning lacks operational discipline. Constrained MAPPO-GAT (C-MAPPO-GAT) isolates Lagrangian operational-cost control and budget-aware screening, while ACD^3-GAT adds budget context, CVaR tail-risk estimation, opponent-belief state, and Graph Counterfactual Risk Propagation (G-CRP). The replicated comparison includes three 200-episode seeds for IPPO, MAPPO-GAT, C-MAPPO-GAT, and ACD^3-GAT. C-MAPPO-GAT reduces downtime violation from 100% to 0.3% and mean downtime cost from 355.4 to 15.5 relative to MAPPO-GAT. ACD^3-GAT reduces mean downtime cost to 48.2 with a 13.8% violation rate, placing it on the safety-contract frontier rather than at the most conservative compliance point. Topology-seed and coupled adaptive Red-process stress tests preserve this contrast and show lower worst adaptive degradation for safety-constrained policies than reward-only MAPPO-GAT.

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Cite as: arXiv:2606.13832 [cs.MA] (or arXiv:2606.13832v1 [cs.MA] for this version)

https://doi.org/10.48550/arXiv.2606.13832

Submission history
From: Jose Luis Lima De Jesus Silva
[v1] Thu, 11 Jun 2026 19:02:48 UTC (529 KB)
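The abstract names Lagrangian operational-cost control, CVaR tail-risk estimation, and a per-episode budget-violation rate, but does not say how any of them is computed. The following is a minimal Python sketch of the generic textbook forms of these three ideas, not the paper's implementation. Every function name, the learning rate, the `alpha` level, and the sample costs are hypothetical. Only the downtime budget of 50 is taken from the abstract.

```python
import math


def update_multiplier(lam, mean_cost, budget, lr=0.01):
    """Projected dual ascent on a Lagrange multiplier (assumed generic form).

    The multiplier grows when the observed mean cost exceeds the budget
    and shrinks otherwise; it is clipped so it never goes below zero.
    """
    return max(0.0, lam + lr * (mean_cost - budget))


def penalized_reward(reward, cost, lam):
    """Lagrangian-shaped reward: task reward minus weighted operational cost."""
    return reward - lam * cost


def violation_rate(episode_costs, budget):
    """Fraction of episodes whose total cost exceeds the budget."""
    return sum(c > budget for c in episode_costs) / len(episode_costs)


def empirical_cvar(costs, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of costs.

    At least one sample is always included in the tail.
    """
    ordered = sorted(costs, reverse=True)
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    budget = 50.0                          # downtime budget quoted in the abstract
    costs = [10.0, 20.0, 60.0, 70.0]       # hypothetical per-episode downtime costs
    mean_cost = sum(costs) / len(costs)
    lam = update_multiplier(0.0, mean_cost, budget, lr=0.1)
    print("violation rate:", violation_rate(costs, budget))
    print("CVaR at alpha=0.5:", empirical_cvar(costs, alpha=0.5))
    print("updated multiplier:", lam)
```

In a sketch like this, the multiplier acts as a learned price on downtime: a policy trained on `penalized_reward` is pushed toward the budget in expectation, while a CVaR estimate looks only at the worst episodes. That difference is one plausible reason a method can report a mean cost below the budget and still show a non-zero violation rate.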
