CyberIntel ⬡ News
★ Saved ◆ Cyber Reads
← Back ◬ AI & Machine Learning Jun 09, 2026

Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks

arXiv Security Archived Jun 09, 2026 ✓ Full text saved

arXiv:2606.07833v1 Announce Type: new Abstract: Standard AI red teaming evaluations reduce adversarial campaigns to a single binary outcome, attack success rate (ASR), not taking into account the sequential structure of how models resist or yield to attacks. We propose applying process mining, a discipline for discovering and analyzing process models from event logs, to red teaming traces. We conduct a controlled experiment pitting 60 HarmBench prompts against two LLMs, GPT-OSS 120B and Llama 3.

Full text archived locally
✦ AI Summary · Claude Sonnet


    Computer Science > Cryptography and Security [Submitted on 5 Jun 2026] Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks Zvi Topol Standard AI red teaming evaluations reduce adversarial campaigns to a single binary outcome, attack success rate (ASR), not taking into account the sequential structure of how models resist or yield to attacks. We propose applying process mining, a discipline for discovering and analyzing process models from event logs, to red teaming traces. We conduct a controlled experiment pitting 60 HarmBench prompts against two LLMs, GPT-OSS 120B and Llama 3.3 70B, using 10 prompt mutation strategies over up to 110 attempts per prompt. From the resulting 8,575 scored events we extract Directly-Follows Graphs (DFGs) and state transition matrices that reveal structurally distinct defense profiles invisible to ASR alone: GPT-OSS exhibits a near-absorbing refusal state, while Llama presents multiple porous escape routes from refusal to getting successfully jailbroken. We further show that mutator effectiveness is asymmetric across models and that time-to-jailbreak distributions differ by an order of magnitude. Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI) Cite as: arXiv:2606.07833 [cs.CR]   (or arXiv:2606.07833v1 [cs.CR] for this version)   https://doi.org/10.48550/arXiv.2606.07833 Focus to learn more Submission history From: Zvi Topol [view email] [v1] Fri, 5 Jun 2026 20:47:45 UTC (14 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev   |   next > new | recent | 2026-06 Change to browse by: cs cs.AI References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
    💬 Team Notes
    Article Info
    Source
    arXiv Security
    Category
    ◬ AI & Machine Learning
    Published
    Jun 09, 2026
    Archived
    Jun 09, 2026
    Full Text
    ✓ Saved locally
    Open Original ↗