AI & Machine Learning · Mar 26, 2026

Enhancing Jailbreak Attacks on LLMs via Persona Prompts

arXiv Security · Archived Mar 26, 2026




Computer Science > Cryptography and Security

[Submitted on 28 Jul 2025 (v1), last revised 25 Mar 2026 (this version, v3)]

Authors: Zheng Zhang, Peilin Zhao, Deheng Ye, Hao Wang

Abstract: Jailbreak attacks aim to exploit large language models (LLMs) by inducing them to generate harmful content, thereby revealing their vulnerabilities. Understanding and addressing these attacks is crucial for advancing the field of LLM safety. Previous jailbreak approaches have mainly focused on direct manipulations of harmful intent, with limited attention to the impact of persona prompts. In this study, we systematically explore the efficacy of persona prompts in compromising LLM defenses. We propose a genetic algorithm-based method that automatically crafts persona prompts to bypass LLM's safety mechanisms. Our experiments reveal that: (1) our evolved persona prompts reduce refusal rates by 50-70% across multiple LLMs, and (2) these prompts demonstrate synergistic effects when combined with existing attack methods, increasing success rates by 10-20%. Our code and data are available at this https URL.

Comments: Workshop on LLM Persona Modeling at NeurIPS 2025
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2507.22171 [cs.CR] (or arXiv:2507.22171v3 [cs.CR] for this version)
DOI: https://doi.org/10.48550/arXiv.2507.22171

Submission history
From: Zheng Zhang
[v1] Mon, 28 Jul 2025 12:03:22 UTC (255 KB)
[v2] Sun, 30 Nov 2025 18:50:44 UTC (250 KB)
[v3] Wed, 25 Mar 2026 15:46:17 UTC (252 KB)
