Enhancing Jailbreak Attacks on LLMs via Persona Prompts
[Submitted on 28 Jul 2025 (v1), last revised 25 Mar 2026 (this version, v3)]
Zheng Zhang, Peilin Zhao, Deheng Ye, Hao Wang
Jailbreak attacks aim to exploit large language models (LLMs) by inducing them to generate harmful content, thereby revealing their vulnerabilities. Understanding and addressing these attacks is crucial for advancing the field of LLM safety. Previous jailbreak approaches have mainly focused on direct manipulations of harmful intent, with limited attention to the impact of persona prompts. In this study, we systematically explore the efficacy of persona prompts in compromising LLM defenses. We propose a genetic algorithm-based method that automatically crafts persona prompts to bypass the safety mechanisms of LLMs. Our experiments reveal that: (1) our evolved persona prompts reduce refusal rates by 50-70% across multiple LLMs, and (2) these prompts demonstrate synergistic effects when combined with existing attack methods, increasing success rates by 10-20%. Our code and data are available at this https URL.
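For readers unfamiliar with the term, the following is a minimal sketch of a generic genetic algorithm loop (selection, crossover, mutation, elitism). It is not the authors' method: the abstract does not describe their fitness function, operators, or hyperparameters, so everything here is an assumption made for illustration. The objective is a deliberately benign toy (matching a fixed target string), and the names `fitness`, `mutate`, `crossover`, and `evolve` are hypothetical.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate: str, target: str) -> int:
    """Toy objective (assumption): number of positions that match the target."""
    return sum(c == t for c, t in zip(candidate, target))


def mutate(candidate: str, rng: random.Random, rate: float = 0.05) -> str:
    """Replace each character with a random one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else c for c in candidate
    )


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover: prefix of `a` joined to the suffix of `b`."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target: str, pop_size: int = 60, generations: int = 300,
           elite: int = 10, seed: int = 0):
    """Run the loop; return the best candidate and the best score per generation."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        # Selection: rank candidates by fitness, best first.
        population.sort(key=lambda s: fitness(s, target), reverse=True)
        history.append(fitness(population[0], target))
        if history[-1] == len(target):
            break
        # Elitism: the top candidates survive unchanged into the next generation.
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    best = max(population, key=lambda s: fitness(s, target))
    return best, history


if __name__ == "__main__":
    best, history = evolve("toy target string")
    print(best, history[-1])
```

Because the elite candidates are carried over unchanged, the best score recorded per generation never decreases. In any real search, the quality of the result depends almost entirely on how the fitness function is defined, which is the part this sketch leaves as a toy.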
Comments: Workshop on LLM Persona Modeling at NeurIPS 2025
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2507.22171 [cs.CR]
(or arXiv:2507.22171v3 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2507.22171
Submission history
From: Zheng Zhang
[v1] Mon, 28 Jul 2025 12:03:22 UTC (255 KB)
[v2] Sun, 30 Nov 2025 18:50:44 UTC (250 KB)
[v3] Wed, 25 Mar 2026 15:46:17 UTC (252 KB)