Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models
arXiv SecurityArchived Apr 08, 2026✓ Full text saved
arXiv:2604.05809v1 Announce Type: new Abstract: Multimodal pretrained models are vulnerable to backdoor attacks, yet most existing methods rely on visual or multimodal triggers, which are impractical since visually embedded triggers rarely occur in real-world data. To overcome this limitation, we propose a novel Text-Guided Backdoor (TGB) attack on multimodal pretrained models, where commonly occurring words in textual descriptions serve as backdoor triggers, significantly improving stealthiness
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 7 Apr 2026]
Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models
Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao, Qinmu Peng, Tongliang Liu, Xinge You
Multimodal pretrained models are vulnerable to backdoor attacks, yet most existing methods rely on visual or multimodal triggers, which are impractical since visually embedded triggers rarely occur in real-world data. To overcome this limitation, we propose a novel Text-Guided Backdoor (TGB) attack on multimodal pretrained models, where commonly occurring words in textual descriptions serve as backdoor triggers, significantly improving stealthiness and practicality. Furthermore, we introduce visual adversarial perturbations on poisoned samples to modulate the model's learning of textual triggers, enabling a controllable and adjustable TGB attack. Extensive experiments on downstream tasks built upon multimodal pretrained models, including Composed Image Retrieval (CIR) and Visual Question Answering (VQA), demonstrate that TGB achieves practicality and stealthiness with adjustable attack success rates across diverse realistic settings, revealing critical security vulnerabilities in multimodal pretrained models.
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as: arXiv:2604.05809 [cs.CR]
(or arXiv:2604.05809v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2604.05809
Focus to learn more
Submission history
From: Chaojian Yu [view email]
[v1] Tue, 7 Apr 2026 12:46:23 UTC (2,587 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-04
Change to browse by:
cs
cs.LG
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)