AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective
arXiv SecurityArchived Mar 27, 2026✓ Full text saved
arXiv:2603.24857v1 Announce Type: new Abstract: As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two f
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 25 Mar 2026]
AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective
Zhenyi Wang, Siyu Luan
As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML -- \textbf{data} and \textbf{models} -- are no longer independent; vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline. To address this critical gap, we propose a \emph{unified closed-loop threat taxonomy} that explicitly frames model-data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models. The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data\rightarrowData (D\rightarrowD): including \emph{data decryption attacks and watermark removal attacks}; (2) Data\rightarrowModel (D\rightarrowM): including \emph{poisoning, harmful fine-tuning attacks, and jailbreak attacks}; (3) Model\rightarrowData (M\rightarrowD): including \emph{model inversion, membership inference attacks, and training data extraction attacks}; (4) Model\rightarrowModel (M\rightarrowM): including \emph{model extraction attacks}. Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies, particularly within the landscape of foundation models.
Comments: Published at Transactions on Machine Learning Research (TMLR)
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2603.24857 [cs.CR]
(or arXiv:2603.24857v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2603.24857
Focus to learn more
Submission history
From: Zhenyi Wang [view email]
[v1] Wed, 25 Mar 2026 22:53:43 UTC (116 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-03
Change to browse by:
cs
cs.AI
cs.CL
cs.CV
cs.LG
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)