Robustness of Vision Foundation Models to Common Perturbations
arXiv SecurityArchived Apr 17, 2026✓ Full text saved
arXiv:2604.14973v1 Announce Type: new Abstract: A vision foundation model outputs an embedding vector for an image, which can be affected by common editing operations (e.g., JPEG compression, brightness, contrast adjustments). These common perturbations alter embedding vectors and may impact the performance of downstream tasks using these embeddings. In this work, we present the first systematic study on foundation models' robustness to such perturbations. We propose three robustness metrics and
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 16 Apr 2026]
Robustness of Vision Foundation Models to Common Perturbations
Hongbin Liu, Zhengyuan Jiang, Cheng Hong, Neil Zhenqiang Gong
A vision foundation model outputs an embedding vector for an image, which can be affected by common editing operations (e.g., JPEG compression, brightness, contrast adjustments). These common perturbations alter embedding vectors and may impact the performance of downstream tasks using these embeddings. In this work, we present the first systematic study on foundation models' robustness to such perturbations. We propose three robustness metrics and formulate five desired mathematical properties for these metrics, analyzing which properties they satisfy or violate. Using these metrics, we evaluate six industry-scale foundation models (OpenAI, Meta) across nine common perturbation categories, finding them generally non-robust. We also show that common perturbations degrade downstream application performance (e.g., classification accuracy) and that robustness values can predict performance impacts. Finally, we propose a fine-tuning approach to improve robustness without sacrificing utility.
Comments: Accepted by CVPR 2026 Workshop
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.14973 [cs.CR]
(or arXiv:2604.14973v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2604.14973
Focus to learn more
Submission history
From: Zhengyuan Jiang [view email]
[v1] Thu, 16 Apr 2026 13:07:25 UTC (325 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-04
Change to browse by:
cs
cs.CV
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)