AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
arXiv SecurityArchived Apr 08, 2026✓ Full text saved
arXiv:2604.05502v1 Announce Type: new Abstract: Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose \textsc{AttnDiff}, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. \textsc{AttnDiff} probes mini
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 7 Apr 2026]
AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
Haobo Zhang, Zhenhua Xu, Junxian Li, Shangfeng Sheng, Dezhang Kong, Meng Han
Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose \textsc{AttnDiff}, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. \textsc{AttnDiff} probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B--14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., >0.98 vs.\ <0.22 with M=60 probes). With 5--60 multi-domain probes, it supports practical provenance verification and accountability.
Comments: Accepted at ACL2026 Main
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as: arXiv:2604.05502 [cs.CR]
(or arXiv:2604.05502v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2604.05502
Focus to learn more
Submission history
From: Zhenhua Xu [view email]
[v1] Tue, 7 Apr 2026 06:57:47 UTC (5,259 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-04
Change to browse by:
cs
cs.LG
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)