Computer Science > Cryptography and Security
[Submitted on 26 May 2026]
Anonymous YARA Rules Are Not Anonymous
Usman Rabiu Isah, Laurent Bobelin, Pascal Berthomé
YARA rules are widely shared across threat intelligence communities to enable collective defence against malware. This practice implicitly assumes that removing metadata (e.g., author fields) sufficiently protects the identity of contributing organisations. To assess the validity of this assumption, we systematically evaluate how much can be inferred from YARA rule text alone. Specifically, using a corpus of 23,305 rules from three major public repositories, we train independent classifiers along four stylometric fingerprint dimensions: individual author, source repository, malware family, and temporal drift, using three complementary methods: lexical n-grams (Burrows' Delta), syntactic AST features (Caliskan-Islam), and fine-tuned CodeBERT. Our results demonstrate that repository origin is almost perfectly recoverable (up to 99% accuracy), individual authors can be re-identified well above chance (76%), and malware family classification reaches 95%. Comparing the same repository attribution task across full-history and time-restricted subsets reveals a 9-18% accuracy gap, providing preliminary evidence of temporal drift in repository style. To further disentangle content from style, we conduct per-malware family author attribution experiments. Even when the malware family is the same for all samples considered, authors can still be re-identified for five of seven tested families (mean accuracy 74.6%).
These findings constitute the first systematic demonstration that YARA rule sharing is a measurable OPSEC attack surface, and that metadata removal alone does not mitigate it.
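As an illustration of the lexical method named in the abstract, the following is a minimal sketch of Burrows' Delta applied to rule text. It is not the authors' pipeline: the tokenizer, the `top_n` cutoff, the function names, and the three toy author profiles are all assumptions made for this example. The sketch z-scores the relative frequencies of the most common tokens across author profiles and ranks candidate authors by mean absolute z-score difference from an unattributed rule.

```python
import re
import statistics
from collections import Counter

# Assumed tokenizer: words (optionally $-prefixed) or single symbols.
TOKEN_RE = re.compile(r"\$?\w+|[^\s\w]")

# Toy profiles (hypothetical authors and rules, for illustration only).
PROFILES = {
    "author_a": '''
        rule a_one { strings: $s1 = "evil.exe" ascii wide $s2 = "cmd /c" ascii wide condition: all of them }
        rule a_two { strings: $s1 = "inject" ascii wide $s2 = "payload" ascii wide condition: all of them }
    ''',
    "author_b": '''
        rule B_One { strings: $hex_a = { 4D 5A 90 00 } $hex_b = { 50 45 00 00 } condition: uint16(0) == 0x5A4D and any of ($hex_*) }
        rule B_Two { strings: $hex_a = { E8 00 00 00 00 } condition: uint16(0) == 0x5A4D and any of ($hex_*) }
    ''',
    "author_c": '''
        rule c_one { strings: $re = /[a-z]{8}-update/ nocase condition: $re and filesize < 200KB }
        rule c_two { strings: $re = /svc[0-9]+host/ nocase condition: $re and filesize < 1MB }
    ''',
}

UNKNOWN_RULE = (
    'rule sample { strings: $s1 = "dropper" ascii wide '
    '$s2 = "beacon" ascii wide condition: all of them }'
)


def tokenize(rule_text):
    """Lowercase the rule and split it into lexical tokens."""
    return TOKEN_RE.findall(rule_text.lower())


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary token in one token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[tok] / total for tok in vocab]


def burrows_delta_ranking(profiles, unknown_rule, top_n=30):
    """Rank candidate authors by Burrows' Delta (smallest first)."""
    tokenized = {author: tokenize(text) for author, text in profiles.items()}

    # Vocabulary: the top_n most frequent tokens over all author profiles.
    corpus = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [tok for tok, _ in corpus.most_common(top_n)]

    # Per-feature mean and standard deviation across author profiles.
    freqs = {a: relative_freqs(toks, vocab) for a, toks in tokenized.items()}
    columns = list(zip(*freqs.values()))
    means = [statistics.mean(col) for col in columns]
    # A zero deviation would divide by zero, so fall back to 1.0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def z_scores(vec):
        return [(v - m) / s for v, m, s in zip(vec, means, stds)]

    z_unknown = z_scores(relative_freqs(tokenize(unknown_rule), vocab))

    # Delta = mean absolute difference between z-score vectors.
    deltas = {}
    for author, vec in freqs.items():
        diffs = [abs(a - u) for a, u in zip(z_scores(vec), z_unknown)]
        deltas[author] = statistics.mean(diffs)
    return sorted(deltas.items(), key=lambda item: item[1])
```

Calling `burrows_delta_ranking(PROFILES, UNKNOWN_RULE)` returns `(author, delta)` pairs with the closest stylistic match first. On this toy data the unattributed rule, which carries no author field, ranks closest to `author_a` purely through habits such as `$s1`/`$s2` naming, `ascii wide` modifiers, and `all of them` conditions.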
Subjects: Cryptography and Security (cs.CR)
Cite as: arXiv:2605.26791 [cs.CR]
(or arXiv:2605.26791v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2605.26791
Submission history
From: Laurent Bobelin
[v1] Tue, 26 May 2026 10:01:50 UTC (548 KB)