MalwarePT: A Binary-Level Foundation Model for Malware Analysis
[Submitted on 15 May 2026]
Saastha Vasan, Yuzhou Nie, Kaie Chen, Yigitcan Kaya, Hojjat Aghakhani, Roman Vasilenko, Wenbo Guo, Christopher Kruegel, Giovanni Vigna
Automated malware analysis increasingly relies on machine learning, yet most existing methods remain task-specific and depend on handcrafted features or narrowly scoped models. Recent developments in binary-level foundation models suggest a path toward reusable program representations, but their application to malware analysis remains underexplored, and most still operate at byte-level tokenization, limiting their ability to capture multi-byte code patterns.
In this work, we introduce MalwarePT, a binary-level foundation model for malware analysis built on a ModernBERT-style encoder and pretrained with masked language modeling on Windows PE code-section bytes. We study whether a single pretrained encoder can transfer across malware-analysis tasks at different granularities, and how tokenization design affects that transfer. We train a byte-pair encoding (BPE) tokenizer on code-section bytes to compress frequent multi-byte patterns within a fixed context budget.
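To make the tokenization idea concrete, the following is a minimal sketch of byte-level BPE training, not the authors' tokenizer. It assumes the standard greedy BPE procedure: start from the 256 single-byte tokens and repeatedly merge the most frequent adjacent pair until the target vocabulary size is reached. The function names (`train_byte_bpe`, `merge_pair`, `encode`, `decode`) and the toy byte strings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_byte_bpe(corpus, vocab_size):
    """Learn BPE merges over raw bytes; ids 0..255 are the single-byte tokens."""
    seqs = [list(blob) for blob in corpus]
    merges = []  # entries are ((left_id, right_id), new_id), in learned order
    next_id = 256
    while next_id < vocab_size:
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing left that repeats, so further merges add no compression
        seqs = [merge_pair(seq, pair, next_id) for seq in seqs]
        merges.append((pair, next_id))
        next_id += 1
    return merges


def encode(data, merges):
    """Tokenize bytes by applying the learned merges in the order they were learned."""
    seq = list(data)
    for pair, new_id in merges:
        seq = merge_pair(seq, pair, new_id)
    return seq


def decode(ids, merges):
    """Map token ids back to the original bytes."""
    table = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges:
        table[new_id] = table[left] + table[right]
    return b"".join(table[i] for i in ids)


# Toy "code-section" bytes (hypothetical): two repeated x86-like byte patterns.
corpus = [bytes.fromhex("558bec83ec10") * 4, bytes.fromhex("558bec5dc3") * 4]
merges = train_byte_bpe(corpus, vocab_size=260)
sample = bytes.fromhex("558bec83ec10")
tokens = encode(sample, merges)
```

Because each merge replaces two tokens with one, a sequence encoded with a larger vocabulary is never longer than its byte-level encoding, which is how a larger vocabulary fits more code into the same fixed context budget.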
We evaluate MalwarePT on three downstream tasks spanning token-, function-, and document-level prediction: API call prediction, functionality classification, and malware (program) detection under temporal drift. Our evaluation demonstrates that pretraining yields substantial gains for API call prediction and functionality classification, and that increasing the BPE vocabulary beyond the byte-level baseline improves performance, with the strongest overall tradeoff at a vocabulary size of 1,024 tokens. For malware detection at a false positive rate (FPR) of approximately 0.001, MalwarePT outperforms the neural network baselines and is complementary to feature-engineering models that rely on PE structure. We also compare against existing binary foundation models and show that MalwarePT's design choices yield gains across all downstream tasks.
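Reporting detection at a fixed low FPR means choosing the decision threshold from benign scores and then measuring how many malicious samples still exceed it. The sketch below shows one common way to compute this. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `tpr_at_fpr`, the strict `score > threshold` decision rule, and the toy scores are all hypothetical.

```python
import math


def tpr_at_fpr(scores, labels, target_fpr):
    """Return (threshold, tpr) such that the benign false positive rate <= target_fpr.

    scores: higher means "more likely malicious"; labels: 1 = malicious, 0 = benign.
    A sample is flagged when its score is strictly greater than the threshold.
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need both benign and malicious samples")

    # Number of benign samples that may be flagged without exceeding the target.
    # The small epsilon guards against floating-point rounding just below an integer.
    allowed = math.floor(target_fpr * len(benign) + 1e-9)

    # Using the (allowed + 1)-th highest benign score as the threshold means that
    # at most `allowed` benign samples score strictly above it, even with ties.
    threshold = benign[allowed] if allowed < len(benign) else float("-inf")

    detected = sum(1 for s in malicious if s > threshold)
    return threshold, detected / len(malicious)


# Toy data (hypothetical): 1,000 benign scores and 4 malicious scores.
benign_scores = [i / 1000 for i in range(1000)]
malicious_scores = [0.9999, 0.9985, 0.5, 0.2]
scores = benign_scores + malicious_scores
labels = [0] * len(benign_scores) + [1] * len(malicious_scores)

threshold, tpr = tpr_at_fpr(scores, labels, target_fpr=0.001)
```

At an FPR of 0.001, only one benign sample per thousand may be flagged, so the estimate is only meaningful when the benign evaluation set contains many thousands of samples.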
Comments: Preprint. Under review
Subjects: Cryptography and Security (cs.CR)
Cite as: arXiv:2605.16455 [cs.CR]
(or arXiv:2605.16455v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2605.16455
Submission history
From: Saastha Vasan
[v1] Fri, 15 May 2026 05:31:59 UTC (333 KB)