Concept Drift Adaptation Using Self-Supervised and Reinforcement Learning In Android Malware Detection
arXiv SecurityArchived May 26, 2026✓ Full text saved
arXiv:2605.24294v1 Announce Type: new Abstract: Android malware detectors often degrade after deployment because of concept drift, while full retraining at each maintenance step is costly. We propose a chronological adaptive maintenance framework that models deployment-time maintenance as a sequential decision problem. The framework learns a stable latent representation through self-supervised learning during initialization, freezes the encoder, measures latent drift in the fixed representation
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 22 May 2026]
Concept Drift Adaptation Using Self-Supervised and Reinforcement Learning In Android Malware Detection
Ahmed Sabbah, Mohammad Kharma, Mohammad Alkhanafseh, Radi Jarrar, Samer Zein, David Mohaisen
Android malware detectors often degrade after deployment because of concept drift, while full retraining at each maintenance step is costly. We propose a chronological adaptive maintenance framework that models deployment-time maintenance as a sequential decision problem. The framework learns a stable latent representation through self-supervised learning during initialization, freezes the encoder, measures latent drift in the fixed representation space, and performs lightweight downstream adaptation using a trainable adapter and classification head. A proximal policy optimization controller selects low-cost maintenance actions based on the detector state, including current utility, retention on a fixed memory set, latent drift indicators, and update cost. We evaluate the framework under a causal deployment-style protocol on emulator and real Android malware datasets with static and dynamic features. Results show that the RL controller provides a strong cost-aware adaptation strategy, consistently remaining among the top-performing policies while achieving a favorable balance between temporal performance, memory retention, and maintenance cost under non-stationary deployment conditions.
Comments: 9 pages, 2 figures, 2 tables
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2605.24294 [cs.CR]
(or arXiv:2605.24294v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2605.24294
Focus to learn more
Submission history
From: Ahmed Sabbah [view email]
[v1] Fri, 22 May 2026 23:49:30 UTC (4,158 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-05
Change to browse by:
cs
cs.AI
cs.LG
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)