When Entropy Is Not Enough: Multi-Modal Classification of Encrypted and Compressed Data Fragments
arXiv SecurityArchived Jun 01, 2026✓ Full text saved
arXiv:2605.31337v1 Announce Type: new Abstract: Reliable identification of encrypted data fragments is essential in cybersecurity, with applications to ransomware detection, digital forensics, and large-scale data analysis. Distinguishing encrypted from compressed fragments is particularly challenging, as short fragments lack structural data and exhibit low statistical redundancy. Traditional statistical methods based on byte-level distributions show limited effectiveness on this task. Recent ma
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 29 May 2026]
When Entropy Is Not Enough: Multi-Modal Classification of Encrypted and Compressed Data Fragments
Fabio De Gaspari, Dorjan Hitaj, Samuele Salaris, Luigi V. Mancini
Reliable identification of encrypted data fragments is essential in cybersecurity, with applications to ransomware detection, digital forensics, and large-scale data analysis. Distinguishing encrypted from compressed fragments is particularly challenging, as short fragments lack structural data and exhibit low statistical redundancy. Traditional statistical methods based on byte-level distributions show limited effectiveness on this task. Recent machine learning approaches improve performance by learning subtle patterns from raw bytes, but predominantly rely on single-modal representations, implicitly assuming that a single view of the data is sufficient for accurate classification. This paper shows that this assumption becomes a fundamental limitation in low-information settings, when only small fragments of data are available (512--2048 Bytes). We propose Triumvir, a multi-modal, uncertainty-aware ensemble architecture that integrates statistical, sequential, and spatial representations of raw byte fragments. Extensive experimental analysis demonstrates that Triumvir consistently outperforms state-of-the-art methods with gains of up to +4.5pp in binary and +6.4pp in multiclass classification. Ablation studies confirm that combining modalities is critical, yielding improvements of up to +5pp over partial configurations.
Comments: 20 pages
Subjects: Cryptography and Security (cs.CR)
Cite as: arXiv:2605.31337 [cs.CR]
(or arXiv:2605.31337v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2605.31337
Focus to learn more
Submission history
From: Dorjan Hitaj [view email]
[v1] Fri, 29 May 2026 14:18:13 UTC (335 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-05
Change to browse by:
cs
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)