Detecting Data Exfiltration through I2P Anonymity Networks: A Two-Phase Machine Learning Approach

arXiv Security · Archived May 21, 2026

Computer Science > Cryptography and Security

[Submitted on 19 May 2026]

Siddique Abubakr Muntaka, Muntaka Mohammed, Mansuru Mikail Azindo, Ibrahim Tanko, Franco Osei-Wusu, Edward Danso Ansong, Benjamin Yankson, Oliver Kornyo, Foster Yeboah, Jones Yeboah, Richmond Adams, Pulcheria Serwaa

The Invisible Internet Project (I2P) provides strong anonymity through garlic routing and distributed network architecture, making it attractive for legitimate privacy needs. Nevertheless, the same properties can be exploited by malicious actors to steal sensitive information from corporate networks without detection. Current network security measures often fail to detect I2P traffic, and existing literature has focused primarily on protocol-level traffic identification without addressing behavioral threat assessment.

This paper proposes a two-stage machine-learning model for I2P traffic analysis using the SafeSurf Darknet 2025 dataset comprising 184,548 network flows. Phase 1 achieved 99.96% accuracy in distinguishing I2P traffic from normal network traffic using a Random Forest classifier, with only 2 false positives among 32,318 normal flows. Phase 2 performed behavioral analysis on traffic identified as I2P, classifying it as either exfiltration or legitimate activity, achieving 91.11% accuracy using XGBoost.

The system demonstrates that tree-based ensemble methods substantially outperform deep neural networks and support vector machines for this task. Feature importance analysis indicates that the most discriminative features are packet timing and flow duration. These findings establish that accurate I2P traffic detection and threat prioritization are achievable in operational network environments, enabling security teams to focus resources on high-risk events rather than monitoring all encrypted traffic.

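
The two-phase design described in the abstract can be read as a cascade: every flow passes through Phase 1, and only flows flagged as I2P reach Phase 2. The following is a minimal sketch of that control flow, not the authors' implementation. The `Flow` record, its field names, and the threshold rules in `toy_phase1` and `toy_phase2` are hypothetical placeholders standing in for the trained Random Forest and XGBoost models; they borrow packet timing and flow duration only because the abstract names those as the most discriminative features. The final helper works out the false-positive rate implied by the reported 2 false positives among 32,318 normal flows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; field names are illustrative, not from the paper."""
    flow_id: str
    mean_iat_ms: float  # mean packet inter-arrival time, in milliseconds
    duration_s: float   # flow duration, in seconds


# A classifier here is any callable that maps a flow to True/False.
Classifier = Callable[[Flow], bool]


def two_phase_triage(
    flows: Iterable[Flow],
    is_i2p: Classifier,
    is_exfiltration: Classifier,
) -> Dict[str, str]:
    """Run Phase 1 on every flow and Phase 2 only on flows flagged as I2P."""
    labels: Dict[str, str] = {}
    for flow in flows:
        if not is_i2p(flow):
            labels[flow.flow_id] = "normal"
        elif is_exfiltration(flow):
            labels[flow.flow_id] = "i2p-exfiltration"
        else:
            labels[flow.flow_id] = "i2p-legitimate"
    return labels


def false_positive_rate(false_positives: int, negatives: int) -> float:
    """Fraction of truly normal flows that were wrongly flagged."""
    if negatives <= 0:
        raise ValueError("negatives must be positive")
    return false_positives / negatives


# Placeholder rules (assumptions): a real system would call trained models here.
def toy_phase1(flow: Flow) -> bool:
    return flow.mean_iat_ms < 50.0


def toy_phase2(flow: Flow) -> bool:
    return flow.duration_s > 300.0


flows = [
    Flow("a", mean_iat_ms=120.0, duration_s=10.0),
    Flow("b", mean_iat_ms=20.0, duration_s=900.0),
    Flow("c", mean_iat_ms=30.0, duration_s=15.0),
]
labels = two_phase_triage(flows, toy_phase1, toy_phase2)
print(labels)

# Phase 1 figures reported in the abstract: 2 false positives, 32,318 normal flows.
fpr = false_positive_rate(2, 32318)
print(f"Phase 1 false-positive rate: {fpr:.4%}")
```

Because Phase 2 only ever sees flows that Phase 1 flagged, any Phase 1 false positive is passed on to the behavioral classifier. A false-positive rate of roughly 0.006% on normal traffic therefore keeps the Phase 2 input close to genuine I2P flows.
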
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Cite as: arXiv:2605.20546 [cs.CR] (or arXiv:2605.20546v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2605.20546

Submission history
From: Siddique Abubakr Muntaka
[v1] Tue, 19 May 2026 22:46:22 UTC (1,813 KB)
