← Back ◬ AI & Machine Learning Jun 10, 2026

From Data Heterogeneity to Convergence: A Data-Centric Review of Federated Learning

arXiv Security Archived Jun 10, 2026 ✓ Full text saved

arXiv:2606.10595v1 Announce Type: new Abstract: Federated Learning (FL) has emerged as a promising solution for data hunger in centralized learning. This paradigm enables privacy with multiple clients to train a shared-task model collaboratively without exposing their local data. While being a key component in any learning system, data is also a primary source of vulnerabilities and challenges, and a major determinant of a stable and well-converged training. Existing FL reviews describe general

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Cryptography and Security [Submitted on 9 Jun 2026] From Data Heterogeneity to Convergence: A Data-Centric Review of Federated Learning Huong Nguyen, Mickaël Bettinelli, Amirhossein Ghaffari, Alexandre Benoit, Hong-Tri Nguyen, Susanna Pirttikangas, Lauri Lovén Federated Learning (FL) has emerged as a promising solution for data hunger in centralized learning. This paradigm enables privacy with multiple clients to train a shared-task model collaboratively without exposing their local data. While being a key component in any learning system, data is also a primary source of vulnerabilities and challenges, and a major determinant of a stable and well-converged training. Existing FL reviews describe general foundations, security practices, opportunities, challenges, and applications, without delving into diverse aspects of data and considering problems from the data perspective. They rarely provide a data-lens synthesis that links concrete data properties, split protocols, and defenses to convergence speed and stability. This survey fills that gap with three advances. First, we analyze non-IID into measurable traits and rank their influence on convergence as strong, medium, or light, explaining the mechanisms behind each and reconciling evidence across images, texts, and graphs. Second, we connect experimental splitting practices to the real phenomena they emulate, expose the artifacts they introduce, and show how those artifacts affect target accuracy. Third, we analyze how data-related vulnerabilities and their proposed defenses affect convergence, reporting performance under clean and adversarial conditions to make the convergence-robustness trade-off explicit. To our knowledge, this is the first survey to provide a complete understanding of data-related challenges that govern FL. With clear takeaways distilled for each concern, our work serves as actionable guidance, helping practitioners design their system with predictable convergence and stability. Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI) Cite as: arXiv:2606.10595 [cs.CR] (or arXiv:2606.10595v1 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2606.10595 Focus to learn more Submission history From: Huong Nguyen [view email] [v1] Tue, 9 Jun 2026 09:00:04 UTC (1,641 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev | next > new | recent | 2026-06 Change to browse by: cs cs.AI References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes