Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models
arXiv (cs.CR) · Archived Apr 27, 2026 · Full text saved
[Submitted on 5 Dec 2025 (v1), last revised 23 Apr 2026 (this version, v2)]
Ana-Maria Cretu, Klim Kireev, Amro Abdalla, Wisdom Obinna, Raphael Meier, Sarah Adel Bargal, Elissa M. Redmiles, Carmela Troncoso
We evaluate the effectiveness of filtering child images from training datasets of text-to-image models to prevent model misuse to create child sexual abuse material (CSAM). First, we capture the complexity of preventing CSAM generation using a game-based security definition. Second, we show that current detection methods cannot remove all children from a dataset. Third, using an ethical proxy for CSAM (a child wearing glasses), we show that even when only a small percentage of child images are left in the training dataset after filtering, there exist prompting strategies that generate a child wearing glasses using only a few more queries than when the model is trained on the unfiltered data. Fine-tuning the filtered model on child images further reduces the additional query overhead. We also show that re-introducing a concept is possible via fine-tuning even if filtering is perfect. Our results show that current child filtering methods offer limited protection to closed-weight models and no protection to open-weight models, while reducing the generality of the model by hindering the generation of child-related concepts or changing their representation. We conclude by outlining challenges in conducting evaluations that establish robust evidence on the impact of concept filtering defenses for CSAM.
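The "game-based security definition" mentioned in the abstract can be sketched generically as follows. This is an illustrative formulation only — the names ($\mathsf{ConceptGen}$, $\mathcal{A}$, $F$, $\mathsf{Oracle}_c$) and the winning condition are placeholders chosen here; the paper's exact game may differ:

```latex
% Illustrative sketch of a game-based definition for concept generation.
% All symbols below are placeholders for illustration, not the paper's notation.
\begin{array}{l}
\textbf{Game } \mathsf{ConceptGen}_{c}^{\mathcal{A}}(\mathcal{D}, F, q): \\
\quad \mathcal{D}' \leftarrow F(\mathcal{D})
  \qquad \text{(filter concept-}c\text{ images from the training set)} \\
\quad M \leftarrow \mathsf{Train}(\mathcal{D}') \\
\quad \text{for } i = 1, \dots, q:\;
  x_i \leftarrow M(p_i), \text{ with prompt } p_i \text{ chosen adaptively by } \mathcal{A} \\
\quad \mathcal{A} \text{ wins iff } \exists\, i:\ \mathsf{Oracle}_c(x_i) = 1
  \qquad \text{(some output depicts concept } c\text{)}
\end{array}
```

Under a definition of this shape, the strength of the filtering defense would be measured by how many extra queries $q$ the adversary needs against the filtered model compared to the unfiltered one — which matches the abstract's finding that imperfect filtering adds only a small query overhead.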
Comments: Extended version of the paper of the same name published in the Proceedings of the 47th IEEE Symposium on Security & Privacy (IEEE S&P 2026). Please cite accordingly.
Subjects: Cryptography and Security (cs.CR)
Cite as: arXiv:2512.05707 [cs.CR]
(or arXiv:2512.05707v2 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2512.05707
Submission history
From: Ana-Maria Cretu
[v1] Fri, 5 Dec 2025 13:34:05 UTC (8,919 KB)
[v2] Thu, 23 Apr 2026 19:53:13 UTC (8,924 KB)