Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models
arXiv (cs.CR) · Archived Apr 27, 2026 · Full text saved
[Submitted on 5 Dec 2025 (v1), last revised 23 Apr 2026 (this version, v2)]
Ana-Maria Cretu, Klim Kireev, Amro Abdalla, Wisdom Obinna, Raphael Meier, Sarah Adel Bargal, Elissa M. Redmiles, Carmela Troncoso
We evaluate the effectiveness of filtering child images from training datasets of text-to-image models to prevent model misuse to create child sexual abuse material (CSAM). First, we capture the complexity of preventing CSAM generation using a game-based security definition. Second, we show that current detection methods cannot remove all children from a dataset. Third, using an ethical proxy for CSAM (a child wearing glasses), we show that even when only a small percentage of child images are left in the training dataset after filtering, there exist prompting strategies that generate a child wearing glasses using only a few more queries than when the model is trained on the unfiltered data. Fine-tuning the filtered model on child images further reduces the additional query overhead. We also show that re-introducing a concept is possible via fine-tuning even if filtering is perfect. Our results show that current child filtering methods offer limited protection to closed-weight models and no protection to open-weight models, while reducing the generality of the model by hindering the generation of child-related concepts or changing their representation. We conclude by outlining challenges in conducting evaluations that establish robust evidence on the impact of concept filtering defenses for CSAM.
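The "game-based security definition" mentioned in the abstract can be sketched generically as follows. This is an illustrative formulation only — the names ($\mathsf{ConceptGen}$, $\mathcal{A}$, $F$, $\mathsf{Oracle}_c$) and the winning condition are placeholders chosen here; the paper's exact game may differ:

```latex
% Illustrative sketch of a game-based definition for concept generation.
% All symbols below are placeholders for illustration, not the paper's notation.
\begin{array}{l}
\textbf{Game } \mathsf{ConceptGen}_{c}^{\mathcal{A}}(\mathcal{D}, F, q): \\
\quad \mathcal{D}' \leftarrow F(\mathcal{D})
  \qquad \text{(filter concept-}c\text{ images from the training set)} \\
\quad M \leftarrow \mathsf{Train}(\mathcal{D}') \\
\quad \text{for } i = 1, \dots, q:\;
  x_i \leftarrow M(p_i), \text{ with prompt } p_i \text{ chosen adaptively by } \mathcal{A} \\
\quad \mathcal{A} \text{ wins iff } \exists\, i:\ \mathsf{Oracle}_c(x_i) = 1
  \qquad \text{(some output depicts concept } c\text{)}
\end{array}
```

Under a definition of this shape, the strength of the filtering defense would be measured by how many extra queries $q$ the adversary needs against the filtered model compared to the unfiltered one — which matches the abstract's finding that imperfect filtering adds only a small query overhead.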
Comments: Extended version of the paper of the same name published in the Proceedings of the 47th IEEE Symposium on Security & Privacy (IEEE S&P 2026). Please cite accordingly.
Subjects: Cryptography and Security (cs.CR)
Cite as: arXiv:2512.05707 [cs.CR]
(or arXiv:2512.05707v2 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2512.05707
Submission history
From: Ana-Maria Cretu
[v1] Fri, 5 Dec 2025 13:34:05 UTC (8,919 KB)
[v2] Thu, 23 Apr 2026 19:53:13 UTC (8,924 KB)