CyberIntel ⬡ News
★ Saved ◆ Cyber Reads
← Back ◬ AI & Machine Learning Jun 12, 2026

Detecting Functional Memorization in Code Language Models

arXiv Security Archived Jun 12, 2026 ✓ Full text saved

arXiv:2606.12764v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equivalent while textually dissimilar. In this work, we study functional memorization: extraction of functional logic beyond what verbatim metrics detect. W

Full text archived locally
✦ AI Summary · Claude Sonnet


    Computer Science > Machine Learning [Submitted on 11 Jun 2026] Detecting Functional Memorization in Code Language Models Matthieu Meeus, Anil Ramakrishna, Matthew Grange, Zheng Xu, Luca Melis Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equivalent while textually dissimilar. In this work, we study functional memorization: extraction of functional logic beyond what verbatim metrics detect. We construct a counterfactual setup for Olmo-3-32B, comparing a midtrained model (exposed to target code) against a pretrained reference (not exposed). We prompt both models with Python function signatures and measure both textual and functional similarity (i.e., LLM-as-a-judge, execution-based). Our results show clear evidence of functional memorization, highlighting the need for auditing metrics that go beyond textual overlap. Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR) Cite as: arXiv:2606.12764 [cs.LG]   (or arXiv:2606.12764v1 [cs.LG] for this version)   https://doi.org/10.48550/arXiv.2606.12764 Focus to learn more Submission history From: Matthieu Meeus [view email] [v1] Thu, 11 Jun 2026 00:03:25 UTC (1,914 KB) Access Paper: HTML (experimental) view license Current browse context: cs.LG < prev   |   next > new | recent | 2026-06 Change to browse by: cs cs.CL cs.CR References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
    💬 Team Notes
    Article Info
    Source
    arXiv Security
    Category
    ◬ AI & Machine Learning
    Published
    Jun 12, 2026
    Archived
    Jun 12, 2026
    Full Text
    ✓ Saved locally
    Open Original ↗