The Security Budget of Code LLMs: An Information-Theoretic Capacity-Security Bound
[Submitted on 2 Jun 2026]
Jianwei Tai
AI programming assistants make natural-language prompts a software-development interface, so small prompt perturbations become usability and security risks. We study an information-theoretic trade-off for code LLMs between functional capacity, $\mathrm{Cap}=\mathrm{I}(c^*;c_\pi)$, and perturbation retention, $\mathrm{Sec}=\mathrm{I}(c_\pi;\tilde c_\pi)$. Here $\mathrm{Sec}$ is a retention-channel quantity, not a direct measure of exploit success or vulnerable-code generation. For code completion modeled as $p\to c_\pi$ with perturbed prompt $\tilde p$, we prove $\mathrm{Cap}+\mathrm{Sec}\le \mathrm{H}(c^*)+\mathrm{I}(p;\tilde p)$, decomposing the budget into task entropy and prompt leakage. A deterministic-embedding corollary gives the hidden-state version, and a tokenizer/gzip companion bound gives a model-agnostic ceiling on sequence-level task entropy. Empirically, we estimate embedded $\mathrm{Cap}$ and $\mathrm{Sec}$ from output-only last-token hidden states, excluding prompt context from the $\mathrm{Sec}$ channel. Six individual validation rows across two models, two datasets, INT4/BF16 precision, and estimator ablations satisfy the embedded check $(\mathrm{Cap}+\max_T\mathrm{Sec})/(\mathrm{H}(z^*)+\max_T\mathrm{I}(p;\tilde p))\le 1$. Saturation is 0.27–0.92 and theorem slack is 2.36–26.94 nats; a separate three-seed stability diagnostic has mean saturation 0.87. A context-mixed cosine, used only as a per-problem generation-prompt alignment signal, correlates with pass@1 on CodeLlama-HumanEval ($\rho=0.36$, $p<10^{-4}$), Qwen-HumanEval ($\rho=0.22$, $p=0.005$), and CodeLlama-MBPP ($\rho=0.225$, $p=0.0038$; all $n=164$). Adaptive stress tests with a 23-perturbation pool, a fixed universal suffix, and prompt-embedding PGD all leave positive slack.
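The bound in the abstract can be checked exactly on a small discrete toy model. The sketch below is illustrative only and is not the paper's estimator: it assumes a Markov structure $c^* \to p \to c_\pi$ with $p \to \tilde p \to \tilde c_\pi$, three-symbol alphabets, and symmetric noisy channels whose parameters are invented for the example. It also assumes that "saturation" means the ratio of the left-hand side to the right-hand side of the bound, and "slack" their difference, in nats. The function names `noisy_channel`, `mutual_information`, `entropy`, and `toy_budget` are hypothetical.

```python
import numpy as np


def noisy_channel(k, keep):
    """Row-stochastic k x k channel: keep the symbol w.p. `keep`, else pick another uniformly."""
    off = (1.0 - keep) / (k - 1)
    return keep * np.eye(k) + off * (np.ones((k, k)) - np.eye(k))


def mutual_information(joint):
    """I(X;Y) in nats from a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, m)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def entropy(dist):
    """Shannon entropy in nats of a 1-D probability vector."""
    dist = np.asarray(dist, dtype=float)
    nz = dist[dist > 0]
    return float(-np.sum(nz * np.log(nz)))


def toy_budget():
    # All numbers below are assumptions chosen for illustration.
    prior = np.array([0.5, 0.3, 0.2])       # P(c*): distribution over target programs
    spec = noisy_channel(3, 0.9)            # P(p | c*): prompt describes the target
    perturb = noisy_channel(3, 0.7)         # P(p~ | p): prompt perturbation
    model = noisy_channel(3, 0.8)           # P(c_pi | p): the code model, reused for p~

    joint_target_output = np.diag(prior) @ spec @ model      # P(c*, c_pi)
    prompt_marginal = prior @ spec                            # P(p)
    joint_prompts = np.diag(prompt_marginal) @ perturb        # P(p, p~)
    joint_outputs = model.T @ joint_prompts @ model           # P(c_pi, c~_pi)

    cap = mutual_information(joint_target_output)   # Cap = I(c*; c_pi)
    sec = mutual_information(joint_outputs)         # Sec = I(c_pi; c~_pi)
    task_entropy = entropy(prior)                   # H(c*)
    prompt_leakage = mutual_information(joint_prompts)  # I(p; p~)

    budget = task_entropy + prompt_leakage
    return {
        "cap": cap,
        "sec": sec,
        "task_entropy": task_entropy,
        "prompt_leakage": prompt_leakage,
        "slack": budget - (cap + sec),          # assumed meaning of "slack"
        "saturation": (cap + sec) / budget,     # assumed meaning of "saturation"
    }


if __name__ == "__main__":
    for name, value in toy_budget().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions each term is bounded separately: $\mathrm{Cap}\le\mathrm{H}(c^*)$ because mutual information cannot exceed the entropy of either argument, and $\mathrm{Sec}\le\mathrm{I}(p;\tilde p)$ by the data-processing inequality along $c_\pi - p - \tilde p - \tilde c_\pi$. The toy model therefore always leaves non-negative slack; it says nothing about how tight the bound is for real models.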
Subjects: Cryptography and Security (cs.CR)
Cite as: arXiv:2606.03308 [cs.CR]
(or arXiv:2606.03308v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2606.03308
Submission history
From: Jianwei Tai
[v1] Tue, 2 Jun 2026 08:22:14 UTC (41 KB)