← Back ◬ AI & Machine Learning Apr 01, 2026

Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing

arXiv Security Archived Apr 01, 2026 ✓ Full text saved

arXiv:2603.28972v1 Announce Type: new Abstract: The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Cryptography and Security [Submitted on 30 Mar 2026] Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing Alessio Langiu The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic contextual observer powered by an on-premise Small Language Model (SLM) -- that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting mechanism further bounds working memory, limiting the emergent leakage surface. We validate the framework through a 2x2 benchmark (Lazy vs. Expert users; Personal vs. Institutional secrets) on a 1,000-sample dataset, achieving a 45% blended OpEx reduction, 100% redaction success on personal secrets, and -- via LLM-as-a-Judge evaluation -- an 85% preference rate for APO-compressed responses over raw baselines. Our results demonstrate that Token Parsimony and Zero Leakage are mathematically dual projections of the same contextual compression operator. Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI) Cite as: arXiv:2603.28972 [cs.CR] (or arXiv:2603.28972v1 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2603.28972 Focus to learn more Submission history From: Alessio Langiu [view email] [v1] Mon, 30 Mar 2026 20:16:42 UTC (1,689 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev | next > new | recent | 2026-03 Change to browse by: cs cs.AI References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes