When Global Gating Is Enough: Admission-Time Hubness Control in Anisotropic Vector Retrieval Systems
arXiv SecurityArchived Jun 19, 2026✓ Full text saved
arXiv:2606.19692v1 Announce Type: new Abstract: Vector hubness, where a few points become nearest neighbors of many queries, creates a poisoning risk in retrieval-augmented generation (RAG): one injected document can influence unrelated requests. Existing defenses use periodic reverse-kNN scans, leaving an exposure window and repeated corpus-wide work. We study admission-time control, scoring each candidate against sentinel queries and quarantining hub-like documents before insertion. Across two
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 18 Jun 2026]
When Global Gating Is Enough: Admission-Time Hubness Control in Anisotropic Vector Retrieval Systems
Prashant Kumar Pathak, Tarun Kumar Sharma
Vector hubness, where a few points become nearest neighbors of many queries, creates a poisoning risk in retrieval-augmented generation (RAG): one injected document can influence unrelated requests. Existing defenses use periodic reverse-kNN scans, leaving an exposure window and repeated corpus-wide work. We study admission-time control, scoring each candidate against sentinel queries and quarantining hub-like documents before insertion. Across two 100,000-document corpora, five encoders, and disjoint attacker and defender query sets, a global gate achieves recall 1.0 at the decisive embedding-space point (>=0.92 across the effective range) and 0.91 +/- 0.07 on HotFlip attacks, with 1% false positives on general documents. A per-topic gate provides no reliable benefit, consistent with anisotropy coupling local and global visibility. Thresholds are maintained incrementally, with corpus-size-independent insertion cost and amortized deletion cost. On HNSW, admission adds about 3.1% to ingestion latency, scoring remains flat to 10^6 vectors, and 1.2% of decisions flip under approximate indexing, none involving attacks. Provenance complements the gate for natural or tight-domain hubs.
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB); Information Retrieval (cs.IR)
Cite as: arXiv:2606.19692 [cs.CR]
(or arXiv:2606.19692v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2606.19692
Focus to learn more
Submission history
From: Prashant Kumar Pathak [view email]
[v1] Thu, 18 Jun 2026 01:40:36 UTC (520 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-06
Change to browse by:
cs
cs.DB
cs.IR
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)