Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals

arXiv AI · Archived Apr 21, 2026

Submitted on 17 Apr 2026
Author: Yang Shanglin

Abstract: Training-free token reduction methods for Vision Transformers (ToMe, ToFu, PiToMe, and MCTF) employ different scoring mechanisms, yet they share a closely matched cliff-like collapse at high compression. This paper explains why. We develop a diagnostic framework with two tools, ranking consistency $\rho_s$ and off-diagonal correlation $\rho_\text{off}$, that decomposes the collapse into (1) a signal-agnostic error amplifier inherent to layer-wise reduction, predicting convex Pareto curves and $r_\text{crit} \propto 1/L$; and (2) shared reliance on pairwise similarity signals whose ranking consistency degrades from $\rho_s = 0.88$ to $0.27$ in deep layers. Pairwise rankings are inherently unstable ($O(N_p^2)$ joint perturbations) while unary signals enjoy greater stability ($O(N_p)$ perturbations, CLT). From three design principles derived from this diagnosis, we construct CATIS as a constructive validation: unary signals raise the trigger threshold, triage suppresses the gain. On ViT-Large at 63% FLOPs reduction, CATIS retains 96.9% of vanilla accuracy (81.0%) on ImageNet-1K where all baselines collapse to 43–65%.
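The contrast the abstract draws between pairwise and unary scoring signals can be made concrete with a toy experiment. The sketch below is not the paper's diagnostic framework: it assumes that ranking consistency can be approximated by a Spearman rank correlation between scores computed on clean and on noise-perturbed token features, and every name in it (`unary_scores`, `pairwise_scores`, `consistency_under_noise`), the choice of feature mean as the unary signal, and the Gaussian noise model are hypothetical choices made for illustration. It shows only the bookkeeping: $N_p$ tokens yield $N_p$ unary scores but $N_p(N_p-1)/2$ pairwise scores, each of which is affected by the noise on two tokens.

```python
import math
import random


def ranks(values):
    """Rank of each entry (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    mean = (len(ra) - 1) / 2.0
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var = sum((x - mean) ** 2 for x in ra)  # same for ra and rb (both permutations)
    return cov / var


def unary_scores(tokens):
    """One score per token. Assumption: the mean of its features."""
    return [sum(t) / len(t) for t in tokens]


def pairwise_scores(tokens):
    """One score per token pair (i < j). Assumption: cosine similarity."""
    norms = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    scores = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            dot = sum(x * y for x, y in zip(tokens[i], tokens[j]))
            scores.append(dot / (norms[i] * norms[j]))
    return scores


def consistency_under_noise(n_tokens=32, dim=64, sigma=0.5, seed=0):
    """Rank correlation between clean and noisy scores, for both signal types."""
    rng = random.Random(seed)
    clean = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    noisy = [[x + rng.gauss(0, sigma) for x in t] for t in clean]
    rho_unary = spearman(unary_scores(clean), unary_scores(noisy))
    rho_pairwise = spearman(pairwise_scores(clean), pairwise_scores(noisy))
    return rho_unary, rho_pairwise


if __name__ == "__main__":
    for sigma in (0.0, 0.25, 0.5, 1.0):
        rho_u, rho_p = consistency_under_noise(sigma=sigma)
        print(f"sigma={sigma:.2f}  unary={rho_u:.3f}  pairwise={rho_p:.3f}")
```

With `sigma=0.0` both correlations are 1 by construction. For nonzero noise the values depend entirely on the toy choices above, so they should not be read as reproducing the $\rho_s$ figures reported in the abstract.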
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.16745 [cs.AI] (or arXiv:2604.16745v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2604.16745
Submission history: From: Shanglin Yang. [v1] Fri, 17 Apr 2026 23:26:27 UTC (10,369 KB)
