RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning

arXiv AI · Archived Apr 21, 2026

Submitted on 17 Apr 2026

Jiayi Tian, Yupeng Su, Ryan Solgi, Souvik Kundu, Zheng Zhang

Abstract: Large reasoning models (LRMs) enhance problem-solving capabilities by generating explicit multi-step chains of thought (CoT) reasoning; however, they incur substantial inference latency and computational overhead. To mitigate this issue, recent works have explored model collaboration paradigms, where small reasoning models (SRMs) generate intermediate reasoning steps to achieve a better accuracy-latency trade-off. Despite recent progress, effectively and efficiently detecting and mitigating SRM failures in collaborative systems remains a key challenge. To address this issue, we analyze SRM inference in both the generated text and hidden-state spaces, and identify three types of failure modes: overconfidence, uncertainty, and heavy revalidation. Building on these insights, we propose RankGuide, a framework that improves the efficiency and effectiveness of SRM-LRM collaboration through tensor-rank-guided routing and steering. Specifically, RankGuide leverages a routing signal that incorporates tensor-rank signals derived from consecutive hidden states to detect when SRMs are likely to fail and selectively invoke LRMs. In addition, we introduce a tensor-rank-filtered steering vector extraction method to modulate the reasoning trajectory of SRMs, thereby improving their generation quality. By improving both routing and steering through tensor-rank signals, RankGuide enables SRM-LRM collaborative systems to achieve more efficient reasoning with fewer steps and improved accuracy. Experiments on multiple reasoning benchmarks demonstrate the efficacy of RankGuide in reducing latency by up to 1.75× compared to LRM, while maintaining competitive accuracy relative to prior methods.
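The abstract does not say how the tensor-rank signal is computed or thresholded, so the following is only an illustrative sketch of the general idea of routing on a rank signal taken from consecutive hidden states. It substitutes a plain numerical matrix rank (Gaussian elimination) for whatever tensor-rank measure the paper uses. The names `numerical_rank`, `should_route_to_lrm`, `window`, and `rank_threshold`, as well as the choice to escalate when the rank is low, are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def numerical_rank(rows: Sequence[Sequence[float]], tol: float = 1e-8) -> int:
    """Rank of a matrix via Gaussian elimination with partial pivoting.

    Assumption: this matrix rank is a simple stand-in for the paper's
    tensor-rank signal, which the abstract does not define.
    `tol` is an absolute tolerance below which a pivot counts as zero.
    """
    m: List[List[float]] = [[float(x) for x in r] for r in rows]
    if not m:
        return 0
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def should_route_to_lrm(
    hidden_states: Sequence[Sequence[float]],
    window: int = 4,
    rank_threshold: int = 2,
) -> bool:
    """Hypothetical routing rule (not the paper's): escalate to the large
    model when the last `window` hidden states span fewer than
    `rank_threshold` independent directions."""
    recent = list(hidden_states)[-window:]
    return numerical_rank(recent) < rank_threshold


# Toy hidden states: one trace that stays along a single direction,
# and one that keeps moving into new directions.
collapsed = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

print(should_route_to_lrm(collapsed))  # True: rank 1 is below the threshold
print(should_route_to_lrm(diverse))    # False: rank 3 meets the threshold
```

In a real system the rows would be hidden-state vectors from consecutive SRM decoding steps, and the threshold, window size, and direction of the comparison would need to be calibrated against observed SRM failures rather than fixed by hand as they are here.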
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.16694 [cs.AI] (or arXiv:2604.16694v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2604.16694
Submission history: From: Jiayi Tian. [v1] Fri, 17 Apr 2026 20:51:04 UTC (194 KB)
