← Back ◬ AI & Machine Learning Apr 24, 2026

Can MLLMs "Read" What is Missing?

arXiv AI Archived Apr 24, 2026 ✓ Full text saved

arXiv:2604.21277v1 Announce Type: new Abstract: We introduce MMTR-Bench, a benchmark designed to evaluate the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts, requiring models to recover masked text from single- or multi-page inputs across real-world domains such as documents and webpages. This design isolates the reconstruction task from ins

Full text archived locally

✦ AI Summary · Claude Sonnet

Computer Science > Artificial Intelligence [Submitted on 23 Apr 2026] Can MLLMs "Read" What is Missing? Jindi Guo, Xi Fang, Chaozheng Huang We introduce MMTR-Bench, a benchmark designed to evaluate the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts, requiring models to recover masked text from single- or multi-page inputs across real-world domains such as documents and webpages. This design isolates the reconstruction task from instruction-following abilities, enabling a direct assessment of a model's layout understanding, visual grounding, and knowledge integration. MMTR-Bench comprises 2,771 test samples spanning multiple languages and varying target lengths. To account for this diversity, we propose a level-aware evaluation protocol. Experiments on representative MLLMs show that the benchmark poses a significant challenge, especially for sentence- and paragraph-level reconstruction. The homepage is available at this https URL. Subjects: Artificial Intelligence (cs.AI) Cite as: arXiv:2604.21277 [cs.AI] (or arXiv:2604.21277v1 [cs.AI] for this version) https://doi.org/10.48550/arXiv.2604.21277 Focus to learn more Submission history From: Jindi Guo [view email] [v1] Thu, 23 Apr 2026 04:44:25 UTC (43,411 KB) Access Paper: HTML (experimental) view license Current browse context: cs.AI < prev | next > new | recent | 2026-04 Change to browse by: cs References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

💬 Team Notes