TableNet: A Large-Scale Table Dataset with LLM-Powered Autonomous
[Submitted on 27 Feb 2026]
Ruilin Zhang, Kai Yang
Table Structure Recognition (TSR) requires the logical reasoning ability of large language models (LLMs) to handle complex table layouts, but current datasets are limited in scale and quality, hindering effective use of this reasoning capacity. We thus present the TableNet dataset, a new table structure recognition dataset collected and generated from multiple sources. Central to our approach is the first LLM-powered autonomous table generation and recognition multi-agent system. The generation part of our system integrates controllable visual, structural, and semantic parameters into the synthesis of table images. It produces a wide array of semantically coherent tables, adaptable to user-defined configurations, together with annotations, thereby supporting large-scale and fine-grained dataset construction. This capability enables a comprehensive and nuanced table-image annotation taxonomy, potentially advancing research in table-related domains. In contrast to traditional data collection methods, this approach allows theoretically unlimited, domain-agnostic, and style-flexible generation of table images, ensuring both efficiency and precision. The recognition part of our system is a diversity-based active learning paradigm that draws tables from multiple sources and selectively samples the most informative data to fine-tune a model. It achieves competitive performance on the TableNet test set while reducing training samples by a large margin compared with baselines, and much higher performance on web-crawled real-world tables than models trained on the predominant table datasets. To the best of our knowledge, this is the first work to apply active learning to the structure recognition of tables that vary in number of rows and columns, merged cells, cell contents, and so on, a setting well suited to diversity-based active learning.
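The abstract describes table synthesis driven by controllable visual, structural, and semantic parameters. A minimal sketch of the structural-parameter idea is below; the parameter names (`n_rows`, `n_cols`, `merge_prob`) are illustrative assumptions, not the paper's actual interface, and the real system additionally uses an LLM for semantic content and visual styling.

```python
import random

def synthesize_table(n_rows=4, n_cols=3, merge_prob=0.2, seed=0):
    """Emit an HTML table with controllable structure.

    Hypothetical parameters for illustration only: the paper's system
    exposes richer visual/structural/semantic controls via an LLM.
    """
    rng = random.Random(seed)
    rows = []
    for r in range(n_rows):
        cells, c = [], 0
        while c < n_cols:
            # Occasionally merge two cells horizontally to mimic the
            # complex layouts (merged cells) the dataset targets.
            span = 2 if (c + 1 < n_cols and rng.random() < merge_prob) else 1
            tag = "th" if r == 0 else "td"
            attr = f' colspan="{span}"' if span > 1 else ""
            cells.append(f"<{tag}{attr}>cell {r},{c}</{tag}>")
            c += span
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"

print(synthesize_table())
```

Because generation is parameterized rather than scraped, the same routine can emit arbitrarily many structurally distinct tables, which is what makes the "theoretically unlimited" claim plausible.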
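The recognition side relies on diversity-based active learning: selecting the most informative subset of tables to fine-tune on. The paper does not specify its selection criterion here, so the sketch below uses a standard core-set heuristic (k-center greedy over feature embeddings) purely to illustrate what diversity-based selection looks like.

```python
import numpy as np

def k_center_greedy(features, k, seed=0):
    """Pick k diverse samples by repeatedly choosing the point farthest
    from the current selection (standard core-set heuristic; the
    paper's actual selection rule may differ)."""
    rng = np.random.default_rng(seed)
    n = len(features)
    selected = [int(rng.integers(n))]
    # Distance from every point to its nearest already-selected point.
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))          # farthest-first traversal
        selected.append(nxt)
        dists = np.minimum(
            dists, np.linalg.norm(features - features[nxt], axis=1)
        )
    return selected

# Toy example: 100 random "table embeddings", select 10 diverse ones.
X = np.random.default_rng(1).normal(size=(100, 16))
picked = k_center_greedy(X, k=10)
print(sorted(picked))
```

Tables that differ in row/column counts, merged cells, and cell contents spread out in embedding space, which is why a farthest-first criterion surfaces informative training samples with far fewer labels than uniform sampling.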
Comments: The 40th Annual AAAI Conference on Artificial Intelligence Bridge Program on Logic & AI
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.13041 [cs.DB]
(or arXiv:2604.13041v1 [cs.DB] for this version)
https://doi.org/10.48550/arXiv.2604.13041
Submission history
From: Ruilin Zhang
[v1] Fri, 27 Feb 2026 02:44:38 UTC (1,485 KB)