TableNet: A Large-Scale Table Dataset with LLM-Powered Autonomous
[Submitted on 27 Feb 2026]
Ruilin Zhang, Kai Yang
Table Structure Recognition (TSR) requires the logical reasoning ability of large language models (LLMs) to handle complex table layouts, but current datasets are limited in scale and quality, hindering effective use of this reasoning capacity. We thus present the TableNet dataset, a new table structure recognition dataset collected and generated from multiple sources. Central to our approach is the first LLM-powered autonomous table generation and recognition multi-agent system. The generation part of our system integrates controllable visual, structural, and semantic parameters into the synthesis of table images. It produces a wide array of semantically coherent tables, adaptable to user-defined configurations, together with annotations, thereby supporting large-scale and fine-grained dataset construction. This capability enables a comprehensive and nuanced table-image annotation taxonomy, potentially advancing research in table-related domains. In contrast to traditional data collection methods, this approach allows theoretically unlimited, domain-agnostic, and style-flexible generation of table images, ensuring both efficiency and precision. The recognition part of our system is a diversity-based active learning paradigm that draws tables from multiple sources and selectively samples the most informative data to fine-tune a model. It achieves competitive performance on the TableNet test set while reducing training samples by a large margin compared with baselines, and much higher performance on web-crawled real-world tables than models trained on the predominant table datasets. To the best of our knowledge, this is the first work to apply active learning to the structure recognition of tables that vary in number of rows and columns, merged cells, cell contents, and so on, a setting well suited to diversity-based active learning.
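The abstract describes table synthesis driven by controllable visual, structural, and semantic parameters. A minimal sketch of the structural-parameter idea is below; the parameter names (`n_rows`, `n_cols`, `merge_prob`) are illustrative assumptions, not the paper's actual interface, and the real system additionally uses an LLM for semantic content and visual styling.

```python
import random

def synthesize_table(n_rows=4, n_cols=3, merge_prob=0.2, seed=0):
    """Emit an HTML table with controllable structure.

    Hypothetical parameters for illustration only: the paper's system
    exposes richer visual/structural/semantic controls via an LLM.
    """
    rng = random.Random(seed)
    rows = []
    for r in range(n_rows):
        cells, c = [], 0
        while c < n_cols:
            # Occasionally merge two cells horizontally to mimic the
            # complex layouts (merged cells) the dataset targets.
            span = 2 if (c + 1 < n_cols and rng.random() < merge_prob) else 1
            tag = "th" if r == 0 else "td"
            attr = f' colspan="{span}"' if span > 1 else ""
            cells.append(f"<{tag}{attr}>cell {r},{c}</{tag}>")
            c += span
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"

print(synthesize_table())
```

Because generation is parameterized rather than scraped, the same routine can emit arbitrarily many structurally distinct tables, which is what makes the "theoretically unlimited" claim plausible.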
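The recognition side relies on diversity-based active learning: selecting the most informative subset of tables to fine-tune on. The paper does not specify its selection criterion here, so the sketch below uses a standard core-set heuristic (k-center greedy over feature embeddings) purely to illustrate what diversity-based selection looks like.

```python
import numpy as np

def k_center_greedy(features, k, seed=0):
    """Pick k diverse samples by repeatedly choosing the point farthest
    from the current selection (standard core-set heuristic; the
    paper's actual selection rule may differ)."""
    rng = np.random.default_rng(seed)
    n = len(features)
    selected = [int(rng.integers(n))]
    # Distance from every point to its nearest already-selected point.
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))          # farthest-first traversal
        selected.append(nxt)
        dists = np.minimum(
            dists, np.linalg.norm(features - features[nxt], axis=1)
        )
    return selected

# Toy example: 100 random "table embeddings", select 10 diverse ones.
X = np.random.default_rng(1).normal(size=(100, 16))
picked = k_center_greedy(X, k=10)
print(sorted(picked))
```

Tables that differ in row/column counts, merged cells, and cell contents spread out in embedding space, which is why a farthest-first criterion surfaces informative training samples with far fewer labels than uniform sampling.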
Comments: The 40th Annual AAAI Conference on Artificial Intelligence Bridge Program on Logic & AI
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.13041 [cs.DB]
(or arXiv:2604.13041v1 [cs.DB] for this version)
https://doi.org/10.48550/arXiv.2604.13041
Submission history
From: Ruilin Zhang
[v1] Fri, 27 Feb 2026 02:44:38 UTC (1,485 KB)