
On the Shoulders of Giants: Empowering Automated Smart Contract Auditing via the GiAnt Corpus

arXiv Security · Archived Jun 08, 2026



✦ AI Summary · Claude Sonnet

[Submitted on 5 Jun 2026]

Xiaoting Zhang, Zhipeng Gao, Yiran Lv, Xing Hu, Feifei Niu, Xin Xia

High-quality smart contract auditing datasets are crucial for evaluating security tools and advancing smart contract security research. Two major limitations of existing datasets are the manual-induced scalability bottleneck and the deficiency in data granularity and diversity. To address these limitations, we propose GiANT, an automated framework designed to curate smart contract auditing datasets by distilling vulnerability insights from real-world auditing reports. GiANT employs a divide-and-conquer strategy coupled with the Chain-of-Thought technique to extract structured vulnerability information from Code4rena reports, followed by an LLM-as-a-judge mechanism to perform rigorous quality assurance.

To evaluate GiANT's effectiveness, we run it on 388 real-world audit reports and generate the GiAnt Corpus comprising 7,711 vulnerability findings across five severity levels. Manual assessment of the dataset demonstrates exceptional reliability in information extraction, achieving a mean quality score of 4.76 ± 0.37 (out of 5) with inter-rater agreement κ of 0.88. We further validate the practicality of our dataset by benchmarking 4 state-of-the-art LLMs on vulnerability detection, code summarization, mitigation recommendation, and automated gas optimization tasks, to establish performance baselines, thereby providing a valuable data foundation for future research in automated smart contract auditing.
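For readers unfamiliar with the two reliability figures quoted in the abstract, the following is a minimal sketch of how a mean quality score and an inter-rater κ can be computed. It assumes two raters and Cohen's kappa; the abstract does not say which kappa variant or how many raters the authors used. The function name `cohen_kappa` and the ratings are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 1-5 quality scores for ten extracted findings (illustrative only).
rater_a = [5, 5, 4, 5, 3, 5, 4, 5, 5, 4]
rater_b = [5, 5, 4, 5, 4, 5, 4, 5, 5, 4]

kappa = cohen_kappa(rater_a, rater_b)
item_means = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]

print(f"kappa = {kappa:.2f}")
print(f"quality = {mean(item_means):.2f} ± {stdev(item_means):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside the mean score rather than a simple percentage of matching ratings.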
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Cite as: arXiv:2606.07363 [cs.CR] (or arXiv:2606.07363v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2606.07363

Submission history
From: Zhipeng Gao
[v1] Fri, 5 Jun 2026 15:08:32 UTC (970 KB)
