
AgriPestDatabase-v1.0: A Structured Insect Dataset for Training Agricultural Large Language Model





Submitted on 24 Mar 2026

Authors: Yagizhan Bilal Durak, Ahsan Ul Islam, Shahidul Islam, Ashley Morgan-Olvera, Iftekhar Ibne Basith, Syed Hasib Akhter Faruqui

Abstract: Agricultural pest management increasingly relies on timely and accurate access to expert knowledge, yet high-quality labeled data and continuous expert support remain limited, particularly for farmers in rural regions with unstable or no internet connectivity. At the same time, the rapid growth of AI and LLMs has created new opportunities to deliver practical decision support tools directly to agricultural end users through compact, deployable systems. This work addresses (i) generating a structured insect information dataset, and (ii) adapting a lightweight LLM (≤ 7B) by fine-tuning it for edge-device use in agricultural pest management. The textual data were gathered by reviewing available pest databases and published manuscripts on nine selected pest species and compiling the findings into structured reports. These reports were then reviewed and validated by a domain expert. From these reports, we constructed Q/A pairs to support model training and evaluation. A LoRA-based fine-tuning approach was applied to multiple lightweight LLMs and evaluated. Initial evaluation shows that Mistral 7B achieves an 88.9% pass rate on the domain-specific Q/A task, substantially outperforming Qwen 2.5 7B (63.9%) and LLaMA 3.1 8B (58.7%). Notably, Mistral demonstrates higher semantic alignment (embedding similarity: 0.865) despite lower lexical overlap (BLEU: 0.097), indicating that semantic understanding and robust reasoning are more predictive of task success than surface-level conformity in specialized domains.
By combining expert-organized data, well-structured Q/A pairs, semantic quality control, and efficient model adaptation, this work contributes toward farmer-facing agricultural decision support tools and demonstrates the feasibility of deploying compact, high-performing language models for practical field-level pest management guidance.

Comments: Accepted in Artificial Super Intelligence Conference 2026 (Sponsored by KSU PLOT & IEEE CIS)
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.22777 [cs.AI] (or arXiv:2603.22777v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.22777
Submission history: From Syed Hasib Akhter Faruqui. [v1] Tue, 24 Mar 2026 04:11:27 UTC (1,765 KB)
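
The abstract contrasts lexical overlap (BLEU) with semantic alignment (embedding similarity). The following is a minimal sketch of why the two can diverge, not the authors' evaluation code. It assumes a simplified clipped bigram precision as a stand-in for BLEU (real BLEU combines several n-gram orders and a brevity penalty) and uses hand-written toy vectors in place of a real sentence-embedding model. The function names, example sentences, and vectors are all hypothetical.

```python
from collections import Counter
import math


def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram precision: a simplified ingredient of BLEU (no brevity penalty)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical reference answer and model answer with the same meaning
# but almost no shared wording.
reference = "Apply insecticide when larvae first appear on the leaves"
candidate = "Spray the crop as soon as young caterpillars are seen on foliage"

# Toy vectors standing in for sentence embeddings (assumption: a real
# embedding model would place paraphrases close together like this).
reference_vec = [0.9, 0.1, 0.4]
candidate_vec = [0.8, 0.2, 0.5]

lexical = ngram_precision(candidate, reference, n=2)
semantic = cosine_similarity(candidate_vec, reference_vec)
print(f"bigram precision: {lexical:.3f}, cosine similarity: {semantic:.3f}")
```

In this toy case the paraphrased answer shares no bigrams with the reference yet sits close to it in the assumed embedding space, which is the pattern the abstract reports for Mistral 7B.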
