CyberIntel ⬡ News
★ Saved ◆ Cyber Reads
← Back ◬ AI & Machine Learning Mar 19, 2026

Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents

arXiv AI Archived Mar 19, 2026 ✓ Full text saved

arXiv:2603.17683v1 Announce Type: new Abstract: Large language model (LLM) agents deployed in unknown environments must learn task structure at test time, but current approaches require thousands of interactions to form useful hypotheses. We present Sensi, an LLM agent architecture for the ARC-AGI-3 game-playing challenge that introduces structured test-time learning through three mechanisms: (1) a two-player architecture separating perception from action, (2) a curriculum-based learning system

Full text archived locally
✦ AI Summary · Claude Sonnet


    Computer Science > Artificial Intelligence [Submitted on 18 Mar 2026] Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents Mohsen Arjmandi Large language model (LLM) agents deployed in unknown environments must learn task structure at test time, but current approaches require thousands of interactions to form useful hypotheses. We present Sensi, an LLM agent architecture for the ARC-AGI-3 game-playing challenge that introduces structured test-time learning through three mechanisms: (1) a two-player architecture separating perception from action, (2) a curriculum-based learning system managed by an external state machine, and (3) a database-as-control-plane that makes the agents context window programmatically steerable. We further introduce an LLM-as-judge component with dynamically generated evaluation rubrics to determine when the agent has learned enough about one topic to advance to the next. We report results across two iterations: Sensi v1 solves 2 game levels using the two-player architecture alone, while Sensi v2 adds curriculum learning and solves 0 levels - but completes its entire learning curriculum in approximately 32 action attempts, achieving 50-94x greater sample efficiency than comparable systems that require 1600-3000 attempts. We precisely diagnose the failure mode as a self-consistent hallucination cascade originating in the perception layer, demonstrating that the architectural bottleneck has shifted from learning efficiency to perceptual grounding - a more tractable problem. Comments: Preprint. 18 pages, 5 figures, 2 tables. Independent research. Code and Colab demo coming soon on GitHub Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG) Cite as: arXiv:2603.17683 [cs.AI]   (or arXiv:2603.17683v1 [cs.AI] for this version)   https://doi.org/10.48550/arXiv.2603.17683 Focus to learn more Submission history From: Mohsen Arjmandi [view email] [v1] Wed, 18 Mar 2026 12:59:26 UTC (23 KB) Access Paper: HTML (experimental) view license Current browse context: cs.AI < prev   |   next > new | recent | 2026-03 Change to browse by: cs cs.LG References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
    💬 Team Notes
    Article Info
    Source
    arXiv AI
    Category
    ◬ AI & Machine Learning
    Published
    Mar 19, 2026
    Archived
    Mar 19, 2026
    Full Text
    ✓ Saved locally
    Open Original ↗