cyberintel.kalymoon.com · 2828 articles · updated every 4 hours · grows forever
arXiv:2603.29557v1 · Announce Type: new · Abstract: Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-…
arXiv:2603.29500v1 · Announce Type: new · Abstract: Large language models (LLMs) have recently demonstrated impressive performance on complex, multi-step reasoning tasks, especially when post-trained with…
arXiv:2603.29496v1 · Announce Type: new · Abstract: We present Metriplector, a neural architecture primitive in which the input configures an abstract physical system--fields, sources, and operators--and …
arXiv:2603.29491v1 · Announce Type: new · Abstract: In the evaluation of attribution quality, the quantitative assessment of explanation legibility is particularly difficult, as it is influenced by varyin…
arXiv:2603.29399v1 · Announce Type: new · Abstract: Constructing Extract-Load-Transform (ELT) pipelines is a labor-intensive data engineering task and a high-impact target for AI automation. On ELT-Bench,…
arXiv:2603.29366v1 · Announce Type: new · Abstract: Prior authorization remains one of the most burdensome administrative processes in U.S. healthcare, consuming billions of dollars and thousands of physi…
arXiv:2603.29361v1 · Announce Type: new · Abstract: Tree ensembles (TEs) find a multitude of practical applications. They represent one of the most general and accurate classes of machine learning methods…
arXiv:2603.29357v1 · Announce Type: new · Abstract: AI evaluation suites often report many scores without checking whether those scores carry independent information. We introduce Effective Dimensionality…
arXiv:2603.29353v1 · Announce Type: new · Abstract: We introduce Nomad, a system for autonomous data exploration and insight discovery. Given a corpus of documents, databases, or other data sources, users…
arXiv:2603.29318v1 · Announce Type: new · Abstract: Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration. Howeve…
arXiv:2603.29262v1 · Announce Type: new · Abstract: Grokking in modular arithmetic has established itself as the quintessential fruit fly experiment, serving as a critical domain for investigating the mec…
arXiv:2603.29231v1 · Announce Type: new · Abstract: Existing benchmarks measure capability -- whether a model succeeds on a single attempt -- but production deployments require reliability -- consistent s…
arXiv:2603.29211v1 · Announce Type: new · Abstract: In recent years, multimodal large models have continued to improve on general benchmarks. However, in real-world content moderation and adversarial sett…
arXiv:2603.29206v1 · Announce Type: new · Abstract: Routing is widely used to scale large language models, from Mixture-of-Experts gating to multi-model/tool selection. A common belief is that routing to …
arXiv:2603.29199v1 · Announce Type: new · Abstract: The AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture, Engineering, and Construction (AEC) doma…
arXiv:2603.29161v1 · Announce Type: new · Abstract: Modern web scraping struggles with dynamic, interactive websites that require more than static HTML parsing. Current methods are often brittle and requi…
arXiv:2603.29152v1 · Announce Type: new · Abstract: Metal-organic frameworks (MOFs) offer a vast design space, and as such, computational simulations play a critical role in predicting their structural an…
arXiv:2603.29149v1 · Announce Type: new · Abstract: Access to the most up-to-date information on medical countermeasures is important for the research and development of effective treatments for viruses a…
arXiv:2603.29142v1 · Announce Type: new · Abstract: Formative feedback is central to effective learning, yet providing timely, individualised feedback at scale remains a persistent challenge. While recent…
arXiv:2603.29139v1 · Announce Type: new · Abstract: Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualiza…
arXiv:2603.29112v1 · Announce Type: new · Abstract: We introduce GISTBench, a benchmark for evaluating Large Language Models' (LLMs) ability to understand users from their interaction histories in recomme…
arXiv:2603.29085v1 · Announce Type: new · Abstract: Large language models (LLMs) remain brittle on multi-hop question answering (MHQA), where answering requires combining evidence across documents through…
arXiv:2603.29075v1 · Announce Type: new · Abstract: The way we're thinking about generative AI right now is fundamentally individual. We see this not just in how users interact with models but also in how…