arXiv:2605.08549v1 Announce Type: new Abstract: Conversational AI is increasingly personalized around users' preferences, histories, goals, and knowledge, but much less around how users interpret and …
cyberintel.kalymoon.com · 2686 articles · updated every 4 hours · grows forever
arXiv:2605.08545v1 Announce Type: new Abstract: Agent benchmarks typically report only final outcomes: pass or fail. This threatens evaluation credibility in three ways. First, scores may be inflated …
arXiv:2605.08538v1 Announce Type: new Abstract: Current LLM agents lack principled mechanisms for managing persistent memory across long interaction horizons. We present a biologically-grounded memory…
arXiv:2605.08533v1 Announce Type: new Abstract: Clinical decision-making in emergency medicine demands rapid, accurate diagnoses under uncertainty. Despite benchmark progress, evidence for LLMs as int…
arXiv:2605.08518v1 Announce Type: new Abstract: Competition retrospectives are useful when they explain what a leaderboard measured, how hidden evaluation changed conclusions, and which design pattern…
arXiv:2605.08516v1 Announce Type: new Abstract: Transparent decision-making is essential for traffic signal control (TSC) systems to earn public trust. However, traditional reinforcement learning-base…
arXiv:2605.08496v1 Announce Type: new Abstract: Current adversarial robustness methods for large language models require extensive datasets of harmful prompts (thousands to hundreds of thousands of ex…
arXiv:2605.08480v1 Announce Type: new Abstract: Individuals with Alzheimer's disease (AD) and Alzheimer's disease-related dementia (ADRD) experience memory and thinking changes that impact their abili…
arXiv:2605.08472v1 Announce Type: new Abstract: The effectiveness of Reinforcement Learning (RL) in Large Language Models (LLMs) depends on the nature and diversity of the data used before and during …
arXiv:2605.08463v1 Announce Type: new Abstract: Autonomous AI agents are increasingly deployed in open social environments, yet the relationship between their configuration specifications and their em…
arXiv:2605.08448v1 Announce Type: new Abstract: Semi-supervised learning approaches have been investigated as a means to enhance the analysis of social media data in disaster management contexts. In t…
arXiv:2605.08445v1 Announce Type: new Abstract: AI models are increasingly deployed in live clinical environments where they must perform reliably across complex, high-stakes workflows that standard t…
arXiv:2605.08427v1 Announce Type: new Abstract: Self-play red team is an established approach to improving AI safety in which different instances of the same model play attacker and defender roles in …
arXiv:2605.08416v1 Announce Type: new Abstract: Jurisprudence, the study of how judges should properly decide cases, and alignment, the science of getting AI models to conform to human values, share a…
arXiv:2605.08415v1 Announce Type: new Abstract: Since the advent of Large Language Models (LLMs), a significant area of research has focused on their intrinsic biases, particularly in political discou…
arXiv:2605.08409v1 Announce Type: new Abstract: Conversational AI has a fundamental flaw as a knowledge interface: sycophantic chatbots induce epistemic entrenchment and delusional belief spirals even…
arXiv:2605.08405v1 Announce Type: new Abstract: How do LLMs learn in-context? Is it by pattern-matching recent tokens, or by inferring latent structure? We probe this question using a toy graph random…
arXiv:2605.08399v1 Announce Type: new Abstract: Tool-augmented language models can extend small language models with external executable skills, but scaling the tool library creates a coupled challeng…
arXiv:2605.08388v1 Announce Type: new Abstract: Human-AI teams play a pivotal role in improving overall system performance when neither the human nor the model can achieve such performance on their ow…
arXiv:2605.08386v1 Announce Type: new Abstract: Skill libraries have become a practical way for LLM agents to reuse procedural experience across tasks. However, existing systems typically treat skills…
arXiv:2605.08374v1 Announce Type: new Abstract: Episodic memory allows LLM agents to accumulate and retrieve experience, but current methods treat each memory independently, i.e., evaluating retrieval…
arXiv:2605.08368v1 Announce Type: new Abstract: Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But…
arXiv:2605.08360v1 Announce Type: new Abstract: Modern AI is opening the door to collective decision-making in which participants express their views as free-form text rather than voting on a fixed se…
arXiv:2605.08354v1 Announce Type: new Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human…