cyberintel.kalymoon.com · 2686 articles · updated every 4 hours · grows forever
arXiv:2605.08385v1 Announce Type: new Abstract: While contemporary deep learning malware detectors define a dominant defense paradigm, their sophistication also exposes them to novel structural evasio…
arXiv:2605.08382v1 Announce Type: new Abstract: LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without hu…
arXiv:2605.08363v1 Announce Type: new Abstract: Kettle is an attested build system that produces cryptographically verifiable provenance for software built inside Trusted Execution Environments (TEEs)…
arXiv:2605.08316v1 Announce Type: new Abstract: Security alert screening is the downstream task of filtering, prioritizing, correlating, and contextualizing alerts for analyst attention in Security Op…
arXiv:2605.08313v1 Announce Type: new Abstract: Large language models (LLMs) rely on deterministic pseudorandom number generators (PRNGs) for autoregressive sampling, creating a critical supply-chain …
arXiv:2605.08310v1 Announce Type: new Abstract: Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this p…
arXiv:2605.08277v1 Announce Type: new Abstract: Many-shot jailbreaking (MSJ) causes safety-aligned language models to answer harmful queries by preceding them with many harmful question-answer demonst…
arXiv:2605.08257v1 Announce Type: new Abstract: Motivated by the challenge to improve the adversarial robustness, security, and trust of medical decision making intelligent agents, this study develops…
arXiv:2605.07103v1 Announce Type: new Abstract: Reaction feasibility prediction, as a fundamental problem in computational chemistry, has benefited from diverse tools enabled by recent advances in art…
arXiv:2605.07080v1 Announce Type: new Abstract: Many real-world resource allocation systems, such as humanitarian logistics and vaccine distribution, must preposition limited supply across multiple lo…
arXiv:2605.07073v1 Announce Type: new Abstract: Agent systems often decompose a task across multiple roles, but these roles are typically specified by prompts rather than enforced by access controls. …
arXiv:2605.07066v1 Announce Type: new Abstract: Autonomous systems that build structures from natural-language instructions need reliable spatial reasoning, yet large language models (LLMs) make syste…
arXiv:2605.07042v1 Announce Type: new Abstract: Large Language Model (LLM) agents are deployed in complex environments -- such as massive codebases, enterprise databases, and conversational histories …
arXiv:2605.07021v1 Announce Type: new Abstract: Reasoning in Large Language Models (LLMs) poses a challenge for oversight as many misaligned behaviors do not surface until reasoning concludes. To addr…
arXiv:2605.07002v1 Announce Type: new Abstract: A major bottleneck in characterizing the failure modes of generative AI systems is the cost and time of annotation and evaluation. Consequently, adaptiv…
arXiv:2605.06993v1 Announce Type: new Abstract: Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically cost…
arXiv:2605.06957v1 Announce Type: new Abstract: We present a dynamic policy-learning approach that combines generalized planning and hierarchical task decomposition for LLM-based agents. Our method, H…
arXiv:2605.06951v1 Announce Type: new Abstract: Constraint inference is widely considered essential to align reinforcement learning agents with safety boundaries and operational guidelines by observin…
arXiv:2605.06898v1 Announce Type: new Abstract: At the heart of existing language model agents is a fixed orchestrator program responsible for the state transition between consecutive turns. This pape…
arXiv:2605.06895v1 Announce Type: new Abstract: How can we make models robust to even imperfect human feedback? In reinforcement learning from human feedback (RLHF), human preferences over model outpu…
arXiv:2605.06890v1 Announce Type: new Abstract: AI agents are promising for high-stakes enterprise workflows, but dependable deployment remains limited because tool-use failures are difficult to diagn…
arXiv:2605.06882v1 Announce Type: new Abstract: Large Language Models (LLMs) have achieved great improvements in recent years. Nevertheless, it still remains unclear how good LLMs are for reasoning ta…
arXiv:2605.06869v1 Announce Type: new Abstract: AI agent research spans a wide spectrum: from RL agents that learn from scratch to foundation model agents that leverage pre-trained knowledge, yet no u…