arXiv:2603.12278v1 Announce Type: cross Abstract: Diabetic foot ulcers (DFUs) are a severe complication of diabetes, often resulting in significant morbidity. This paper presents a predictive analytic…
cyberintel.kalymoon.com · 2926 articles · updated every 4 hours · grows forever
arXiv:2603.12278v1 Announce Type: cross Abstract: Diabetic foot ulcers (DFUs) are a severe complication of diabetes, often resulting in significant morbidity. This paper presents a predictive analytic…
arXiv:2603.12286v1 Announce Type: cross Abstract: Modern neuroscience has accumulated extensive evidence on perception, memory, prediction, valuation, and consciousness, yet still lacks an explicit op…
arXiv:2603.12288v1 Announce Type: cross Abstract: Tabular machine learning presents a paradox: modern models achieve state-of-the-art performance using high-dimensional (high-D), collinear, error-pron…
arXiv:2603.12290v1 Announce Type: cross Abstract: Scholarly web is a vast network of knowledge connected by citations. However, this system is increasingly compromised by miscitation, where references…
arXiv:2603.12296v1 Announce Type: cross Abstract: Deep learning has achieved transformative performance across diverse domains, largely driven by the large-scale, high-quality training data. In contra…
arXiv:2603.12298v1 Announce Type: cross Abstract: Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing meth…
arXiv:2603.12304v1 Announce Type: cross Abstract: This paper introduces a novel optimization framework that fundamentally integrates the Minimum Description Length (MDL) principle into the training dy…
arXiv:2603.12305v1 Announce Type: cross Abstract: The ability to understand and reason about cause and effect -- encompassing interventions, counterfactuals, and underlying mechanisms -- is a cornerst…
arXiv:2603.12310v1 Announce Type: cross Abstract: Despite rapid advancements in video generation models, aligning their outputs with complex user intent remains challenging. Existing test-time optimiz…
DataKrypto’s 2026 Predictions and Trends Cybersecurity Insiders
cross-posted from my blog TLDR The map of how the activations of one ‘source’ layer in an LLM impact the activations in some later ‘target’ layer can provide vectors for steering LLM behavior. Computi…
This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: given a logical causal graph, how do we define the logical do-operator? what is l…
This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. There's been a lot of buzz around Claude's 30K word constitution ("soul doc"), and unusual ways Anthr…
I was inspired to revise my formulation of this thought experiment by Ihor Kendiukhov's post On The Independence Axiom . Kendiukhov quotes Scott Garrabrant : My take is that the concept of expected ut…
Writing up a probably-obvious point that I want to refer to later, with significant writing LLM writing help. TL;DR: 1) A common critique of AI safety evaluations is that they occur in unrealistic set…
A central AI safety concern is that AIs will develop unintended preferences and undermine human control to achieve them. But some unintended preferences are cheap to satisfy, and failing to satisfy th…
TL;DR: We introduce a testbed based on censored Chinese LLMs, which serve as natural objects of study for studying secret elicitation techniques. Then we study the efficacy of honesty elicitation and …
The context is MIRI's twist on Axelrod's Prisoner's Dilemma tournament . Axelrod's competitors were programs, facing each other in an iterated Prisoner's Dilemma. MIRI's tournament is a one-shot Priso…
I originally wrote this as a private doc for people working in the field - it's not super polished, or optimized for a broad audience. But I'm publishing anyway because inference-verification is a new…
Authors: Gerson Kroiz*, Aditya Singh*, Senthooran Rajamanoharan, Neel Nanda Gerson and Aditya are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan …
arXiv:2603.13290v1 Announce Type: new Abstract: Decentralized financial platforms rely heavily on Web of Trust reputation systems to mitigate counterparty risk in the absence of centralized identity v…
arXiv:2603.13420v1 Announce Type: new Abstract: Suffix jailbreak attacks serve as a systematic method for red-teaming Large Language Models (LLMs) but suffer from prohibitive computational costs, as a…
arXiv:2603.13424v1 Announce Type: new Abstract: Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject benchmar…
arXiv:2603.13444v1 Announce Type: new Abstract: We present a technical case study on the Privacy-Enhancing Technologies (PETs) for Public Health Challenge, a collaborative effort to safely leverage se…