Sensitivity Uncertainty Alignment in Large Language Models
[Submitted on 21 Apr 2026]
Prakul Sunil Hiremath, Harshit R. Hiremath
We propose Sensitivity-Uncertainty Alignment (SUA), a framework for analyzing failures of large language models under adversarial and ambiguous inputs. We argue that adversarial sensitivity and ambiguity reflect a common issue: misalignment between prediction instability and model uncertainty. A reliable model should express higher uncertainty when its predictions are unstable; failure to do so leads to miscalibration.
We define a scalar score, SUA_theta(x), capturing the difference between distributional sensitivity and predictive entropy. We show that minimizing its positive part bounds worst-case perturbed risk and relates to calibration error. We also formalize ambiguity collapse, where models produce overconfident outputs despite multiple valid interpretations.
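As a rough illustration of the score described above, the following sketch computes a sensitivity-minus-entropy quantity for a discrete predictive distribution. The abstract does not give the exact definition, so everything here is an assumption: sensitivity is taken to be the mean KL divergence between the clean prediction and predictions on perturbed inputs, entropy is Shannon entropy in nats, and the function names (`sua_score`, `positive_part`) are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against division by zero."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0.0)

def sua_score(p_clean, p_perturbed_list):
    """Assumed SUA-style score: mean KL sensitivity minus predictive entropy.

    This is an illustrative stand-in, not the paper's definition.
    """
    sensitivity = sum(
        kl_divergence(p_clean, p) for p in p_perturbed_list
    ) / len(p_perturbed_list)
    return sensitivity - entropy(p_clean)

def positive_part(score):
    """Positive part of the score; zero when uncertainty covers the instability."""
    return max(0.0, score)

# Confident prediction that flips under perturbation: score is positive.
unstable = sua_score([0.98, 0.02], [[0.10, 0.90]])
# Uncertain prediction that does not move under perturbation: score is negative.
stable = sua_score([0.50, 0.50], [[0.50, 0.50]])
print(unstable > 0, positive_part(stable))
```

Under these assumptions, a positive score flags inputs where the prediction moves more under perturbation than the model's stated uncertainty would suggest, which matches the misalignment the abstract describes.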
We introduce SUA-TR, a training method combining consistency regularization and entropy alignment, along with an abstention rule for safer inference. Across tasks including question answering and classification, SUA better identifies model failures than entropy or self-consistency alone.
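A minimal sketch of how the two ingredients named above might be combined, and how an abstention rule could use the score at inference time. The abstract gives neither the loss nor the rule, so the weights `lam_cons` and `lam_align`, the threshold `tau`, and both function names are hypothetical, and the sensitivity and entropy values are assumed to be computed elsewhere.

```python
def sua_tr_loss(task_loss, sensitivity, entropy, lam_cons=0.1, lam_align=1.0):
    """Assumed training objective (not the paper's exact loss).

    task_loss   : ordinary supervised loss, e.g. cross-entropy
    sensitivity : prediction shift under perturbation (consistency term)
    entropy     : predictive entropy on the clean input
    The alignment term penalizes only instability that exceeds the entropy.
    """
    consistency = lam_cons * sensitivity
    alignment = lam_align * max(0.0, sensitivity - entropy)
    return task_loss + consistency + alignment

def predict_or_abstain(probs, sua_value, tau=0.0):
    """Return the argmax label, or None to abstain.

    Abstains when the positive part of the score exceeds the
    hypothetical threshold tau.
    """
    if max(0.0, sua_value) > tau:
        return None
    return max(range(len(probs)), key=probs.__getitem__)

# High score: the model is confident but unstable, so it abstains.
print(predict_or_abstain([0.9, 0.1], sua_value=2.0, tau=0.5))
# Negative score: uncertainty covers the instability, so it answers.
print(predict_or_abstain([0.2, 0.8], sua_value=-0.3, tau=0.5))
```

In practice the threshold would presumably be tuned on held-out data to trade coverage against error rate; that choice is outside what the abstract specifies.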
The framework is model-agnostic and provides a basis for improving reliability in evolving language models.
Comments: 24 pages, 4 tables, 2 figures
Subjects: Cryptography and Security (cs.CR)
ACM classes: I.2.6; I.2.4; K.6.5
Cite as: arXiv:2604.20903 [cs.CR]
(or arXiv:2604.20903v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2604.20903
Submission history
From: Prakul Hiremath [view email]
[v1] Tue, 21 Apr 2026 17:53:12 UTC (144 KB)