AI & Machine Learning · May 22, 2026

$ECUAS_n$: A family of metrics for principled evaluation of uncertainty-augmented systems

arXiv AI · Archived May 22, 2026

arXiv:2605.20490v2 · Announce Type: new



[Submitted on 19 May 2026 (v1), last revised 21 May 2026 (this version, v2)]

Lautaro Estienne, Erik Ernst, Matías Vera, Pablo Piantanida, Luciana Ferrer

In high-stakes automated decision-making, access to predictive uncertainty is essential for enabling users (human or downstream systems) to accept or reject predictions based on application-specific cost trade-offs. Such uncertainty-augmented (UA) systems, i.e., systems that output both predictions and uncertainty scores, are currently assessed in the literature in a variety of ways: using separate metrics to evaluate the predictions and the uncertainty scores, setting a cost function with a fixed rejection cost, or integrating over a coverage-risk curve. We argue that these evaluation approaches are inadequate for assessing the overall performance of a UA system for decision-making under uncertainty, and we propose a novel family of metrics, ECUAS_n, formulated as proper scoring rules for the task of interest. The parameter n controls the trade-off between the cost of incorrect predictions and the cost of imperfect uncertainties, depending on the needs of the use case. We demonstrate the advantages of the ECUAS_n metrics both theoretically and empirically, through experiments on diverse classification and generation datasets, including a manually annotated subset of TriviaQA.
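The abstract contrasts ECUAS_n with two common ways of scoring UA systems: a cost function with a fixed rejection cost, and integration over a coverage-risk curve. The Python sketch below illustrates only those two baselines; it does not implement ECUAS_n, whose definition the abstract does not give. It assumes 0-1 costs (an accepted error costs 1, an accepted correct prediction costs 0), a single fixed rejection cost `reject_cost`, and that a higher uncertainty score means lower confidence. The function names `selective_cost` and `aurc` are hypothetical.

```python
import numpy as np


def selective_cost(correct, uncertainty, threshold, reject_cost):
    """Average cost at one operating point of a reject option.

    Assumed convention: predictions whose uncertainty exceeds `threshold`
    are rejected and each rejection costs `reject_cost`; accepted errors
    cost 1 and accepted correct predictions cost 0.
    """
    correct = np.asarray(correct, dtype=bool)
    uncertainty = np.asarray(uncertainty, dtype=float)
    rejected = uncertainty > threshold
    costs = np.where(rejected, reject_cost, np.where(correct, 0.0, 1.0))
    return float(costs.mean())


def aurc(correct, uncertainty):
    """Discrete area under the risk-coverage curve (lower is better).

    Samples are accepted in order of increasing uncertainty; at coverage
    k/N the risk is the error rate among the k most confident samples,
    and the area is the mean of these N risk values.
    """
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(np.asarray(uncertainty, dtype=float), kind="stable")
    errors = (~correct[order]).astype(float)
    k = np.arange(1, len(errors) + 1)
    risks = np.cumsum(errors) / k
    return float(risks.mean())


# Toy data (hypothetical): per-sample correctness and uncertainty scores.
correct = [True, True, False, True, False]
uncertainty = [0.1, 0.2, 0.9, 0.3, 0.6]
print(selective_cost(correct, uncertainty, threshold=0.5, reject_cost=0.3))  # about 0.12
print(aurc(correct, uncertainty))  # about 0.13
```

The fixed-cost score depends on one chosen `reject_cost` and threshold, while the AURC averages risk over every coverage level; the abstract argues that neither view is adequate for judging the overall performance of a UA system for decision-making.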
Comments: preprint; 9-page paper, 25 pages total
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2605.20490 [cs.AI] (or arXiv:2605.20490v2 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2605.20490

Submission history
From: Erik Ernst
[v1] Tue, 19 May 2026 20:55:41 UTC (181 KB)
[v2] Thu, 21 May 2026 03:50:50 UTC (181 KB)
